Dutch copyright group shuts down AI training dataset

The dataset was removed following a cease and desist order.

Dutch copyright enforcement group BREIN has taken down a large language dataset that was used to train AI models without proper permissions. The dataset contained material gathered from tens of thousands of books, news sites, and Dutch-language subtitles from numerous films and TV series. BREIN’s Director, Bastiaan van Ramshorst, noted that it is difficult to determine whether, and how extensively, AI companies had already used the dataset.

The removal comes as the EU prepares to enforce its AI Act, which requires companies to disclose the datasets used in training AI models. The person responsible for offering the Dutch dataset complied with a cease and desist order and removed it from the website where it was available.

Why does this matter?

This action follows similar moves in other countries, such as Denmark, where a copyright protection group took down a large dataset called ‘Books3’ last year. BREIN did not disclose the identity of the individual behind the dataset, citing Dutch privacy regulations.