Study exposes AI image generators trained with CSAM

A Stanford Internet Observatory study found that the LAION database, which is used to train leading AI image generators, contains over 3,200 images of suspected CSAM.


A recent study by the Stanford Internet Observatory found over 3,200 images of suspected child sexual abuse material (CSAM) within the Large-scale Artificial Intelligence Open Network (LAION) open dataset, a massive AI dataset extensively used to train leading image-generating AI models such as Stable Diffusion. The study found that AI models trained on the LAION-5B dataset can create photorealistic AI-generated nude images, including CSAM.

Immediate action was taken in response to the study's findings, and the identified source material was removed. LAION emphasised a zero-tolerance policy for illegal content and said the datasets would be reinstated only after their safety was ensured.

The Stanford Internet Observatory advocates drastic measures, including deleting datasets based on LAION-5B or working with intermediaries to clean the material. It also suggested that content-hosting platforms implement CSAM detection technologies such as Microsoft’s PhotoDNA.

Why does it matter?

The incident has broader implications for the AI sector, as various text-to-image generators are linked to the LAION database. As reported by the Guardian, while the identified images constitute only a fraction of LAION’s extensive index of approximately 5.8 billion images, the Stanford group argues that their presence likely influences the capability of AI tools to produce harmful outputs. There are also concerns that the repeated appearance of real victims in the dataset reinforces prior abuse, and that innocuous social media photos of clothed teenagers could be transformed into explicit content.