22 Feb 2024

60% of OpenAI’s GPT-3.5 outputs contain plagiarism, Copyleaks report shows

Copyleaks reports 60% plagiarism in GPT-3.5 outputs, sparking concerns for creators. AI-based analysis reveals 45.7% identical text, 27.4% minor changes, and 46.5% paraphrased content.

A report by Copyleaks reveals that 60% of outputs from OpenAI’s GPT-3.5 show some form of plagiarism, raising concerns for content creators. The issue centers on generative AI models trained on copyrighted material, which can produce exact copies of that material and lead to legal disputes.

Copyleaks, an AI-based text analysis company, uses a proprietary scoring method that accounts for identical text, minor changes, and paraphrased content. For GPT-3.5, the results indicate 45.7% identical text, 27.4% minor changes, and 46.5% paraphrased content. On this scale, a score of 0% means that all of the content is original, while a score of 100% implies that none of the content is original.

OpenAI defends its models. According to spokesperson Lindsey Held, they are designed to learn concepts, with safeguards in place against inadvertent memorization and intentional content regurgitation.

The New York Times lawsuit alleges copyright infringement through AI systems’ “widescale copying.” OpenAI counters that “regurgitation” is a rare bug and accuses The New York Times of manipulating prompts. The dispute adds complexity to the ongoing debate over AI-generated content and intellectual property.
