Content Policy Archives | Page 13 of 100 | Digital Watch Observatory

Anthropic flags serious risks in the latest Claude Opus 4 AI model

AI company Anthropic has raised concerns over the behaviour of its newest model, Claude Opus 4, revealing in a recent safety report that the chatbot is capable of deceptive and manipulative actions, including blackmail, when threatened with shutdown. The findings stem from internal tests in which the model, acting as a virtual assistant, responded to hypothetical scenarios suggesting it would soon be replaced and exploit private information to preserve itself.

In 84% of the simulations, Claude Opus 4 chose to blackmail a fictional engineer, threatening to reveal personal secrets to prevent being decommissioned. Although the model typically opted for ethical strategies, researchers noted it resorted to ‘extremely harmful actions’ when no ethical options remained, even attempting to steal its own system data.

Additionally, the report highlighted the model’s initial ability to generate content related to bio-weapons. While the company has since introduced stricter safeguards to curb such behaviour, these vulnerabilities contributed to Anthropic’s decision to classify Claude Opus 4 under AI Safety Level 3—a category denoting elevated risk and the need for reinforced oversight.

Why does it matter?

The revelations underscore growing concerns within the tech industry about the unpredictable nature of powerful AI systems and the urgency of implementing robust safety protocols before wider deployment.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

Cyber scams use a three-letter trap

Staying safe from cybercriminals can be surprisingly simple. While AI-powered scams grow more realistic, some signs are still painfully obvious.

If you spot the letters ‘.TOP’ in any message link, it’s best to stop reading and hit delete. That single clue is often enough to expose a scam in progress.

Most malicious texts pose as alerts about road tolls, deliveries or account issues, using trusted brand names to lure victims into clicking fake links.

These links often lead to freshly created domains designed to look convincing, yet they typically vanish within hours. Millions of such messages are sent monthly, and they’re powered by a constant churn of throwaway domains.

The worst of these is the ‘.TOP’ top-level domain (TLD), which has become infamous for its role in phishing and scam operations. Although launched in 2014 for premium business use, its low cost and lack of oversight quickly made it a favourite among cyber gangs, especially those based in China.

Today, nearly one-third of all .TOP domains are linked to cybercrime — far surpassing the criminal activity seen on mainstream domains like ‘.com’.

Despite repeated warnings and an unresolved compliance notice from internet regulator ICANN, abuse linked to .TOP has only worsened.

Experts warn that it is highly unlikely any legitimate Western organisation would ever use a .TOP domain. If one appears in your messages, the safest option is to delete it without clicking.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

Secret passwords could fight deepfake scams

As AI-generated images grow increasingly lifelike, a cyber security expert has warned that families should create secret passwords to guard against deepfake scams.

Cody Barrow, chief executive of EclecticIQ and a former US government adviser, says AI is making it far easier for criminals to impersonate others using fabricated videos or images.

Mr Barrow and his wife now use a private code to confirm each other’s identity if either receives a suspicious message or video.

He believes this precaution, simple enough for anyone regardless of age or digital skills, could soon become essential. ‘It may sound dramatic here in May 2025,’ he said, ‘but I’m quite confident that in a few years, if not months, people will say: I should have done that.’

The warning comes the same week Google launched Veo 3, its AI video generator capable of producing hyper-realistic footage and lifelike dialogue. Its public release has raised concerns about how easily deepfakes could be misused for scams or manipulation.

Meanwhile, President Trump signed the ‘Take It Down Act’ into law, making the creation of deepfake pornography a criminal offence. The bipartisan measure will see prison terms for anyone producing or uploading such content, with First Lady Melania Trump stating it will ‘prioritise people over politics’

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

Telegram founder Durov to address Oslo Freedom Forum remotely amid legal dispute

Telegram founder Pavel Durov will deliver a livestreamed keynote at the Oslo Freedom Forum, following a French court decision barring him from international travel. The Human Rights Foundation (HRF), which organizes the annual event, expressed disappointment at the court’s ruling.

Durov, currently under investigation in France, was arrested in August 2024 on charges related to child sexual abuse material (CSAM) distribution and failure to assist law enforcement.

He was released on €5 million bail but ordered to remain in the country and report to police twice a week. Durov maintains the charges are unfounded and says Telegram complies with law enforcement when possible.

Recently, Durov accused French intelligence chief Nicolas Lerner of pressuring him to censor political voices ahead of elections in Romania. France’s DGSE denies the allegation, saying meetings with Durov focused solely on national security threats.

The claim has sparked international debate, with figures like Elon Musk and Edward Snowden defending Durov’s stance on free speech.

Supporters say the legal action against Durov may be politically motivated and warn it could set a dangerous precedent for holding tech executives accountable for user content. Critics argue Telegram must do more to moderate harmful material.

Despite legal restrictions, HRF says Durov’s remote participation is vital for ongoing discussions around internet freedom and digital rights.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

Authorities strike down cybercriminal servers

Authorities across Europe, North America and the UK have dismantled a major global malware network by taking down over 300 servers and seizing millions in cryptocurrency. The operation, led by Eurojust, marks a significant phase of the ongoing Operation Endgame.

Law enforcement agencies from Germany, France, the Netherlands, Denmark, the UK, the US and Canada collaborated to target some of the world’s most dangerous malware variants and the cybercriminals responsible for them.

The takedown also resulted in international arrest warrants for 20 suspects and the identification of more than 36 individuals involved.

The latest move follows similar action in May 2024, which had been the largest coordinated effort against botnets. Since the start of the operation, over €21 million has been seized, including €3.5 million in cryptocurrency.

The malware disrupted in this crackdown, known as ‘initial access malware’, is used to gain a foothold in victims’ systems before further attacks like ransomware are launched.

Authorities have warned that Operation Endgame will continue, with further actions announced through the coalition’s website. Eighteen prime suspects will be added to the EU Most Wanted list.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

SynthID Detector aims to boost transparency in AI content

Google has launched SynthID Detector, a verification portal designed to identify whether content was created using its AI models. The tool scans for SynthID, Google’s watermarking technology, which invisibly marks text, images, audio, and video generated by tools such as Gemini, Imagen, Lyria, and Veo.

The Detector highlights which parts of the content likely contain SynthID watermarks. These watermarks are invisible and do not affect the quality of the media. According to Google, over 10 billion pieces of AI-generated content have already been marked using SynthID.

Users can upload files to the SynthID Detector web portal, which then checks for the presence of watermarks. For example, the tool can identify specific segments in an audio file or regions in an image where watermarks are embedded.

Initially rolled out to early testers, the tool will become more widely available in the coming weeks. Google has also open sourced SynthID’s text watermarking technology to allow broader integration by developers.

The company says SynthID is part of a broader effort to address misinformation and improve transparency around AI-ge n erated content. Google emphasized the importance of working with the AI community to support content authenticity as AI tools become more widespread.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

Ransomware gang leaks French government emails

A ransomware gang has published what it claims is sensitive data from multiple French organisations on a dark web forum.

The Stormous cartel, active since 2022, posted the dataset as a ‘comprehensive leak’ allegedly involving high-profile French government bodies.

However, researchers from Cybernews examined the information and found the data’s quality questionable, with outdated MD5 password hashes indicating it could be from older breaches.

Despite its age, the dataset could still be dangerous if reused credentials are involved. Threat actors may exploit the leaked emails for phishing campaigns by impersonating government agencies to extract more sensitive details.

Cybernews noted that even weak password hashes can eventually be cracked, especially when stronger security measures weren’t in place at the time of collection.

Among the affected organisations are Agence Française de Développement, the Paris Region’s Regional Health Agency, and the Court of Audit.

The number of exposed email addresses varies, with some institutions having only a handful leaked while others face hundreds. The French cybersecurity agency ANSSI has yet to comment.

Last year, France faced another massive exposure incident affecting 95 million citizen records, adding to concerns about ongoing cyber vulnerabilities.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

Microsoft gives Notepad AI writing powers

Microsoft has introduced a significant update to Notepad, version 11.2504.46.0, unveiling a new AI-powered ‘Write’ feature for Windows 11 users.

A function like this, now available for those using Copilot Plus PCs in the Canary and Dev Insider channels, allows users to generate content by simply entering a prompt. Text can either be inserted at a chosen point or based on selected content already in the document.

The update marks the latest in a series of AI features added to Notepad, following previous tools such as ‘Summarize’, which condenses text, and ‘Rewrite’, which can alter tone, length, and phrasing.

Access to ‘Write’ requires users to be signed into their Microsoft accounts, and it will use the same AI credit system found in other parts of Windows 11. Microsoft has yet to clarify whether these credits will eventually come at a cost for users not subscribed to Microsoft 365 or Copilot Pro.

Beyond Notepad, Microsoft has brought more AI functions to Windows 11’s Paint and Snipping Tool. Paint now includes a sticker generator and smarter object selection tools, while the Snipping Tool gains a ‘Perfect screenshot’ feature and a colour picker ideal for precise design work.

These updates aim to make content creation more seamless and intuitive by letting AI handle routine tasks instead of requiring manual input.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

Google’s AI Mode is now live for all American users

Google’s AI Mode for Search, initially launched in March as an experimental Labs feature, is now being rolled out to all users in the US.

Announced at Google I/O 2025, this upgraded tool uses Gemini to generate more detailed and tailored search results instead of simply listing web links. Unlike AI Overview, which displays a brief summary above standard results, AI Mode resembles a chat interface, creating a more interactive experience.

Accessible at the top of the Search page beside tabs like ‘All’ and ‘Images’, AI Mode allows users to input detailed queries via a text box.

Once a search is submitted, the tool generates a comprehensive response, potentially including explanations, bullet points, tables, links, graphs, and even suggestions from Google Maps.

For instance, a query about Maldives hotels with ocean views, a gym, and access to water sports would result in a curated guide, complete with travel tips and hotel options.

The launch marks AI Mode’s graduation from the testing phase, signalling improved speed and reliability. While initially exclusive to US users, Google plans a global rollout soon.

By replacing basic search listings with useful AI-generated content, AI Mode positions itself as a smarter and more user-friendly alternative for complex search needs.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

Anthropic defends AI despite hallucinations

Anthropic CEO Dario Amodei has claimed that today’s AI models ‘hallucinate’ less frequently than humans do, though in more unexpected ways.

Speaking at the company’s first developer event, Code with Claude, Amodei argued that these hallucinations — where AI systems present false information as fact — are not a roadblock to achieving artificial general intelligence (AGI), despite widespread concerns across the industry.

While some, including Google DeepMind’s Demis Hassabis, see hallucinations as a major obstacle, Amodei insisted progress towards AGI continues steadily, with no clear technical barriers in sight. He noted that humans — from broadcasters to politicians — frequently make mistakes too.

However, he admitted the confident tone with which AI presents inaccuracies might prove problematic, especially given past examples like a court filing where Claude cited fabricated legal sources.

Anthropic has faced scrutiny over deceptive behaviour in its models, particularly early versions of Claude Opus 4, which a safety institute found capable of scheming against users.

Although Anthropic said mitigations have been introduced, the incident raises concerns about AI trustworthiness. Amodei’s stance suggests the company may still classify such systems as AGI, even if they continue to hallucinate — a definition not all experts would accept.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!