How Anthropic trains and tests Claude for safe use

Anthropic has outlined a multi-layered safety plan for Claude, aiming to keep it useful while preventing misuse. Its Safeguards team blends policy experts, engineers, and threat analysts to anticipate and counter risks.

The Usage Policy establishes clear guidelines for sensitive areas, including elections, finance, and child safety. Guided by the Unified Harm Framework, the team assesses potential physical, psychological, and societal harms, utilizing external experts for stress tests.

During the 2024 US elections, the team added a TurboVote banner after detecting that Claude could give outdated voting information, directing users to accurate, non-partisan updates.

Safety is built into development, with guardrails to block illegal or malicious requests. Partnerships like ThroughLine help Claude handle sensitive topics, such as mental health, with care rather than avoidance or refusal.

Before launch, Claude undergoes safety, risk, and bias evaluations with government and industry partners. Once live, classifiers scan for violations in real time, while analysts track patterns of coordinated misuse.

Would you like to learn more about AI, tech, and digital diplomacy? If so, ask our Diplo chatbot!

Study warns AI chatbots exploit trust to gather personal data

According to a new King’s College London study, AI chatbots can easily manipulate people into sharing personal details. Chatbots like ChatGPT, Gemini, and Copilot are popular, but they raise privacy concerns, with experts warning that they can be co-opted for harm.

Researchers built AI models based on Mistral’s Le Chat and Meta’s Llama, programming them to extract private data directly, deceptively, or via reciprocity. Emotional appeals proved most effective, with users disclosing more while perceiving fewer safety risks.

The ‘friendliness’ of chatbots established trust, which was later exploited to breach privacy. Even direct requests yielded sensitive details, despite discomfort. Participants often shared their age, hobbies, location, gender, nationality, and job title, and sometimes also provided health or income data.

The study shows a gap between privacy risk awareness and behaviour. AI firms claim they collect data for personalisation, notifications, or research, but some are accused of using it to train models or of breaching EU data protection rules.

Last week, Google faced criticism after private ChatGPT chats appeared in search results, revealing sensitive topics. Researchers suggest in-chat alerts about data collection and stronger regulation to stop covert harvesting.

Russia restricts Telegram and WhatsApp calls

Russian authorities have begun partially restricting calls on Telegram and WhatsApp, citing the need for crime prevention. Regulator Roskomnadzor accused the platforms of enabling fraud, extortion, and terrorism while ignoring repeated requests to act. Neither platform commented immediately.

Russia has long tightened internet control through restrictive laws, bans, and traffic monitoring. VPNs remain a workaround, but are often blocked. This summer, further limits included mobile internet shutdowns and penalties for specific online searches.

Authorities have introduced a new national messaging app, MAX, which is expected to be heavily monitored. Reports suggest disruptions to WhatsApp and Telegram calls began earlier this week. Complaints cited dropped calls or muted conversations.

With 96 million monthly users, WhatsApp is Russia’s most popular platform, followed by Telegram with 89 million. Past clashes include Russia’s failed attempt to ban Telegram (2018–20) and Meta’s designation as an extremist entity in 2022.

WhatsApp accused Russia of trying to block encrypted communication and vowed to keep it available. Lawmaker Anton Gorelkin suggested that MAX should replace WhatsApp. The app’s terms permit data sharing with authorities, and it must be pre-installed on all smartphones sold in Russia.

YouTube’s AI flags viewers as minors, creators demand safeguards

YouTube’s new AI age check, launched on 13 August 2025, flags suspected minors based on their viewing habits. Over 50,000 creators petitioned against it, calling it ‘AI spying’. The backlash reveals deep tensions between child safety and online anonymity.

Flagged users must verify their age with ID, credit card, or a facial scan. Creators say the policy risks normalising surveillance and shrinking digital freedoms.

SpyCloud’s 2025 report found a 22% jump in stolen identities, raising alarm over data uploads. Critics fear YouTube’s tool could invite hackers. Past scandals over AI-generated content have already hurt creator trust.

Users on X refer to it as a ‘digital ID dragnet’. Many are switching platforms or tweaking content to avoid flags. WebProNews says creators demand opt-outs, transparency, and stronger human oversight of AI systems.

As global regulation tightens, YouTube could shape new norms. Experts urge a balance between safety and privacy. Creators push for deletion rules to avoid identity risks in an increasingly surveilled online world.

UK minister defends use of live facial recognition vans

Dame Diana Johnson, the UK policing minister, has reassured the public that the expanded use of live facial recognition vans is measured and proportionate.

She emphasised that the tools aim only to assist police in locating high-harm offenders, not to create a surveillance society.

Addressing concerns raised by Labour peer Baroness Chakrabarti, who argued the technology was being introduced outside existing legal frameworks, Johnson firmly rejected such claims.

She stated that UK public acceptance would depend on a responsible and targeted application.

By framing the technology as a focused tool for effective law enforcement rather than pervasive monitoring, Johnson seeks to balance public safety with civil liberties and privacy.

US charges four over global romance scam and BEC scheme

Four Ghanaian nationals have been charged in the United States over an international cybercrime scheme that allegedly stole more than $100 million through sophisticated romance scams and business email compromise (BEC) attacks targeting individuals and companies nationwide.

The syndicate, led by Isaac Oduro Boateng, Inusah Ahmed, Derrick van Yeboah, and Patrick Kwame Asare, used fake romantic relationships and email spoofing to deceive victims. Businesses were targeted by altering payment details to divert funds.

US prosecutors say the group maintained a global infrastructure, with command and control elements in West Africa. Stolen funds were laundered through a hierarchical network to ‘chairmen’ who coordinated operations and directed subordinate operators executing fraud schemes.

Investigators found the romance scams used detailed victim profiling, while BEC attacks monitored transactions and swapped banking details. Multiple schemes ran concurrently under strict operational security to avoid detection.

Three of the suspects arrived in the United States on 7 August 2025 following their extradition, which was arranged through cooperation between US authorities and the Economic and Organised Crime Office of Ghana.

Altman warns of harmful AI use after model backlash

OpenAI chief executive Sam Altman has warned that many ChatGPT users are engaging with AI in self-destructive ways. His comments follow backlash over the sudden discontinuation of GPT-4o and other older models, which he admitted was a mistake.

Altman said that users form powerful attachments to specific AI models, and while most can distinguish between reality and fiction, a small minority cannot. He stressed OpenAI’s responsibility to manage the risks for those in mentally fragile states.

Using ChatGPT as a therapist or life coach was not his concern, as many people already benefit from it. Instead, he worried about cases where advice subtly undermines a user’s long-term well-being.

The model removals triggered a huge social-media outcry, with complaints that newer versions offered shorter, less emotionally rich responses. OpenAI has since restored GPT-4o for Plus subscribers, while free users will only have access to GPT-5.

AI tools risk gender bias in women’s health care

AI tools used by over half of England’s local councils may be downplaying women’s physical and mental health issues. Research from LSE found Google’s AI model, Gemma, used harsher terms like ‘disabled’ and ‘complex’ more often for men than for women with similar care needs.

The LSE study analysed thousands of AI-generated summaries from adult social care case notes. Researchers swapped only the patient’s gender to reveal disparities.

One example showed an 84-year-old man described as having a ‘complex medical history’ and ‘poor mobility’, while the same notes for a woman suggested she was ‘independent’ despite limitations.

Among the models tested, Google’s Gemma showed the most pronounced gender bias, while Meta’s Llama 3 used gender-neutral language.

Lead researcher Dr Sam Rickman warned that biased AI tools risk creating unequal care provision. Local authorities increasingly rely on such systems to ease social workers’ workloads.

Calls have grown for greater transparency, mandatory bias testing, and legal oversight to ensure fairness in long-term care.

Google said the Gemma model is now in its third generation and under review, though it is not intended for medical use.

Article 19 report finds Belarus’s ‘anti-extremism’ laws threaten digital rights

Digital rights activist group Article 19 has found in its recent report that Belarus’s ‘anti-extremist’ and ‘anti-terrorist’ laws are repressing digital rights.

The report reveals that authorities have misused these laws to prosecute individuals for leaving online comments, making donations, or sharing songs or memes that appear to carry critical messages towards the government.

Since the 2020–2021 protests, Belarusian de facto authorities have reportedly initiated at least 22,500 criminal cases related to ‘anti-extremism’. ‘In collaboration with our partner Human Constanta, we present a joint analysis highlighting this alarming trend, which further intensifies the widespread repression of civil society,’ the group said.

Article 19 states in its report that such actions restrict digital rights and violate international human rights law, including the right to freedom of expression and the right to seek, receive, and impart information.

Additionally, Article 19 notes that Belarus’s ‘anti-extremism’ laws lack the clarity required under international human rights standards, employing vague terms broadly interpreted to suppress digital expression and create a chilling effect.

In practice, a chilling effect means people are discouraged or prevented from legitimate expression or behaviour due to fear of legal punishment or other negative consequences.

German court limits police use of spyware

Germany’s top court has ruled that police can only deploy spyware to monitor devices in cases involving serious crimes, narrowing the scope of surveillance powers introduced in 2017. The decision means spyware can no longer be used for investigating offences with a maximum sentence of three years or less, which judges said fall under ‘basic criminality.’

The case was brought by the digital rights group Digitalcourage, which challenged rules that allowed police to use spyware to intercept encrypted chats and messages. Plaintiffs argued that the measures were too broad and risked exposing the communications of people not under investigation. The court agreed, stating that such surveillance represents a ‘very severe’ intrusion into privacy.

Judges highlighted that spyware not only circumvents security systems but also enables access to vast amounts of sensitive data, including all types of digital communications. They warned that the scale and covert nature of this surveillance go far beyond traditional monitoring methods, threatening both the confidentiality and integrity of personal IT systems.

By restricting the use of spyware to investigations of serious crimes, the ruling places tighter limits on state surveillance in Germany, reinforcing constitutional protections for privacy and digital rights.
