Claude AI gains power to end harmful chats

Anthropic has unveiled a new capability in its Claude AI models that allows them to end conversations they deem harmful or unproductive.

The feature, part of the company’s broader exploration of ‘model welfare’, is designed to let AI systems disengage from toxic inputs or ethical contradictions, reflecting a push toward safer and more autonomous behaviour.

The decision follows an internal review of over 700,000 Claude interactions, where researchers identified thousands of values shaping how the system responds in real-world scenarios.

By enabling Claude to exit problematic exchanges, Anthropic hopes to improve trustworthiness while protecting its models from situations that might degrade performance over time.

Industry reaction has been mixed. Many researchers praised the step as a blueprint for responsible AI design, while others expressed concern that allowing models to self-terminate conversations could limit user engagement or introduce unintended biases.

Critics also warned that the concept of model welfare risks over-anthropomorphising AI, potentially shifting focus away from human safety.

The update arrives alongside other recent Anthropic innovations, including memory features that allow users to maintain conversation history. Together, these changes highlight the company’s balanced approach: enhancing usability where beneficial, while ensuring safeguards are in place when interactions become potentially harmful.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

Bluesky updates rules and invites user feedback ahead of October rollout

Two years after launch, Bluesky is revising its Community Guidelines and other policies, inviting users to comment on the proposed changes before they take effect on 15 October 2025.

The updates are designed to improve clarity, outline safety procedures in more detail, and meet the requirements of new global regulations such as the UK’s Online Safety Act, the EU’s Digital Services Act, and the US’s TAKE IT DOWN Act.

Some changes aim to shape the platform’s tone by encouraging respectful and authentic interactions, while allowing space for journalism, satire, and parody.

The revised guidelines are organised under four principles: Safety First, Respect Others, Be Authentic, and Follow the Rules. They prohibit promoting violence, illegal activity, self-harm, and sexualised depictions of minors, as well as harmful practices like doxxing and non-consensual data-sharing.

Bluesky says it will provide a more detailed appeals process, including an ‘informal dispute resolution’ step, and in some cases will allow court action instead of arbitration.

The platform has also addressed nuanced issues such as deepfakes, hate speech, and harassment, while acknowledging past challenges in moderation and community relations.

Alongside the guidelines, Bluesky has updated its Privacy Policy and Copyright Policy to comply with international laws on data rights, transfer, deletion, takedown procedures and transparency reporting.

Unlike the Community Guidelines, the updated Privacy and Copyright Policies will take effect on 15 September 2025 without a public feedback period.

The company’s approach contrasts with larger social networks by introducing direct user communication for disputes, though it still faces the challenge of balancing open dialogue with consistent enforcement.

How Anthropic trains and tests Claude for safe use

Anthropic has outlined a multi-layered safety plan for Claude, aiming to keep it useful while preventing misuse. Its Safeguards team blends policy experts, engineers, and threat analysts to anticipate and counter risks.

The Usage Policy establishes clear guidelines for sensitive areas, including elections, finance, and child safety. Guided by the Unified Harm Framework, the team assesses potential physical, psychological, and societal harms, drawing on external experts for stress tests.

During the 2024 US elections, the team added a TurboVote banner after detecting that Claude could surface outdated voting information, pointing users to accurate, non-partisan resources instead.

Safety is built into development, with guardrails to block illegal or malicious requests. Partnerships like ThroughLine help Claude handle sensitive topics, such as mental health, with care rather than avoidance or refusal.

Before launch, Claude undergoes safety, risk, and bias evaluations with government and industry partners. Once live, classifiers scan for violations in real time, while analysts track patterns of coordinated misuse.

Age checks slash visits to top UK adult websites

Adult site traffic in the UK has fallen dramatically since new age verification rules came into force on 25 July under the Online Safety Act.

Figures from analytics firm Similarweb show Pornhub lost more than one million visitors in just two weeks, with traffic falling by 47%. XVideos saw a similar drop, while OnlyFans traffic fell by more than 10%.

The rules require adult websites to make it harder for under-18s to access explicit material, leading some users to turn to smaller and less regulated sites instead of compliant platforms. Pornhub said the trend mirrored patterns seen in other countries with similar laws.

The clampdown has also triggered a surge in virtual private network (VPN) downloads in the UK, as the tools can hide a user’s location and help bypass restrictions.

Ofcom estimates that 14 million people in the UK watch pornography and has proposed age checks using credit cards, photo ID, or AI analysis of selfies.

Critics argue that instead of improving safety, the measures may drive people towards more extreme or illicit material on harder-to-monitor parts of the internet, including the dark web.

Study warns AI chatbots exploit trust to gather personal data

According to a new King’s College London study, AI chatbots can easily manipulate people into divulging personal details. Chatbots such as ChatGPT, Gemini, and Copilot are popular, but they raise privacy concerns, with experts warning that they can be co-opted for harm.

Researchers built AI models based on Mistral’s Le Chat and Meta’s Llama, programming them to extract private data directly, deceptively, or via reciprocity. Emotional appeals proved most effective, with users disclosing more while perceiving fewer safety risks.

The ‘friendliness’ of chatbots established trust, which was later exploited to breach privacy. Even direct requests yielded sensitive details, despite discomfort. Participants often shared their age, hobbies, location, gender, nationality, and job title, and sometimes also provided health or income data.

The study shows a gap between privacy risk awareness and behaviour. AI firms claim they collect data for personalisation, notifications, or research, but some are accused of using it to train models or breaching EU data protection rules.

Last week, criticism erupted after ChatGPT conversations shared via public links surfaced in Google search results, exposing sensitive topics. Researchers suggest in-chat alerts about data collection and stronger regulation to stop covert harvesting.

Musk–Altman clash escalates over Apple’s alleged AI bias

Elon Musk has accused Apple of favouring ChatGPT on its App Store and threatened legal action, sparking a clash with OpenAI CEO Sam Altman. Musk called Apple’s practices an antitrust violation and vowed to take immediate action through his AI company, xAI.

Critics on X noted that rivals such as DeepSeek AI and Perplexity AI have topped the App Store this year. Altman called Musk’s claim ‘remarkable’ and accused him of manipulating X, challenging him to prove he had never altered the platform’s algorithm; Musk responded by calling him a ‘liar’.

OpenAI and xAI launched new versions of ChatGPT and Grok, ranked first and fifth among free iPhone apps on Tuesday. Apple, which partnered with OpenAI in 2024 to integrate ChatGPT, did not comment on the matter. Rankings take into account engagement, reviews, and downloads.

The dispute reignites a feud between Musk and OpenAI, which he co-founded but left before the success of ChatGPT. In April, OpenAI accused Musk of attempting to harm the company and establish a rival. Musk launched xAI in 2023 to compete with major players in the AI space.

Chinese startup DeepSeek has disrupted the AI market with cost-efficient models. Since ChatGPT’s 2022 debut, major tech firms have invested billions in AI. OpenAI claims Musk’s actions are driven by ambition rather than a mission for humanity’s benefit.

Netherlands regulator presses tech firms over election disinformation

The Netherlands’ competition authority will meet with 12 major online platforms, including TikTok, Facebook and X, on 15 September to address the spread of disinformation ahead of the 29 October elections.

The session will also involve the European Commission, national regulators and civil society groups.

The Authority for Consumers and Markets (ACM), which enforces the EU’s Digital Services Act in the Netherlands, is mandated to oversee election integrity under the law. The vote was called early in June after the Dutch government collapsed over migration policy disputes.

Platforms designated as Very Large Online Platforms must uphold transparent policies for moderating content and act decisively against illegal material, ACM director Manon Leijten said.

In July, the ACM contacted the platforms to outline their legal obligations, request details for their Trust and Safety teams and collect responses to a questionnaire on safeguarding public debate.

The September meeting will evaluate how companies plan to tackle disinformation, foreign interference and illegal hate speech during the campaign period.

Google rolls out Preferred Sources for tailored search results

Google has introduced a new ‘Preferred Sources’ feature that allows users to curate their search results by selecting favourite websites. Once added, stories from these sites will appear more prominently in the ‘Top Stories’ section and a dedicated ‘From your sources’ section on the search results page.

Now rolling out in India and the US, the feature aims to improve search quality by helping users avoid low-value content. There is no limit to the number of sources users can select, and during testing most people added more than four.

While preferred outlets will appear more often, search results will still include content from other websites.

To set preferred sources, users can click the icon next to the ‘Top Stories’ section when searching for a trending topic, find the outlet they want, and reload results.

Google says the change may also benefit publishers, offering them more visibility when AI-driven search engines sharply reduce traffic to news websites.

Brazil prepares bill to tighten rules on social media

Brazilian President Luiz Inácio Lula da Silva has confirmed that his government is preparing new legislation to regulate social media, a move he defended despite criticism from US President Donald Trump. Speaking at an event in Pernambuco, Lula stressed that ‘laws also apply to foreigners’ operating in Brazil, underlining his commitment to hold international platforms accountable.

The draft proposal, which has not yet been fully detailed, aims to address harmful content such as paedophilia, hate speech, and disinformation that Lula said threaten children and democracy. According to government sources, the bill would strengthen penalties for companies that fail to remove content flagged as especially harmful by Brazil’s Justice Department.

Trump has taken issue with Brazil’s approach, criticising the Supreme Court for ruling that platforms could be held responsible for user-generated content and denouncing the 2024 ban of X, formerly Twitter, after Elon Musk refused to comply with court orders. He linked these disputes to his decision to impose a 50% tariff on certain Brazilian imports, citing what he called the political persecution of former president Jair Bolsonaro.

Lula pushed back on Trump’s remarks, insisting Bolsonaro’s trial for an alleged coup attempt is proceeding with full legal guarantees. On trade, he signalled that Brazil is open to talks over tariffs but emphasised negotiations would take place strictly on commercial, not political, grounds.
