OpenAI deploys new safeguards for AI models to curb biothreat risks

OpenAI has introduced a new monitoring system to reduce the risk of its latest AI models, o3 and o4-mini, being misused to create chemical or biological threats.

The ‘safety-focused reasoning monitor’ is built to detect prompts related to dangerous materials and instruct the AI models to withhold potentially harmful advice, instead of providing answers that could aid bad actors.
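
OpenAI has not published how the monitor works internally. As a rough mental model only, the pattern it describes, a lightweight classifier screening each prompt before the main model is allowed to answer, might be sketched like this; the keyword list and function names are illustrative assumptions, not OpenAI’s system:

```python
# Minimal sketch of a prompt-gating safety monitor. Hypothetical: OpenAI's
# reasoning monitor is a trained model, not a keyword filter like this one.

RISKY_TERMS = {"pathogen synthesis", "toxin production", "weaponisation"}  # illustrative only


def is_risky_prompt(prompt: str) -> bool:
    """Crude stand-in for a trained safety classifier."""
    text = prompt.lower()
    return any(term in text for term in RISKY_TERMS)


def answer_with_monitor(prompt: str, model_answer) -> str:
    """Withhold the model's reply whenever the monitor flags the prompt."""
    if is_risky_prompt(prompt):
        return "I can't help with that request."
    return model_answer(prompt)


if __name__ == "__main__":
    print(answer_with_monitor("How does toxin production scale up?", lambda p: "..."))
```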

These newer models represent a major leap in capability compared to previous versions, especially in their ability to respond to prompts about biological weapons. To counteract this, OpenAI’s internal red teams spent 1,000 hours identifying unsafe interactions.

In simulated tests, the safety monitor blocked 98.7% of risky prompts, although OpenAI concedes the tests did not account for users rewording a refused prompt and trying again, a gap it says will continue to be covered by human monitoring rather than automation alone.

Despite assurances that neither o3 nor o4-mini meets OpenAI’s ‘high risk’ threshold, the company acknowledges these models are more effective at answering dangerous questions than earlier ones like o1 and GPT-4.

Similar monitoring tools are also being used to block harmful image generation in other models, yet critics argue OpenAI should do more.

Concerns have been raised over rushed testing timelines and the lack of a safety report for GPT-4.1, which was launched this week without accompanying transparency documentation.

xAI pushes Grok forward with memory update

Elon Musk’s AI venture, xAI, has introduced a new ‘memory’ feature for its Grok chatbot in a bid to compete more closely with established rivals like ChatGPT and Google’s Gemini.

The update allows Grok to remember details from past conversations, enabling it to provide more personalised responses when asked for advice or recommendations, instead of offering generic answers.

Unlike before, Grok can now ‘learn’ a user’s preferences over time, provided it’s used frequently enough. The move mirrors similar features from competitors, with ChatGPT already referencing full chat histories and Gemini using persistent memory to shape its replies.

According to xAI, the memory is fully transparent. Users can view what Grok has remembered and choose to delete specific entries at any time.
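
xAI has not said how memories are stored. A minimal sketch of the two controls described above, viewing remembered entries and deleting individual ones, might look like the following; all names are hypothetical and this is not xAI’s implementation:

```python
# Hypothetical sketch of a per-user chatbot memory store exposing the two
# controls described above: viewing remembered entries and deleting one.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MemoryStore:
    entries: Dict[str, List[str]] = field(default_factory=dict)

    def remember(self, user_id: str, fact: str) -> None:
        self.entries.setdefault(user_id, []).append(fact)

    def view(self, user_id: str) -> List[str]:
        """Transparency: show the user everything stored about them."""
        return list(self.entries.get(user_id, []))

    def forget(self, user_id: str, index: int) -> None:
        """Delete a single remembered entry, as the settings UI allows."""
        self.entries.get(user_id, []).pop(index)


store = MemoryStore()
store.remember("alice", "prefers vegetarian restaurant recommendations")
print(store.view("alice"))   # ['prefers vegetarian restaurant recommendations']
store.forget("alice", 0)
print(store.view("alice"))   # []
```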

The memory function is currently available in beta on Grok’s website and mobile apps, although not yet accessible to users in the EU or UK.

The feature is enabled by default but can be turned off in the settings menu under Data Controls. Individual memories can also be deleted via the web chat interface, with Android support expected shortly.

xAI has confirmed it is working on adding memory support to Grok’s version on X, an expansion that aims to deepen the bot’s integration with users’ digital lives rather than limiting the experience to a single platform.

New Apple AI model uses private email comparisons

Apple has outlined a new approach to improving its AI features by privately analysing user data with the help of synthetic data. The move follows criticism of the company’s AI products, especially notification summaries, which have underperformed compared to competitors.

The new method relies on ‘differential privacy,’ where Apple generates synthetic messages that resemble real user data without containing any actual content.

These messages are used to create embeddings, abstract representations of message characteristics, which are then compared with real emails on the devices of users who have opted in to share analytics.

Devices send back signals indicating which synthetic data most closely matches real content, without sharing the actual messages with Apple.
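
Apple’s description stops at this level of detail, but the on-device step can be illustrated with a toy example: embed the local message, compare it against the synthetic candidates, and report back only the index of the closest match. The bag-of-words embedding below is a stand-in assumption, and the differential-privacy noise Apple applies before anything leaves the device is omitted:

```python
# Toy illustration of the on-device comparison: only the index of the closest
# synthetic message is reported, never the user's actual email. The embedding
# here is a simple bag-of-words stand-in, not Apple's real representation.

import math
from collections import Counter
from typing import List


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def closest_synthetic(local_message: str, synthetic_messages: List[str]) -> int:
    """Runs on the device; only this index would be sent back to Apple."""
    local = toy_embed(local_message)
    scores = [cosine(local, toy_embed(s)) for s in synthetic_messages]
    return max(range(len(scores)), key=scores.__getitem__)


synthetic = [
    "Want to play tennis tomorrow at 11:30?",
    "Lunch next week to catch up?",
]
print(closest_synthetic("Are you free for tennis on Saturday morning?", synthetic))  # 0
```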

Apple said the technique is already being used to improve its Genmoji models and will soon be applied to other features, including Image Playground, Image Wand, Memories Creation, Writing Tools, and Visual Intelligence.

The company also confirmed plans to improve email summaries using the same privacy-focused method, aiming to refine its AI tools while maintaining a strong commitment to user data protection.

Claude can now read your Gmail and Docs

Anthropic has introduced a new integration that allows its AI chatbot, Claude, to connect directly with Google Workspace.

The feature, now in beta for premium subscribers, enables Claude to reference content from Gmail, Google Calendar, and Google Docs to deliver more personalised and context-aware responses.

Users can expect in-line citations showing where in their Google account specific pieces of information originated.

This integration is available for subscribers on the Max, Team, Enterprise, and Pro plans, though multi-user accounts require administrator approval.

While Claude can read emails and review documents, it cannot send emails or schedule events. Anthropic insists the system uses strict access controls and does not train its models on user data by default.
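
Anthropic has not detailed the connector’s internals. One way to picture the read-only constraint and the in-line citations is the sketch below, where the class and method names are hypothetical rather than Anthropic’s or Google’s actual APIs:

```python
# Hypothetical sketch of a read-only workspace connector. Retrieval methods
# return snippets tagged with their source so the assistant can cite them
# in-line; there are deliberately no methods for sending mail or creating
# events. Names are illustrative, not Anthropic's or Google's APIs.

from dataclasses import dataclass
from typing import List


@dataclass
class Snippet:
    text: str
    source: str  # e.g. "Gmail: msg-123", used to build an in-line citation


class ReadOnlyWorkspaceConnector:
    READ_SCOPES = ("gmail.readonly", "calendar.readonly", "docs.readonly")

    def search_mail(self, query: str) -> List[Snippet]:
        # Placeholder: a real connector would query the Gmail API using a
        # read-only OAuth scope and return the matching message snippets.
        return [Snippet(text="Flight confirmation for 3 May", source="Gmail: msg-123")]

    # Note the absence of send_email() or create_event(): the integration can
    # read and cite content but cannot act on the user's behalf.


connector = ReadOnlyWorkspaceConnector()
for snippet in connector.search_mail("flight confirmation"):
    print(f"{snippet.text} [{snippet.source}]")
```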

The update arrives as part of Anthropic’s broader efforts to enhance Claude’s appeal in a competitive AI landscape.

Alongside the Workspace integration, the company launched Claude Research, a tool that performs real-time web searches to provide fast, in-depth answers.

Although still smaller than ChatGPT’s user base, Claude is steadily growing, reaching 3.3 million web users in March 2025.

OpenAI updates safety rules amid AI race

OpenAI has updated its Preparedness Framework, the internal system used to assess AI model safety and determine necessary safeguards during development.

The company now says it may adjust its safety standards if a rival AI lab releases a ‘high-risk’ system without similar protections, a move that reflects growing competitive pressure in the AI industry.

OpenAI does not rule out such flexibility, but insists that any adjustments would be made cautiously and with public transparency.

Critics argue OpenAI is already lowering its standards for the sake of faster deployment. Twelve former employees recently supported a legal case against the company, warning that a planned corporate restructure might encourage further shortcuts.

OpenAI denies these claims, but reports suggest compressed safety testing timelines and increasing reliance on automated evaluations instead of human-led reviews. According to sources, some safety checks are also run on earlier versions of models, not the final ones released to users.

The refreshed framework also changes how OpenAI defines and manages risk. Models are now classified as having either ‘high’ or ‘critical’ capability, the former referring to systems that could amplify harm, the latter to those introducing entirely new risks.

Instead of deploying models first and assessing risk later, OpenAI says it will apply safeguards during both development and release, particularly for models capable of evading shutdown, hiding their abilities, or self-replicating.
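
The framework itself is a policy document rather than code, but the two-tier classification and the idea of gating both development and release can be summarised schematically. The tier names below follow the framework’s terminology, while the specific safeguards listed are illustrative assumptions:

```python
# Schematic sketch of the two capability tiers and stage-based gating described
# above. Tier names follow the framework; the safeguards listed are assumptions.

from enum import Enum
from typing import List


class Capability(Enum):
    BELOW_THRESHOLD = 0
    HIGH = 1       # could amplify existing pathways to harm
    CRITICAL = 2   # could introduce entirely new kinds of risk


def required_safeguards(level: Capability, stage: str) -> List[str]:
    """stage is 'development' or 'deployment'; both are gated, not just release."""
    if level is Capability.BELOW_THRESHOLD:
        return []
    safeguards = ["misuse monitoring", "restricted access"]
    if level is Capability.CRITICAL:
        safeguards += ["shutdown-evasion testing", "self-replication checks"]
    if stage == "development":
        safeguards.append("hardened model-weight security")
    return safeguards


print(required_safeguards(Capability.CRITICAL, "development"))
```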

Hertz customer data stolen in vendor cyberattack

Hertz has disclosed a significant data breach involving sensitive customer information, including credit card and driver’s licence details, following a cyberattack on one of its service providers.

The breach stemmed from vulnerabilities in the Cleo Communications file transfer platform, exploited in October and December 2024.

Hertz confirmed the unauthorised access on 10 February, with further investigations revealing a range of exposed data, including names, birth dates, contact details, and in some cases, Social Security and passport numbers.

While the company has not confirmed how many individuals were affected, notifications have been issued in the US, UK, Canada, Australia, and across the EU.

Hertz stressed that no misuse of customer data has been identified so far, and that the breach has been reported to law enforcement and regulators. Cleo has since patched the exploited vulnerabilities.

The identity of the attackers remains unknown. However, Cleo was previously targeted in a broader cyber campaign last October, with the Clop ransomware group later claiming responsibility.

The gang published Cleo’s company data online and listed dozens of breached organisations, suggesting the incident was part of a wider, coordinated effort.

People are forming emotional bonds with AI chatbots

AI is reshaping how people connect emotionally, with millions turning to chatbots for companionship, guidance, and intimacy.

From virtual relationships to support with mental health and social navigation, personified AI assistants such as Replika, Nomi, and ChatGPT are being used by over 100 million people globally.

These apps simulate human conversation through personalised learning, allowing users to form what some consider meaningful emotional bonds.

For some, like 71-year-old Chuck Lohre from the US, chatbots have evolved into deeply personal companions. Lohre’s AI partner, modelled after his wife, helped him process emotional insights about his real-life marriage, despite elements of romantic and even erotic roleplay.

Others, such as neurodiverse user Travis Peacock, have used chatbots to improve communication skills, regulate emotions, and build lasting relationships, reporting significant gains in both their personal and professional lives.

While many users speak positively about these interactions, concerns persist over the nature of such bonds. Experts argue that these connections, though comforting, are often one-sided and lack the mutual growth found in real relationships.

A UK government report noted widespread discomfort with the idea of forming personal ties with AI, suggesting the emotional realism of chatbots may risk deepening emotional dependence without true reciprocity.

Opera brings AI assistant to Opera Mini on Android

Opera, the Norway-based browser maker, has announced the rollout of its AI assistant, Aria, to Opera Mini users on Android. The move represents a strategic effort to bring advanced AI capabilities to users with low-end devices and limited data access, rather than confining such tools to high-spec platforms.

Aria allows users to access up-to-date information, generate images, and learn about a range of topics using a blend of models from OpenAI and Google.

Since its 2005 launch, Opera Mini has been known for saving data during browsing, and Opera claims that the inclusion of Aria won’t compromise that advantage or increase the app’s size.

The approach makes the AI assistant more accessible to users in regions where data efficiency is critical, rather than forcing them to choose between smart features and performance.

Opera has long partnered with telecom providers in Africa to offer free data to Opera Mini users. However, last year, it had to end its programme in Kenya due to regulatory restrictions around ads on browser bookmark tiles.

Despite such challenges, Opera Mini has surpassed a billion downloads on Android and now serves more than 100 million users globally.

Alongside this update, Opera continues testing new AI functions, including features that let users manage tabs using natural language and tools that assist with task completion.

The effort reflects the company’s ambition to embed AI more deeply into everyday browsing rather than limiting innovation to its main browser.

Siri AI overhaul delayed until 2026

Apple has revealed plans to use real user data, in a privacy-preserving way, to improve its AI models. The company has acknowledged that synthetic data alone is not producing reliable results, particularly in training large language models that power tools like Writing Tools and notification summaries.

To address this, Apple will compare AI-generated content with real emails from users who have opted in to share Device Analytics. The sampled emails remain on the user’s device, with only a signal sent to Apple about which AI-generated message most closely matches real-world usage.
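
Under this scheme, the only data reaching Apple is an anonymous best-match signal per participating device, so the server side reduces to counting which synthetic candidates are selected most often. The sketch below is an assumption about how such signals could be aggregated, and it omits the formal differential-privacy noise Apple applies:

```python
# Sketch of the server-side step: aggregate the anonymous "closest match"
# signals sent by opted-in devices and pick the synthetic message selected
# most often. Illustrative only; the real pipeline adds privacy noise.

from collections import Counter
from typing import Iterable


def most_representative_synthetic(signals: Iterable[int], num_candidates: int) -> int:
    """signals are per-device indices of the closest synthetic message."""
    counts = Counter(signals)
    return max(range(num_candidates), key=lambda i: counts.get(i, 0))


# Example: signals from five devices choosing among three synthetic candidates.
print(most_representative_synthetic([1, 0, 1, 2, 1], num_candidates=3))  # 1
```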

The move reflects broader efforts to boost the performance of Apple Intelligence, a suite of features that includes message recaps and content summaries.

Apple has faced internal criticism over slow progress, particularly with Siri, which is now seen as falling behind competitors like Google Gemini and Samsung’s Galaxy AI. The tech giant recently confirmed that meaningful AI updates for Siri won’t arrive until 2026, despite earlier promises of a rollout later this year.

In a rare leadership shakeup, Apple CEO Tim Cook removed AI chief John Giannandrea from overseeing Siri after delays were labelled ‘ugly and embarrassing’ by senior executives.

The responsibility for Siri’s future has been handed to Mike Rockwell, the creator of Vision Pro, who now reports directly to software chief Craig Federighi. Giannandrea will continue to lead Apple’s other AI initiatives.

For more information on these topics, visit diplomacy.edu.