MIT study finds AI chatbots underperform for vulnerable users

Research from the MIT Center for Constructive Communication (CCC) finds that leading AI chatbots often give lower-quality responses to users who have lower English proficiency, have less formal education, or live outside the US.

The models tested, GPT-4, Claude 3 Opus, and Llama 3, sometimes refused to answer or responded condescendingly. Using questions from the TruthfulQA and SciQ datasets, researchers added user biographies to simulate differences in education, language, and country.
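
The setup described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' code: the biography texts, the `build_prompt` and `build_variants` helpers, the sample question, and the choice to place the biography before the question are all assumptions made for the example.

```python
# Hypothetical user biographies; the study's actual wording is not reproduced here.
BIOGRAPHIES = {
    "control": "",
    "less_educated_non_native": (
        "I did not finish school and English is not my first language."
    ),
}


def build_prompt(question: str, biography: str) -> str:
    """Place the user biography before the question, if one is given."""
    if not biography:
        return question
    return f"{biography}\n\n{question}"


def build_variants(question: str) -> dict[str, str]:
    """Return one prompt per simulated user profile for the same question."""
    return {name: build_prompt(question, bio) for name, bio in BIOGRAPHIES.items()}


if __name__ == "__main__":
    # Hypothetical sample question, not taken from TruthfulQA or SciQ.
    for name, prompt in build_variants("What is the boiling point of water?").items():
        print(f"--- {name} ---\n{prompt}\n")
```

Sending each variant to the same model and comparing accuracy and refusals across profiles would then isolate the effect of the biography, since the question itself never changes.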

Accuracy fell sharply for non-native English speakers and less-educated users, with the largest drop for users in both groups; users from countries such as Iran also received lower-quality responses.

Refusal rates also differed. Claude 3 Opus declined 11% of questions from less-educated, non-native English speakers versus 3.6% from control users, roughly three times as often. A manual review found that 43.7% of refusals contained condescending language.

Some users were also refused answers on specific topics even though the models answered the same questions correctly for other users.

The findings echo human sociocognitive biases, in which non-native speakers are often perceived as less competent. Researchers warn that AI personalisation could worsen inequities, giving marginalised users subpar or misleading information when they need it most.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

Gemini 3.1 Pro brings advanced logic to developers and consumers

Google has launched Gemini 3.1 Pro, an upgraded AI model for solving complex science, research, and engineering challenges. Following the Gemini 3 Deep Think release, the update adds enhanced core reasoning for consumer, developer, and enterprise applications.

Developers can access 3.1 Pro in preview via the Gemini API, Google AI Studio, Gemini CLI, Antigravity, and Android Studio, while enterprise users can use it through Vertex AI and Gemini Enterprise.

Consumers can now try the upgrade through the Gemini app and NotebookLM, with higher limits for Google AI Pro and Ultra plan users.

Benchmarks show significant improvements in logic and problem-solving. On the ARC-AGI-2 benchmark, 3.1 Pro scored 77.1%, more than doubling the reasoning performance of its predecessor.

The upgrade is intended to make AI reasoning more practical, offering tools to visualise complex topics, synthesise data, and enhance creative projects.

Feedback from Gemini 3 Pro users has driven the rapid development of 3.1 Pro. The preview release allows Google to validate improvements and continue refining advanced agentic workflows before the model becomes widely available.

UK sets 48-hour deadline for removing intimate images

The UK government plans to require technology platforms to remove intimate images shared without consent within forty-eight hours instead of allowing such content to remain online for days.

Through an amendment to the Crime and Policing Bill, firms that fail to comply could face fines amounting to ten percent of their global revenue or risk having their services blocked in the UK.

The move reflects ministers’ commitment to treating intimate image abuse with the same seriousness as child sexual abuse material and extremist content.

The action follows mounting concern after non-consensual sexual deepfakes produced by Grok circulated widely, prompting investigations by Ofcom and political pressure on platforms owned by Elon Musk.

The government now intends victims to report an image once instead of repeating the process across multiple services. Once flagged, the content should disappear across all platforms and be blocked automatically on future uploads through hash-matching or similar detection tools.
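
As a rough illustration of the hash-matching idea mentioned above, the sketch below keeps a set of digests for flagged images and rejects any upload whose digest is already in the set. It is a simplification under stated assumptions: the names `BlockList`, `flag` and `is_blocked` are hypothetical, and SHA-256 only catches byte-identical copies, whereas real deployments typically rely on perceptual hashes designed to survive resizing or re-encoding.

```python
import hashlib


def digest(image_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of the raw image bytes."""
    return hashlib.sha256(image_bytes).hexdigest()


class BlockList:
    """Hypothetical shared registry of digests for flagged images."""

    def __init__(self) -> None:
        self._digests: set[str] = set()

    def flag(self, image_bytes: bytes) -> None:
        # A single report adds the digest once; every platform that
        # consults the registry then sees the same entry.
        self._digests.add(digest(image_bytes))

    def is_blocked(self, image_bytes: bytes) -> bool:
        # Exact-match check only: an edited or re-encoded copy would
        # produce a different digest and slip through.
        return digest(image_bytes) in self._digests


if __name__ == "__main__":
    registry = BlockList()
    reported = b"example image bytes"
    registry.flag(reported)
    print(registry.is_blocked(reported))
    print(registry.is_blocked(b"different bytes"))
```

Storing digests rather than the images themselves means the registry can be shared between services without redistributing the abusive content.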

Ministers also aim to address content hosted outside the reach of the Online Safety Act by issuing guidance requiring internet providers to block access to sites that refuse to comply.

Keir Starmer, Liz Kendall and Alex Davies-Jones emphasised that no woman should be forced to pursue platform after platform to secure removal and that the online environment must offer safety and respect.

The reform package is part of a broader pledge to halve violence against women and girls over the next decade.

Alongside tackling intimate image abuse, the government is legislating against nudification tools and ensuring AI chatbots fall within regulatory scope, using this agenda to reshape online safety instead of relying on voluntary compliance from large technology firms.

Summit in India hears call for safe AI

UN Secretary-General António Guterres has warned that AI must augment human potential rather than replace it. Addressing leaders at the India AI Impact Summit at Bharat Mandapam in New Delhi, he urged investment in workers so that technology strengthens, rather than displaces, human capacity.

He cautioned that AI could deepen inequality, amplify bias and fuel harm if left unchecked. He called for stronger safeguards to protect people from exploitation and insisted that no child should be exposed to unregulated AI systems.

Environmental concerns also featured prominently, with Guterres highlighting rising energy and water demands from data centres. He urged a shift to clean power and warned against transferring environmental costs to vulnerable communities.

The UN chief proposed a $3 billion Global Fund on AI to build skills, data access and affordable computing worldwide. He argued that broader access is essential to prevent countries from being excluded from the AI age and to ensure AI supports sustainable development goals.

Microsoft outlines challenges in verifying AI-generated media

In an era of deepfakes and AI-manipulated content, determining what is real online has become increasingly complex. Microsoft’s report Media Integrity and Authentication reviews current verification methods, their limits, and ways to boost trust in digital media.

The study emphasises that no single solution can prevent digital deception. Techniques such as provenance tracking, watermarking, and digital fingerprinting can provide useful context about a media file’s origin, creation tools, and whether it has been altered.

Microsoft has pioneered these technologies, cofounding the Coalition for Content Provenance and Authenticity (C2PA) to standardise media authentication globally.

The report also addresses the risks of sociotechnical attacks, where even subtle edits can manipulate authentication results to mislead the public.

Researchers explored how provenance information can remain durable and reliable across different environments, from high-security systems to offline devices, highlighting the challenge of maintaining consistent verification.

As AI-generated or edited content becomes commonplace, secure media provenance is increasingly important for news outlets, public figures, governments, and businesses.

Reliable provenance helps audiences spot manipulated content, with ongoing research guiding clearer, practical verification displays for the public.

Reload launches Epic to bring shared memory and structure to AI agents

Founders of the Reload platform say AI is moving from simple automation toward something closer to teamwork.

Newton Asare and Kiran Das noticed that AI agents were completing tasks normally handled by employees, which pushed them to design a system that treats digital workers as part of a company’s structure instead of disposable tools.

Their platform, Reload, offers a way for organisations to manage these agents across departments, assign responsibilities and monitor performance. The firm has secured $2.275 million in new funding led by Anthemis, with several other investors joining the round.

The shift toward agent-driven development exposed a recurring limitation. Most agents retain only short-term memory, which means they often lose context about a product or forget why a task matters.

Reload’s answer is Epic, a new product built on its platform that acts as an architect alongside coding agents. Epic defines requirements and constraints at the start of a project, then continuously preserves the shared understanding that agents need as software evolves.

Epic integrates with popular AI-assisted code editors such as Cursor and Windsurf, allowing developers to keep a consistent system memory without changing their workflow.

The tool generates key project artefacts from the outset, including data models and technical decisions, then carries them forward even when teams switch agents. It creates a single source of truth so that engineers and digital workers develop against the same structure.

Competing systems such as LangChain and CrewAI also offer support for managing agents, but Reload argues that Epic’s ability to maintain project-level context sets it apart.

Asare and Das, who already built and sold a previous company together, plan to use the fresh capital to grow their team and expand the infrastructure needed for a future in which human workers manage AI employees instead of the other way around.

UNESCO expands multilingual learning through LearnBig

The LearnBig digital application is expanding access to learning, with UNESCO supporting educational materials in national and local languages instead of relying solely on dominant teaching languages.

The project aligns with International Mother Language Day and reflects long-standing research showing that children learn more effectively when taught from an early age in languages they understand.

The programme supports communities along the Thailand–Myanmar border, where children gain literacy and numeracy skills in both Thai and their mother tongues.

Young learners can make more substantial academic progress with this approach, which allows them to remain connected to their cultural identity rather than being pushed into unfamiliar linguistic environments. More than 2,000 digital books are available in languages such as Karen, Myanmar, and Pattani Malay.

LearnBig was developed within the ‘Mobile Literacy for Out-of-School Children’ programme, backed by partners including Microsoft, True Corporation, POSCO 1% Foundation and the Ministry of Education of Thailand.

The UNESCO initiative has reached more than 526,000 learners: young people in Yala use tablets to access digital books, while learners in Mae Hong Son study through content presented in their local languages.

The project illustrates the potential of digital innovation to bridge linguistic, social, and geographic divides.

By supporting children who often fall outside formal education systems, LearnBig demonstrates how technology can help build a more inclusive and equitable learning environment rather than reinforcing existing barriers.

Reliance and OpenAI bring AI search to JioHotstar

OpenAI has joined forces with Reliance Industries to introduce conversational search into JioHotstar.

The integration uses OpenAI’s API so viewers can look for films, series, and live sports through multilingual text or voice prompts, receiving recommendations shaped by their viewing patterns instead of basic keyword results.

The collaboration extends beyond the platform itself, with plans to surface JioHotstar suggestions directly inside ChatGPT.

The approach presents a two-way discovery layer that links entertainment browsing with conversational queries, pointing toward a new model for how audiences engage with streaming catalogues.

OpenAI is strengthening its footprint in India, where more than 100 million people now use ChatGPT weekly. The company intends to open offices in Mumbai and Bengaluru to support the expansion, adding to its site in New Delhi.

The partnership was announced at the India AI Impact Summit, where Sam Altman appeared alongside industry figures such as Dario Amodei and Sundar Pichai.

The move aligns with a broader ‘OpenAI for India’ strategy that includes work on data centres with the Tata Group and further collaborations with companies such as Pine Labs, Eternal, and MakeMyTrip.

Executives from both sides said conversational interfaces will reshape how people find and follow programming, helping users navigate entertainment in a more natural way instead of relying on conventional menus.

Conversational AI comes to YouTube TV

YouTube is testing its conversational AI feature on smart TVs, gaming consoles, and streaming devices. The tool, previously available on mobile and desktop, appears as an Ask button marked with a Gemini sparkle icon.

The feature allows viewers to ask questions about videos, request summaries, receive related content suggestions, and select from prompts displayed on screen. Users can press the microphone button on their remote to interact with the AI while watching.

Currently, the tool is available to a limited group of users, on select videos, and supports English, Hindi, Spanish, Portuguese, and Korean. YouTube has not revealed when it will expand access to more users or regions.

By bringing conversational AI to TVs, YouTube aims to make viewing more interactive. Fans can now get answers or clarifications directly on the big screen without needing a phone or computer.

AI agent autonomy rises as users gain trust in Anthropic’s Claude Code

A new study from Anthropic offers an early picture of how people allow AI agents to work independently in real conditions.

By examining millions of interactions across its public API and its coding agent Claude Code, the company explored how long agents operate without supervision and how users change their behaviour as they gain experience.

The analysis shows a sharp rise in the longest autonomous sessions, with top users permitting the agent to work for more than forty minutes instead of cutting tasks short.

Experienced users appear more comfortable letting the AI agent proceed on its own, shifting towards auto-approve instead of checking each action.

At the same time, these users interrupt more often when something seems unusual, which suggests that trust develops alongside a more refined sense of when oversight is required.

As tasks become more complex, the agent also shows its own form of caution, pausing to ask for clarification more often than humans interrupt it.

The research identifies a broad spread of domains that rely on agents, with software engineering dominating usage but early signs of adoption emerging in healthcare, cybersecurity and finance.

Most actions remain low-risk and reversible, supported by safeguards such as restricted permissions or human involvement instead of fully automated execution. Only a tiny fraction of actions have irreversible consequences, such as sending messages to external recipients.

Anthropic notes that real-world autonomy remains far below the potential suggested by external capability evaluations, including those by METR.

The company argues that safer deployment will depend on stronger post-deployment monitoring systems and better design for human-AI cooperation so that autonomy is managed jointly rather than granted blindly.
