Best AI chatbot for maths accuracy revealed in new benchmark

AI tools are increasingly used for simple everyday calculations, yet a new benchmark suggests accuracy remains unreliable.

The ORCA study tested five major chatbots across 500 real-world maths prompts and found that users still face roughly a 40 percent chance of receiving the wrong answer.

Gemini from Google recorded the highest score at 63 percent, with xAI’s Grok almost level at 62.8 percent. DeepSeek followed with 52 percent, while ChatGPT scored 49.4 percent, and Claude placed last at 45.2 percent.
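To put the headline figures in perspective, the short Python sketch below converts each reported accuracy into an error rate and an approximate count of correct answers. It assumes that each score is a simple share of the 500 prompts, with every prompt weighted equally. The study summary does not state this explicitly, so treat the counts as illustrative.

```python
# Accuracy scores reported by the ORCA benchmark (percent of prompts correct).
scores = {
    "Gemini": 63.0,
    "Grok": 62.8,
    "DeepSeek": 52.0,
    "ChatGPT": 49.4,
    "Claude": 45.2,
}
TOTAL_PROMPTS = 500  # assumption: every prompt counts equally

# Error rate = share of prompts answered incorrectly.
error_rates = {name: round(100 - acc, 1) for name, acc in scores.items()}
# Approximate number of correct answers out of 500.
correct_answers = {name: round(acc / 100 * TOTAL_PROMPTS) for name, acc in scores.items()}
mean_accuracy = sum(scores.values()) / len(scores)

for name in scores:
    print(f"{name}: {error_rates[name]}% wrong, ~{correct_answers[name]} correct")
print(f"Mean accuracy across the five chatbots: {mean_accuracy:.2f}%")
```

On these figures, even the top scorer answers about 37 percent of prompts incorrectly, and the average accuracy across the five chatbots is about 54.5 percent.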

Performance varied sharply across subjects. Maths and conversion tasks produced the best results, while physics questions dragged average accuracy below 40 percent.

Researchers identified most errors as sloppy calculations or rounding mistakes, rather than deeper failures to understand the problem. Finance and economics questions highlighted the widest gaps between the models, while DeepSeek struggled most in biology and chemistry, with barely one correct answer in ten.

Users are advised to double-check results whenever accuracy is crucial, using a calculator or a verified source rather than relying entirely on an AI chatbot for numerical certainty.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

China proposes strict AI rules to protect children

China has proposed stringent new rules for AI aimed at protecting children and preventing chatbots from providing advice that could lead to self-harm, violence, or gambling.

The draft regulations, published by the Cyberspace Administration of China (CAC), require developers to include personalised settings, time limits, and parental consent for services offering emotional companionship.

High-risk chats involving self-harm or suicide must be passed to a human operator, with guardians or emergency contacts alerted. AI providers must not produce content that threatens national security, harms national honour, or undermines national unity.

The rules come as AI usage surges, with platforms such as DeepSeek, Z.ai, and Minimax attracting millions of users in China and abroad. The CAC has said it supports safe AI use, including tools that promote local culture and provide companionship for the elderly.

The move reflects growing global concerns over AI’s impact on human behaviour. Notably, OpenAI has faced legal challenges over alleged chatbot-related harm, prompting the company to create roles focused on tracking AI risks to mental health and cybersecurity.

China’s draft rules signal a firm approach to regulating AI technology as its influence expands rapidly.

AI cheating drives ACCA to halt online exams

The Association of Chartered Certified Accountants (ACCA) has announced it will largely end remote examinations in the UK from March 2026, requiring students to sit tests in person unless exceptional circumstances apply.

The decision is intended to tackle a rise in cheating, particularly cheating carried out with the help of AI tools.

Remote testing was introduced during the Covid-19 pandemic to allow students to continue qualifying when in-person exams were impossible. The ACCA said online assessments have now become too difficult to monitor effectively, despite efforts to strengthen safeguards against misconduct.

Investigations have shown that exam cheating has affected major auditing firms, including the ‘big four’ and other leading firms. High-profile cases, such as EY’s $100m (£74m) settlement with US regulators, highlight the risks posed by compromised professional examinations.

While other accounting bodies, including the Institute of Chartered Accountants in England and Wales, continue to allow some online exams, the ACCA has indicated that high-stakes assessments must now be conducted in person to maintain credibility and integrity.

Agentic AI plans push US agencies to prioritise data reform

US federal agencies planning to deploy agentic AI in 2026 are being told to prioritise data organisation as a prerequisite for effective adoption. AI infrastructure providers say poorly structured data remains a major barrier to turning agentic systems into operational tools.

Public sector executives at Amazon Web Services, Oracle, and Cisco said government clients are shifting focus away from basic chatbot use cases. Instead, agencies are seeking domain-specific AI systems capable of handling defined tasks and delivering measurable outcomes.

US industry leaders said achieving this shift requires modernising legacy infrastructure alongside cleaning, structuring, and contextualising data. Executives stressed that agentic AI depends on high-quality data pipelines that allow systems to act autonomously within defined parameters.

Oracle said its public sector strategy for 2026 centres on enabling context-aware AI through updated data assets. Company executives argued that AI systems are only effective when deeply aligned with an organisation’s underlying data environment.

The companies said early agentic AI use cases include document review, data entry, and network traffic management. Cloud infrastructure was also highlighted as critical for scaling agentic systems and accelerating innovation across government workflows.

New AI brain model mirrors lab animal behaviour without using animal data

A new computational brain model, built entirely from biological principles, has learned a visual categorisation task with accuracy and variability matching that of lab animals. Remarkably, the model achieved these results without being trained on any animal data.

The biomimetic design integrates detailed synaptic rules with large-scale architecture across the cortex, striatum, brainstem, and acetylcholine-modulated systems.

As the model learned, it reproduced neural rhythms observed in real animals, including strengthened beta-band synchrony during correct decisions. The result demonstrates emergent realism in both behaviour and underlying neural activity.

The model also revealed a previously unnoticed set of ‘incongruent neurons’ that predicted errors. When researchers revisited the animal data, they found the same signals there, which had previously gone undetected. The finding highlights the platform’s potential to uncover hidden neural dynamics.

Beyond neuroscience research, the model offers a powerful tool for testing neurotherapeutic interventions in silico. Simulating disease-related circuits allows scientists to test treatments before costly clinical trials, potentially speeding up the development of next-generation neurotherapeutics.

Hacker claims major WIRED data breach affecting 2.3 million users

A hacker using the name Lovely has claimed to have accessed WIRED subscriber data and to have leaked details relating to around 2.3 million users.

The same individual also claims that a wider Condé Nast account system covering more than 40 million users could be exposed in future leaks, suggesting the current dataset may not be the last.

Security researchers are reported to have matched samples of the claimed leak with other compromised data sources. The information is said to include names, email addresses, user IDs and timestamps, but not passwords or payment information.

Some researchers also believe that certain home addresses could be included, which would raise privacy concerns if verified.

The dataset is reported to be listed on Have I Been Pwned. However, WIRED and Condé Nast have issued no official confirmation regarding the authenticity, scale or origin of the claimed breach, and the company’s internal findings remain unknown.

The hacker has also accused Condé Nast of failing to respond to earlier security warnings, although these claims have not been independently verified.

Security professionals are urging users to treat unexpected emails with caution rather than assume every message is genuine.

KT faces action in South Korea after a femtocell security breach impacts users

South Korea has blamed weak femtocell security at KT Corp for a major mobile payment breach that triggered thousands of unauthorised transactions.

Officials said the mobile operator used identical authentication certificates across its femtocells and allowed them to remain valid for ten years, meaning any device that had accessed the network once could keep doing so without being re-verified.
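The weakness officials describe can be illustrated with a simplified model. The Python sketch below is hypothetical throughout: the certificate fields, the function names, the reference date and the 30-day re-verification interval are assumptions for illustration, not KT’s actual systems or the government’s prescribed fix. It contrasts a check that accepts any holder of one shared, long-lived certificate with a stricter check that binds each certificate to a single registered device and requires periodic re-verification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Certificate:
    cert_id: str        # hypothetical certificate identifier
    device_id: str      # device the certificate was issued to
    issued_at: datetime
    valid_for: timedelta

    def is_expired(self, now: datetime) -> bool:
        return now >= self.issued_at + self.valid_for


def weak_check(cert: Certificate, shared_cert_id: str, now: datetime) -> bool:
    # Models the reported flaw: every femtocell presents the same certificate,
    # and the only other test is a ten-year expiry date.
    return cert.cert_id == shared_cert_id and not cert.is_expired(now)


def stricter_check(cert: Certificate, claimed_device: str,
                   registry: dict[str, str], now: datetime,
                   reverify_after: timedelta) -> bool:
    # Hypothetical alternative: the certificate must be issued to the device
    # presenting it, match the operator's registry (so it can be revoked),
    # and be newer than the re-verification interval.
    return (
        cert.device_id == claimed_device
        and registry.get(claimed_device) == cert.cert_id
        and not cert.is_expired(now)
        and now - cert.issued_at < reverify_after
    )


now = datetime(2025, 12, 1)          # arbitrary reference date
TEN_YEARS = timedelta(days=3650)
REVERIFY_AFTER = timedelta(days=30)  # assumed interval, for illustration
registry = {"femto-001": "CERT-001"}

# The shared certificate, copied onto a rogue device two years after issue.
shared = Certificate("CERT-SHARED", "any-femtocell", now - timedelta(days=730), TEN_YEARS)
# A per-device certificate re-issued five days ago.
legit = Certificate("CERT-001", "femto-001", now - timedelta(days=5), timedelta(days=90))

print(weak_check(shared, "CERT-SHARED", now))                              # True
print(stricter_check(shared, "femto-001", registry, now, REVERIFY_AFTER))  # False
print(stricter_check(legit, "femto-001", registry, now, REVERIFY_AFTER))   # True
```

Under the weak check, anyone holding a copy of the shared certificate is accepted for up to a decade. Under the stricter one, a copied certificate fails because it is not bound to the device presenting it, and even a genuine certificate must be renewed regularly.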

More than 22,000 users had identifiers exposed, and 368 people suffered unauthorised payments totalling 243 million won.

Investigators also found that 94 KT servers were infected with more than 100 types of malware. Authorities concluded that the company had failed in its duty to provide secure telecommunications services because its overall management of femtocell security was inadequate.

The government has now ordered KT to submit detailed prevention plans and will check compliance in June, while also urging operators to change authentication server addresses regularly and block illegal network access.

Officials said some hacking methods resembled those used in a separate breach at SK Telecom, although there is no evidence that the same group carried out both attacks. KT said it accepts the findings and will soon set out compensation arrangements and further security upgrades.

A separate case involving LG Uplus is being referred to police after investigators said affected servers were discarded, making a full technical review impossible.

The government warned that strong information security must become a survival priority as South Korea aims to position itself among the world’s leading AI nations.

OpenAI strengthens ChatGPT Atlas with new protections against prompt injection attacks

Protecting AI agents from manipulation has become a top priority for OpenAI, which has rolled out a major security upgrade to ChatGPT Atlas.

The browser-based agent now includes stronger safeguards against prompt injection attacks, in which hidden instructions inside emails, documents or webpages attempt to steer the agent away from the user’s commands.

Prompt injection poses a unique risk because Atlas can carry out actions that a person would normally perform inside a browser. A malicious email or webpage could attempt to trigger data exposure, unauthorised transactions or file deletion.

Criminals exploit the fact that agents process large volumes of content across an almost unlimited online surface.

OpenAI has developed an automated red-team framework that uses reinforcement learning to simulate sophisticated attackers.

When fresh attack patterns are discovered, the models behind Atlas are retrained so that resistance is built into the agent rather than added afterwards. Monitoring and safety controls are also updated using real attack traces.

These new protections are already live for all Atlas users. OpenAI advises people to limit logged-in access where possible, check confirmation prompts carefully and give agents well-scoped tasks instead of broad instructions.
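The user-side advice above maps onto a simple pattern: restrict an agent to the actions a task actually needs, and require the user’s confirmation for risky ones. The Python sketch below is a hypothetical illustration of that pattern, not how Atlas is implemented; the action names, `TaskScope` and `authorise` are invented for the example, and the `confirm` callbacks stand in for a real confirmation prompt. Scoping and confirmation limit the damage an injected instruction can do, but they do not detect injection itself, which is what OpenAI’s retrained models and monitoring aim to address.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical action names; the sensitive ones always need the user's OK.
SENSITIVE_ACTIONS = {"send_email", "make_payment", "delete_file"}


@dataclass
class TaskScope:
    """A well-scoped task: a description plus the actions it genuinely needs."""
    description: str
    allowed_actions: set[str] = field(default_factory=set)


def authorise(action: str, scope: TaskScope,
              confirm: Callable[[str], bool]) -> bool:
    """Return True if the agent may perform `action`.

    The decision uses only the user's task scope and the user's own
    confirmation; text found in emails or webpages plays no part in it.
    """
    if action not in scope.allowed_actions:
        return False                  # outside the task the user asked for
    if action in SENSITIVE_ACTIONS:
        return bool(confirm(action))  # ask the user, never the page
    return True


scope = TaskScope("Summarise my unread newsletters", {"read_email", "open_link"})

# An injected line in an email tells the agent to pay an invoice: refused,
# because payments are outside this task's scope.
print(authorise("make_payment", scope, confirm=lambda a: True))   # False
print(authorise("read_email", scope, confirm=lambda a: False))    # True
```

A narrowly scoped task keeps most injected requests out of reach, and the confirmation step catches the rest of the sensitive actions, provided the user reads the prompt carefully.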

The company argues that proactive defence is essential as agentic AI becomes more capable and widely deployed.

AI chatbots struggle with dialect fairness

Researchers are warning that AI chatbots may treat dialect speakers unfairly. Studies of English and German dialects found that large language models often attach negative stereotypes to speakers or misunderstand everyday expressions, leading to discriminatory replies.

A study in Germany tested ten language models using dialects such as Bavarian and Kölsch. The systems repeatedly described dialect speakers as uneducated or angry, and the bias became stronger when the dialect was explicitly identified.

Similar problems have been reported elsewhere, including in UK council services and in AI shopping assistants that struggled with African American English.

Experts argue that such patterns risk amplifying social inequality as governments and businesses rely more heavily on AI. In one case, a chatbot changed an Indian job applicant’s surname to one associated with a higher caste, showing how linguistic bias can reinforce social hierarchy rather than challenge it.

Developers are now exploring customised AI models trained with local language data so systems can respond accurately without reinforcing stereotypes.

Researchers say bias can be tuned out of AI if handled responsibly, which could help protect dialect speakers rather than marginalise them.
