Best AI chatbot for maths accuracy revealed in new benchmark

AI tools are increasingly used for simple everyday calculations, yet a new benchmark suggests accuracy remains unreliable.

The ORCA study tested five major chatbots across 500 real-world maths prompts and found that users still face roughly a 40 percent chance of receiving the wrong answer.

Gemini from Google recorded the highest score at 63 percent, with xAI’s Grok almost level at 62.8 percent. DeepSeek followed with 52 percent, while ChatGPT scored 49.4 percent, and Claude placed last at 45.2 percent.

Performance varied sharply across subjects: maths and conversion tasks produced the best results, while physics questions dragged average accuracy below 40 percent.

Researchers identified most errors as sloppy calculations or rounding mistakes, rather than deeper failures to understand the problem. Finance and economics questions highlighted the widest gaps between the models, while DeepSeek struggled most in biology and chemistry, with barely one correct answer in ten.

Users are advised to double-check results whenever accuracy is crucial. A calculator or a verified source remains preferable to relying entirely on an AI chatbot for numerical certainty.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

China plans stricter consent rules for AI chat platforms

China is proposing new rules requiring users to consent before AI companies can use chat logs for training. The draft measures aim to balance innovation with safety and public interest.

Platforms would need to tell users when they are interacting with AI and provide options to access or delete their chat history. For minors, guardian consent would be required before any data is shared or stored.

Analysts say the rules may slow AI chatbot improvements but provide guidance on responsible development. The measures signal that some user conversations are too sensitive for free training data.

The draft rules are open for public consultation with feedback due in late January. China encourages expanding human-like AI applications once safety and reliability are demonstrated.

AI chatbots struggle with dialect fairness

Researchers are warning that AI chatbots may treat dialect speakers unfairly instead of engaging with them neutrally. Studies across English and German dialects found that large language models often attach negative stereotypes or misunderstand everyday expressions, leading to discriminatory replies.

A study in Germany tested ten language models using dialects such as Bavarian and Kölsch. The systems repeatedly described dialect speakers as uneducated or angry, and the bias became stronger when the dialect was explicitly identified.

Similar findings emerged elsewhere, including UK council services and AI shopping assistants that struggled with African American English.

Experts argue that such patterns risk amplifying social inequality as governments and businesses rely more heavily on AI. One Indian job applicant even saw a chatbot change his surname to reflect a higher caste, showing how linguistic bias can intersect with social hierarchy instead of challenging it.

Developers are now exploring customised AI models trained with local language data so systems can respond accurately without reinforcing stereotypes.

Researchers say bias can be tuned out of AI if handled responsibly, which could help protect dialect speakers rather than marginalise them.

China’s AI sector accelerates after breakthrough year

China’s AI industry entered 2025 as a perceived follower but ended the year transformed. Rapid technical progress and commercial milestones reshaped global perceptions of Chinese innovation.

The surprise release of DeepSeek R1 demonstrated strong reasoning performance at unusually low training costs. Open access challenged assumptions about chip dominance and boosted adoption across emerging markets.

State backing and private capital followed quickly, lifting AI sector valuations and supporting embodied intelligence projects. Leading model developers prepared IPO filings, signalling confidence in long-term growth.

Chinese firms increasingly prioritised practical deployment, multilingual capability, and service integration. Global expansion now stresses cultural adaptation rather than raw technical benchmarks alone.

AI can mislead on tides and outdoor safety

UK outdoor enthusiasts are warned not to rely solely on AI for tide times or weather. Errors recently stranded visitors on Sully Island, showing the limits of unverified information.

Maritime authorities recommend consulting official sources such as the UK Hydrographic Office and Met Office. AI tools may misread tables or local data, making human oversight essential for safety.

Mountain rescue teams report similar issues when inexperienced walkers use AI to plan trips. Even with good equipment, a lack of judgement can turn minor errors into dangerous situations.

Practical experience, professional guidance, and verified data remain critical for safe outdoor activities. Relying on AI alone can create serious risks, especially on tidal beaches and challenging mountain routes.

AI transforms Indian filmmaking

Filmmakers in India are rapidly adopting AI tools like ChatGPT, Midjourney and Stable Diffusion to create visuals, clone voices, and streamline production processes for both independent and large-scale films.

Low-budget directors can now produce films almost entirely on their own, reducing costs and production time. Filmmakers use AI to visualise scenes, experiment creatively, and plan sound and effects efficiently.

AI cannot fully capture cultural nuance, emotional depth, or storytelling intuition, so human oversight remains essential. Intellectual property, labour protections, and ethical issues remain unresolved.

Hollywood has resisted AI, with strikes over rights and labour concerns. Indian filmmakers, however, carefully combine AI tools with human creativity to preserve artistic vision and cultural nuance.

AI reshaped European healthcare in 2025

Europe’s healthcare systems turned increasingly to AI in 2025, using new tools to predict disease, speed diagnosis, and reduce administrative workloads.

Countries including Finland, Estonia and Spain adopted AI to train staff, analyse medical data and detect illness earlier, while hospitals introduced AI scribes to free up doctors’ time with patients.

Researchers also advanced AI models able to forecast more than a thousand conditions many years before diagnosis, including heart disease, diabetes and certain cancers.

Further tools detected heart problems in seconds, flagged prostate cancer risks more quickly and monitored patients recovering from stent procedures instead of relying only on manual checks.

Experts warned that AI should support clinicians rather than replace them, as doctors continue to outperform AI in emergency care and chatbots struggle with mental health needs.

Security specialists also cautioned that extremists could try to exploit AI to develop biological threats, prompting calls for stronger safeguards.

Despite such risks, AI-driven approaches are now embedded across European medicine, from combating antibiotic-resistant bacteria to streamlining routine paperwork. Policymakers and health leaders are increasingly focused on how to scale innovation safely instead of simply chasing rapid deployment.

SK Telecom introduces South Korea’s first hyperscale AI model

The telecommunications firm SK Telecom is preparing to unveil A.X K1, South Korea’s first hyperscale language model, built with 519 billion parameters.

Around 33 billion parameters are activated during inference, allowing the model to maintain strong performance without demanding excessive computing power. The project is part of a national initiative involving universities and industry partners.
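SK Telecom has not published A.X K1’s internals, but activating only a subset of parameters per token is characteristic of mixture-of-experts designs. As a purely illustrative sketch (all names and sizes hypothetical, not drawn from the announcement), top-k routing means each token touches only a few expert weight matrices while the full parameter pool stays much larger:

```python
import numpy as np

rng = np.random.default_rng(0)

class SparseMoELayer:
    """Toy mixture-of-experts layer: only top_k of n_experts run per token."""

    def __init__(self, n_experts=8, top_k=2, d_model=16):
        self.n_experts = n_experts
        self.top_k = top_k
        # One weight matrix per expert; most sit idle for any given token.
        self.experts = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                        for _ in range(n_experts)]
        self.router = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)

    def forward(self, x):
        # x: one token vector of shape (d_model,).
        logits = x @ self.router                 # router scores every expert
        top = np.argsort(logits)[-self.top_k:]   # indices of the chosen experts
        weights = np.exp(logits[top])
        weights /= weights.sum()                 # softmax over chosen experts only
        # Weighted sum of just the selected experts' outputs.
        out = sum(w * (x @ self.experts[i]) for w, i in zip(weights, top))
        return out, top

layer = SparseMoELayer()
out, used = layer.forward(rng.standard_normal(16))
print(f"experts used per token: {len(used)} of {layer.n_experts}")  # 2 of 8
```

The total parameter count scales with the number of experts, while per-token compute scales only with `top_k`, which is how a very large model can keep inference cost closer to that of a much smaller dense one.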

The company expects A.X K1 to outperform smaller systems in complex reasoning, mathematics and multilingual understanding, while also supporting code generation and autonomous AI agents.

At such a scale, the model can operate as a teacher system that transfers knowledge to smaller, domain-specific tools that might directly improve daily services and industrial processes.

Unlike many global models trained mainly in English, A.X K1 has been trained in Korean from the outset so it naturally understands local language, culture and context.

SK Telecom plans to deploy the model through its AI service Adot, which already has more than 10 million subscribers, allowing access via calls, messages, the web and mobile apps.

The company foresees applications in workplace productivity, manufacturing optimisation, gaming dialogue, robotics and semiconductor performance testing.

Research will continue so the model can support the wider AI ecosystem of South Korea, and SK Telecom plans to open-source A.X K1 along with an API to help local developers create new AI agents.

The AI terms that shaped debate and disruption in 2025

AI continued to dominate public debate in 2025, not only through new products and investment rounds, but also through a rapidly evolving vocabulary that captured both promise and unease.

From ambitious visions of superintelligence to cultural shorthand like ‘slop’, language became a lens through which society processed another turbulent year for AI.

Several terms reflected the industry’s technical ambitions. Concepts such as superintelligence, reasoning models, world models and physical intelligence pointed to efforts to push AI beyond text generation towards deeper problem-solving and real-world interaction.

Developments by companies including Meta, OpenAI, DeepSeek and Google DeepMind reinforced the sense that scale, efficiency and new training approaches are now competing pathways to progress, rather than sheer computing power alone.

Other expressions highlighted growing social and economic tensions. Words like hyperscalers, bubble and distillation entered mainstream debate as data centres expanded, valuations rose, and cheaper model-building methods disrupted established players.

At the same time, legal and ethical debates intensified around fair use, chatbot behaviour and the psychological impact of prolonged AI interaction, underscoring the gap between innovation speed and regulatory clarity.

Cultural reactions also influenced the development of the AI lexicon. Terms such as vibe coding, agentic and sycophancy revealed how generative systems are reshaping work, creativity and user trust, while ‘slop’ emerged as a blunt critique of low-quality, AI-generated content flooding online spaces.

Together, these phrases chart a year in which AI moved further into everyday life, leaving society to wrestle with what should be encouraged, controlled or questioned.

How AI in 2026 will transform management roles and organisational design

By 2026, AI is expected to move beyond experimentation and pilot projects and begin reshaping how companies are actually run, transforming management structures and automating tasks as companies strive to demonstrate real value.

According to researchers and professors at IMD, the focus will shift from testing AI tools to redesigning organisational structures, decision-making processes, and management roles themselves. After several years of hype-driven investment, many companies are now under pressure to show clear returns from AI.

Those that remain stuck in proof-of-concept mode risk falling behind competitors willing to make more significant operational changes. Several corporate functions are set to become AI-native by the end of the year.

Human roles in these areas will focus more on interpersonal judgement, oversight and complex decision-making, while software forms the operational backbone. Workforce structures are also likely to change. Middle management roles are expected to shrink gradually as AI systems take over reporting, forecasting and coordination tasks.

At the same time, risks associated with AI are growing. Highly realistic synthetic media is expected to fuel a rise in misinformation, exposing organisations to reputational and governance challenges. To respond, companies will need faster monitoring systems, clearer crisis-response protocols and closer cooperation with digital platforms to counter fabricated content.

Economic uncertainty is adding further pressure. Organisations that remain stuck in pilot mode may be forced to scale back, while those committing to bigger operational change are expected to gain an advantage.

Operational areas are likely to deliver the highest returns on investment: supply chains, core operations and internal processes are expected to outperform customer-facing applications in efficiency, resilience and cost reduction.

As a result, chief operating officers may emerge as the most influential leaders of AI within executive teams. Ultimately, by 2026, competitive advantage will depend less on whether a company uses advanced AI and more on how deliberately it integrates these systems into everyday decision-making, roles, and organisational structures.
