Gemini leads latest ORCA benchmark on AI maths accuracy

A new round of the ORCA (Omni Research on Calculation in AI) benchmark reveals significant progress in how leading AI chatbots handle real-world mathematical problems, while also highlighting persistent limitations in reliability and consistency.

The latest results show Google’s Gemini 3 Flash moving clearly ahead of competing systems, correctly answering nearly three-quarters of the 500 practical questions used in the benchmark.

Our readers may recall that the platform previously analysed the first edition of the ORCA benchmark, examining how AI chatbots performed on everyday quantitative tasks rather than purely academic problems. The earlier analysis already showed notable gaps between systems and raised questions about the reliability of AI models for calculations people might encounter in daily life.

The second benchmark compares four widely accessible models: ChatGPT-5.2, Gemini 3 Flash, Grok-4.1 and DeepSeek V3.2. Gemini recorded the largest improvement, decisively outpacing the others. ChatGPT and DeepSeek posted smaller but steady gains, while Grok’s results declined slightly in several subject areas.

Performance improvements were uneven across domains, with Gemini showing particularly strong gains in fields such as biology, chemistry, physics and health-related calculations.

Closer examination of the errors reveals why AI still struggles with mathematical accuracy. Calculation mistakes have increased as a share of total errors, while rounding and formatting problems have decreased.

Researchers explain that large language models do not actually compute numbers in the same way that calculators do. Instead, they predict likely sequences of words and numbers, which can lead to small shortcuts during multi-step reasoning that eventually produce incorrect results.

The benchmark also highlights another challenge: instability. The same question can produce different answers when asked multiple times, even when the model initially responded correctly. Such variation reflects the probabilistic nature of AI systems.

As a result, the benchmark concludes that AI chatbots can assist with calculations but cannot yet match the consistency of traditional calculators, which always return the same answer for the same input.
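
One rough way to picture the instability problem is to ask the same question several times and measure how often the answers agree. The Python sketch below does that. The function names and sample answers are hypothetical, chosen only for illustration, and are not drawn from the ORCA methodology or its results.

```python
from collections import Counter


def consistency(answers):
    """Share of runs that agree with the most common answer."""
    counts = Counter(answers)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(answers)


def accuracy(answers, expected):
    """Share of runs that match the expected answer exactly."""
    return sum(a == expected for a in answers) / len(answers)


# Hypothetical answers from asking a chatbot one question five times.
chatbot_runs = ["42.5", "42.5", "42.0", "42.5", "43.1"]
# A calculator returns the same result for the same input every time.
calculator_runs = ["42.5"] * 5

print(consistency(chatbot_runs), accuracy(chatbot_runs, "42.5"))
print(consistency(calculator_runs), accuracy(calculator_runs, "42.5"))
```

In this made-up case the chatbot agrees with itself in three of five runs, while the calculator scores 1.0 on both measures. That gap is what the benchmark describes as instability.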

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

EU competition scrutiny pushes Meta to reopen WhatsApp AI access

Meta has announced that third-party AI chatbots will again be allowed to operate through WhatsApp in Europe, reversing restrictions introduced earlier this year.

The decision follows pressure from the European Commission, which had warned it could impose interim competition measures.

Earlier in 2026, Meta limited access to rival chatbot services on the messaging platform, prompting regulators to examine whether the move unfairly restricted competition in the rapidly expanding AI market.

WhatsApp remains one of the most widely used messaging applications across European countries, making platform access critical for emerging AI services.

Under the new arrangement, companies will be able to distribute general-purpose AI chatbots via the WhatsApp Business API for 12 months.

The change is intended to give European regulators time to complete their investigation while allowing competing AI services to operate within the platform ecosystem.

Meta has also indicated that businesses offering chatbots through WhatsApp will be required to pay fees to access the system.

The European Commission is now assessing whether these adjustments sufficiently address competition concerns surrounding the integration of AI services inside major digital platforms.

OpenAI explains 5 AI value models transforming enterprise strategy

AI is beginning to reshape corporate strategy as organisations shift from isolated technology experiments to broader operational transformation.

According to OpenAI, businesses that treat AI as a collection of disconnected pilots risk missing the bigger structural change that the technology enables.

A new framework describes five value models through which AI can gradually reshape companies. The first stage focuses on workforce empowerment, where tools such as ChatGPT spread AI capabilities across teams and improve everyday productivity.

Once employees develop fluency, organisations can introduce AI-native distribution models that transform how customers discover products and interact with digital services.

More advanced stages involve specialised systems. Expert capability integrates AI into research, creative production, and domain-specific analysis, allowing professionals to explore a wider range of ideas and experiments.

Meanwhile, systems and dependency management introduces AI tools capable of safely updating interconnected digital environments, including codebases, documentation, and operational processes.

The final stage involves full process re-engineering through autonomous agents. In such environments, AI systems coordinate complex workflows across departments while maintaining governance, accountability, and auditability.

Organisations that successfully progress through these stages may eventually redesign their business models rather than merely improving efficiency within existing structures.

Sovereign AI becomes a strategic question for governments

Governments across the world are increasingly treating AI as a strategic capability that shapes economic development, public services and national security. Momentum behind the idea of ‘sovereign AI’ is growing as countries reassess who controls the chips, cloud infrastructure, data and models powering modern technology.

Complete control over the entire AI stack remains unrealistic for most economies because of the enormous financial and technological costs involved. Global infrastructure continues to rely heavily on US technology firms, which still operate a large share of data centres and AI systems worldwide.

Policymakers are therefore exploring different approaches to sovereignty across the AI ecosystem rather than pursuing total independence. Strategies range from building domestic computing capacity to adapting global AI models for national languages, regulations and public services.

Several countries already illustrate different approaches. The EU is investing billions in AI infrastructure, Canada protects sensitive computing resources while using global models, and India prioritises applications that serve its multilingual population through public digital systems.

AI adoption and jobs debated at India summit

Governments, companies and international organisations gathered in India in February for the AI Impact Summit to discuss the future of AI governance and adoption. Participants focused on economic impacts, labour market changes and sector-specific uses of AI.

Delegates also highlighted growing interest in international cooperation on AI governance. Ninety-one countries endorsed a declaration supporting shared tools, global collaboration and people-centred development of AI.

Language diversity became a central topic during the discussions. India’s government announced eight foundation AI models designed to support generative AI across the country’s 22 recognised languages.

The debate also reflected the growing influence of the Global South in AI policy discussions. Policymakers and experts emphasised the infrastructure gaps, language diversity and local economic realities shaping AI adoption.

ECB reports minor impact of AI on employment

AI has so far had only a small effect on employment across Europe, according to economists at the European Central Bank. A comparison of 5,000 firms, both AI users and non-users, showed no significant difference in job creation or reduction.

Some firms that use AI intensively were even four percent more likely to hire new staff than average.

Economists noted that AI investment has not replaced existing jobs. In some cases, firms are hiring additional employees to develop and implement AI systems or to scale up operations more efficiently.

Only a minority of firms, around 15 percent, reported reducing labour costs as a motivation for AI adoption.

Despite the limited impact so far, the ECB cautioned that AI could have more significant effects as the technology matures. Firms that specifically invest in AI to cut jobs may indeed reduce employment, and the long-term consequences for production processes and labour markets remain uncertain.

The findings come amid rising concern over AI-driven job losses, with companies such as Amazon and Allianz citing AI as a reason for recent cuts. Markets reacted negatively last week after a viral post predicted widespread layoffs, though current evidence shows only minor effects.

Growing risks from AI meeting transcription tools

Businesses across the US and Europe are confronting new privacy risks as AI transcription tools spread through workplaces. Tools that automatically record and transcribe meetings increasingly capture sensitive conversations without clear consent.

Privacy specialists note that organisations previously focused on rules controlling what employees upload into AI systems. Governance efforts are now shifting towards monitoring what AI tools record during daily work.

AI services such as Otter, Zoom transcription and Microsoft Copilot can record discussions involving performance reviews, health information and legal matters. Companies face legal exposure when third-party platforms store recordings without strict controls.

Governance teams are being urged to introduce clear rules on meeting recordings and the retention of transcripts. Stronger policies may include consent requirements, limits on recording sensitive meetings and stricter data storage oversight.

Qualcomm pushes Europe to take the lead in the 6G revolution

Europe is being urged to take a leading role in developing sixth-generation wireless technology as global competition intensifies over the future of connectivity and AI.

Speaking at the Mobile World Congress in Barcelona, Wassim Chourbaji of Qualcomm argued that 6G will represent a technological revolution rather than a gradual improvement over existing networks.

The company expects early pre-commercial deployments to begin around 2028, with broader commercialisation targeted for 2029.

Next-generation wireless networks are expected to support physical AI systems capable of interacting with the real world, including robotics, smart glasses, connected vehicles, and advanced sensing technologies.

High-capacity uploads and faster processing between devices and data centres will allow AI systems to analyse video streams and real-time data more efficiently.

Qualcomm has also launched a coalition aimed at accelerating 6G development with partners including Nokia, Ericsson, Amazon, Google and Microsoft.

Advocates argue that combining European industrial strengths with advanced wireless and AI technologies could allow the continent to secure a leading position in the next phase of global digital infrastructure.

China expands oversight of youth online safety

China has introduced new measures to regulate online information that could affect the physical and mental health of minors. Authorities said the rules will take effect on 1 March and aim to improve protection for young internet users.

The regulators identified four categories of online information that may harm minors. The authorities have also addressed emerging risks linked to algorithmic recommendations and generative AI technologies.

The framework requires internet platforms and content creators to prevent and respond to harmful material. Regulators said companies must strengthen the monitoring and governance of content affecting minors.

Authorities said the measures are designed to create a cleaner online environment for children. Officials also stressed greater responsibility for platforms that manage digital content used by minors.

X suspends creators over undisclosed AI armed conflict videos

Social media platform X will suspend creators from its revenue-sharing programme if they post AI-generated videos of armed conflict without proper disclosure. The penalty lasts 90 days, with permanent removal for repeat violations.

Head of product Nikita Bier said access to authentic information during war is critical, warning that generative AI makes it easy to mislead audiences. The policy takes effect immediately.

Enforcement will combine generative AI detection tools with the platform’s Community Notes fact-checking system. X, formerly Twitter, says the move is designed to prevent creators from profiting from deceptive conflict content.

The Creator Revenue Sharing Programme allows paid X subscribers to earn advertising income from high-performing posts, but critics argue it encourages sensational material. AI-generated political misinformation and deceptive influencer promotions outside armed conflict scenarios remain unaffected by the new rule.

Financial penalties may limit incentives for the dissemination of misleading war footage, yet broader concerns about AI-driven misinformation on social media persist.
