Gemini leads latest ORCA benchmark on AI maths accuracy

A new round of the ORCA (Omni Research on Calculation in AI) benchmark reveals significant progress in how leading AI chatbots handle real-world mathematical problems, while also highlighting persistent limitations in reliability and consistency.

The latest results show Google’s Gemini 3 Flash moving clearly ahead of competing systems, correctly answering nearly three-quarters of the 500 practical questions used in the benchmark.

Our readers may recall that the platform previously analysed the first edition of the ORCA benchmark, examining how AI chatbots performed on everyday quantitative tasks rather than purely academic problems. The earlier analysis already showed notable gaps between systems and raised questions about the reliability of AI models for calculations people might encounter in daily life.

The second benchmark compares four widely accessible models: ChatGPT-5.2, Gemini 3 Flash, Grok-4.1 and DeepSeek V3.2. Gemini recorded the largest improvement, decisively outpacing the others. ChatGPT and DeepSeek posted smaller but steady gains, while Grok’s results declined slightly in several subject areas.

Performance improvements were uneven across domains, with Gemini showing particularly strong gains in fields such as biology, chemistry, physics and health-related calculations.

Closer examination of the errors reveals why AI still struggles with mathematical accuracy. Calculation mistakes have increased as a share of total errors, while rounding and formatting problems have decreased.

Researchers explain that large language models do not actually compute numbers in the same way that calculators do. Instead, they predict likely sequences of words and numbers, which can lead to small shortcuts during multi-step reasoning that eventually produce incorrect results.

The benchmark also highlights another challenge: instability. The same question can produce different answers when asked multiple times, even when the model initially responded correctly. Such variation reflects the probabilistic nature of AI systems.

As a result, the benchmark concludes that AI chatbots can assist with calculations but cannot yet match the consistency of traditional calculators, which always return the same answer for the same input.
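The contrast between a deterministic calculator and a probabilistic chatbot can be illustrated with a short Python sketch. It is an illustration only and does not reproduce the ORCA methodology: `simulated_chatbot`, its `error_rate` parameter and the `stability_report` metrics are all hypothetical names invented for this example, and the chatbot is a seeded random stand-in rather than a real model.

```python
import random
from collections import Counter


def calculator(price: float, quantity: int) -> float:
    """Deterministic reference: the same input always gives the same output."""
    return round(price * quantity, 2)


def simulated_chatbot(price: float, quantity: int, rng: random.Random,
                      error_rate: float = 0.25) -> float:
    """Hypothetical stand-in for a chatbot (not a real model or API).

    With probability `error_rate` it returns a slightly wrong total,
    mimicking a small slip in multi-step reasoning.
    """
    exact = calculator(price, quantity)
    if rng.random() < error_rate:
        return round(exact + rng.choice([-0.1, 0.1]), 2)
    return exact


def stability_report(answers: list, expected: float) -> dict:
    """Summarise how accurate and how consistent repeated answers are."""
    counts = Counter(answers)
    top_count = counts.most_common(1)[0][1]
    return {
        "accuracy": sum(a == expected for a in answers) / len(answers),
        "distinct_answers": len(counts),
        "agreement": top_count / len(answers),
    }


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the simulation is repeatable
    expected = calculator(19.99, 3)
    calc_runs = [calculator(19.99, 3) for _ in range(20)]
    bot_runs = [simulated_chatbot(19.99, 3, rng) for _ in range(20)]
    print("calculator:", stability_report(calc_runs, expected))
    print("chatbot:   ", stability_report(bot_runs, expected))
```

Asking the same question many times and reporting both accuracy and agreement separates two failure modes the benchmark describes: answers that are wrong, and answers that change between runs.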

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

Debate grows over the future of privacy

Experts gathered in London, UK, to examine how the concept of privacy has evolved over centuries. The discussions highlighted that privacy was only widely recognised as a legal and social norm after the Second World War.

Speakers noted that earlier societies often viewed privacy with suspicion or did not recognise it at all. Historical examples discussed included practices from Roman society and the French monarchy.

Modern legal protections have expanded rapidly in recent decades, with privacy laws now covering about 80 percent of the global population. Scholars said the concept remains relatively new despite its central role in modern democracies.

The debate also explored whether privacy will remain a stable social value as technology evolves. Analysts said emerging technologies such as AI are reshaping debates over personal data and surveillance.


OpenAI explains 5 AI value models transforming enterprise strategy

AI is beginning to reshape corporate strategy as organisations shift from isolated technology experiments to broader operational transformation.

According to OpenAI, businesses that treat AI as a collection of disconnected pilots risk missing the bigger structural change that the technology enables.

A new framework describes five value models through which AI can gradually reshape companies. The first stage focuses on workforce empowerment, where tools such as ChatGPT spread AI capabilities across teams and improve everyday productivity.

Once employees develop fluency, organisations can introduce AI-native distribution models that transform how customers discover products and interact with digital services.

More advanced stages involve specialised systems. Expert capability integrates AI into research, creative production, and domain-specific analysis, allowing professionals to explore a wider range of ideas and experiments.

Meanwhile, systems and dependency management introduce AI tools capable of safely updating interconnected digital environments, including codebases, documentation, and operational processes.

The final stage involves full process re-engineering through autonomous agents. In such environments, AI systems coordinate complex workflows across departments while maintaining governance, accountability, and auditability.

Organisations that successfully progress through these stages may eventually redesign their business models rather than merely improving efficiency within existing structures.


Data centres’ expansion in London sparks energy and climate debate

London authorities are drafting new data centre policies amid concerns about their environmental impact and rising energy use. City Hall aims to balance the sector’s economic advantages with pressures on electricity, water, and emissions.

The Greater London Authority (GLA) estimates that 10 large data centres generate around 2.7 million tonnes of carbon emissions due to their high electricity consumption. Of the 100 data centres the UK plans, about 60 will be in London.

Megan Life, assistant director for environment and energy at the GLA, told the London Assembly Environment Committee the new strategy aims to ‘keep hold of the kind of economic growth benefits that data centres offer’ while addressing some ‘quite challenging’ impacts linked to their energy use.

Deputy mayor for environment Mete Coban said the expansion of data centres brings both ‘big benefits’ and ‘massive challenges’ for the capital, particularly in terms of energy and water consumption. ‘It’s not just a London problem, it’s going to be a global problem,’ he said, adding: ‘It’s about making sure that our environment doesn’t suffer in the hands of a few global corporations who will take and not give back, so we want to make sure we equitably do this.’

Policymakers are assessing how data centre growth may affect climate goals and urban infrastructure. London Mayor Sadiq Khan has commissioned a study to forecast future expansion. At the same time, UK lawmakers have launched an inquiry into the environmental impact of the sector as demand for cloud computing and AI infrastructure grows.


Global AI race intensifies as China claims leadership in strategic technologies

China asserted its position as the global leader in AI and strategic technology R&D, pledging to accelerate advancement toward technological autonomy. The assertion was prominently featured in government reports presented to the National People’s Congress.

A National Development and Reform Commission report states that China leads international research, development, and implementation in AI, biomedicine, robotics, and quantum technology. The report also references advancements in domestic chip innovation as proof of progress.

Competition between China and the United States for dominance in advanced technologies has escalated. Washington imposed export controls on advanced chips, while Beijing retaliated with restrictions on rare earth resources, escalating trade tensions over strategic technologies.

The report also highlighted the country’s global leadership in open-source AI models and its expansion into emerging technology sectors, including industrial robots and drones. Authorities pledged to nurture future industries such as quantum technology, embodied AI, and 6G networks, while promoting large-scale AI deployment across key sectors.

Officials also plan to launch new data centres, coordinate nationwide computing capacity, and establish mechanisms to prevent AI security risks. The strategy places particular emphasis on embodied AI to boost productivity and performance across sectors. Although US firms command larger investment resources, Beijing is relying on supply chains, manufacturing capacity, and rapid R&D cycles to scale emerging industries despite questions about long-term growth.


UK to launch new lab for breakthrough AI research

Researchers in the UK will gain a new AI lab designed to drive transformational breakthroughs in healthcare, transport, science, and everyday technology, supported by government funding.

The lab will provide up to £40 million in funding over six years, alongside substantial access to large-scale computing resources, inviting UK researchers to pitch their most ambitious ideas.

The Fundamental AI Research Lab will focus on tackling core AI challenges, including hallucinations, unreliable memory, and unpredictable reasoning.

The lab will support high-risk, blue-sky research rather than simply scaling existing systems. Its goal is to unlock entirely new capabilities that could improve medical diagnoses, infrastructure resilience, scientific discovery, and public services.

UK officials highlighted the country’s strength in world-class universities, AI talent, and a thriving sector attracting over £100 billion in private investment. Experts, including Raia Hadsell of Google DeepMind, will peer-review funding applications, prioritising bold, high-reward proposals.

The initiative is part of the UKRI AI Strategy, which is backed by £1.6 billion and aims to strengthen research and ensure AI benefits society and the economy. UK AI projects like RADAR for rail faults and the IXI Brain Atlas for Alzheimer’s research demonstrate the approach’s potential impact.


ECB reports minor impact of AI on employment

AI has so far had only a small effect on employment across Europe, according to economists at the European Central Bank. A comparison of 5,000 firms, both AI users and non-users, showed no significant difference in job creation or reduction.

Some firms that use AI intensively were even four percent more likely to hire new staff than average.

Economists noted that AI investment has not replaced existing jobs. In some cases, firms are hiring additional employees to develop and implement AI systems or to scale up operations more efficiently.

Only a minority of firms, around 15 percent, reported reducing labour costs as a motivation for AI adoption.

Despite limited impacts so far, the ECB cautioned that AI could have more significant effects as technology matures. Firms that specifically invest in AI to cut jobs may indeed reduce employment, and the long-term consequences for production processes and labour markets remain uncertain.

The findings come amid rising concern over AI-driven job losses, with companies such as Amazon and Allianz citing AI as a reason for recent cuts. Markets reacted negatively last week after a viral post predicted widespread layoffs, though current evidence shows only minor effects.


Gemini Canvas reaches millions as Google expands AI Search tools

Google has expanded access to the Canvas feature in Google Search’s AI Mode, making it available to all US users.

Canvas allows users to organise research, draft documents and develop small applications directly inside search.

Prompts can generate code, transform reports into webpages or quizzes, and produce audio summaries from uploaded material. The tool was previously introduced as part of experimental projects in Google Labs.

The feature builds on capabilities already available in Google Gemini and partly overlaps with NotebookLM, which supports research analysis and document processing.

Within Canvas, users can gather information from the web and the Google Knowledge Graph while refining projects through interaction with the Gemini model.

Competition is intensifying across AI development platforms. OpenAI and Anthropic offer similar tools, though their design approaches differ in how collaborative workspaces are triggered and used.


OpenAI upgrades ChatGPT conversations with GPT-5.3 Instant

The most widely used ChatGPT model has received an update from OpenAI, introducing GPT-5.3 Instant to make everyday conversations more coherent, useful, and natural.

The upgrade focuses on improving tone, contextual understanding, and the flow of dialogue rather than only benchmark performance.

One of the main improvements concerns how the model handles refusals and safety responses. Earlier versions sometimes declined questions that could have been answered safely or delivered overly cautious explanations before responding.

GPT-5.3 Instant instead gives more direct answers while still maintaining safety constraints, reducing interruptions that previously slowed conversations.

The update also improves the way ChatGPT uses information from the web. Instead of simply summarising search results or presenting long lists of links, the model now integrates online information with its own reasoning.

Such an approach aims to produce more relevant answers that highlight key insights at the beginning of responses.

Reliability has also improved. Internal evaluations conducted by OpenAI show reductions in hallucination rates across multiple domains.

When using web sources, hallucinations dropped by 26.8 percent in higher-risk fields such as medicine, law, and finance. Improvements were also recorded when the model relied only on its internal knowledge.

Beyond factual accuracy, the model is designed to feel more natural in conversation. OpenAI says the system now avoids overly preachy language, unnecessary disclaimers, and intrusive remarks that previously disrupted dialogue.

The goal is a more consistent conversational personality across updates, while maintaining the familiar user experience of ChatGPT.


EU citizens propose public social media network under new initiative

The European Commission has registered a European Citizens’ Initiative proposing the creation of a public social media platform operating at the European level, rather than relying exclusively on private technology companies.

The initiative, titled the European Public Social Network, calls for legislation establishing a publicly funded digital platform designed to serve societal interests.

Organisers argue that a publicly owned network could function independently from commercial incentives and political pressure while guaranteeing equal rights for users across the EU. The proposed platform would operate as a public service overseen by society rather than private corporations.

Registration confirms that the proposal meets the legal requirements of the European Citizens’ Initiative framework. The Commission has not yet assessed the substance of the idea, and registration does not imply support for the proposal.

Supporters must now gather 1 million signatures from citizens across at least 7 EU member states within 12 months. If the threshold is reached, the Commission will be required to formally examine the initiative and decide whether legislative action is appropriate.
