Researchers at MIT have developed a new technique designed to improve how computer vision models explain their predictions while maintaining strong accuracy. Transparency is crucial as AI enters fields like healthcare and autonomous driving, where decisions must be clear.
The method uses concept bottleneck models, which enable AI to base its predictions on human-understandable concepts. Traditional approaches rely on expert-defined concepts that can be incomplete or ill-suited, sometimes lowering model performance.
Researchers instead created a system that extracts concepts the AI learned during training. A sparse autoencoder selects key features, and a multimodal language model turns them into plain-language descriptions and labels.
The resulting module forces the AI to make predictions using only those extracted concepts.
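In code terms, the bottleneck is simply a layer that routes every prediction through concept scores. Below is a minimal PyTorch-style sketch of that general idea; the layer sizes, names, and bird-concept examples are illustrative, not the MIT implementation, and in the researchers' pipeline the concepts themselves would be extracted by a sparse autoencoder rather than learned by a plain linear layer.

```python
import torch
import torch.nn as nn

class ConceptBottleneckHead(nn.Module):
    """Toy concept-bottleneck classifier: the final label is predicted
    only from concept activations, never from raw backbone features."""

    def __init__(self, feature_dim: int, num_concepts: int, num_classes: int):
        super().__init__()
        # Scores each human-readable concept (e.g. 'striped wing',
        # 'hooked beak') from the backbone's feature vector.
        self.feature_to_concepts = nn.Linear(feature_dim, num_concepts)
        # The classifier sees *only* the concept scores: the bottleneck.
        self.concepts_to_label = nn.Linear(num_concepts, num_classes)

    def forward(self, features: torch.Tensor):
        concepts = torch.sigmoid(self.feature_to_concepts(features))
        logits = self.concepts_to_label(concepts)
        return logits, concepts  # concept scores double as the explanation

# Usage: 512-dim backbone features, 32 extracted concepts, 200 bird classes.
head = ConceptBottleneckHead(512, 32, 200)
logits, concepts = head(torch.randn(1, 512))
```

Because the label depends only on the concept vector, inspecting which concepts fired is a faithful explanation of the prediction rather than a post-hoc guess.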
Tests on bird classification and medical image datasets showed that the new method improved accuracy and provided clearer explanations. Findings suggest that using a model’s internal concepts can boost transparency and accountability in AI systems.
Lenovo is redefining how people interact with technology, advancing rollable laptops, foldable devices and adaptive AI systems that anticipate user needs.
The company is shifting from manufacturing hardware to creating multi-platform systems that adapt seamlessly to workflows instead of relying solely on traditional devices.
Qira, Lenovo’s personal AI super-agent, transfers tasks across devices while maintaining context and history with user permission. It can suggest actions and predict needs, aiming to improve productivity and employee satisfaction, although security and privacy concerns remain significant.
The rollable laptop features a 14-inch screen that expands vertically to 16.7 inches, providing immersive experiences for gaming and content consumption while remaining portable.
Lenovo is also exploring voice-driven tools, including AI Workmate prototypes, allowing users to create presentations and digital content simply through speech.
By combining innovative screen designs with intelligent AI agents, Lenovo aims to create unified ecosystems that prioritise user experience and adaptability instead of focusing solely on device specifications.
The company believes these technologies will gradually become culturally accepted, similar to self-driving cars.
Australia has begun enforcing its new Age-Restricted Material Codes, requiring online platforms to introduce stronger protections that prevent children from accessing harmful digital content.
The rules apply across a wide range of services, including social media, app stores, gaming platforms, search engines, pornography websites, and AI chatbots.
Under the framework, companies must implement age-assurance systems before allowing access to content involving pornography, high-impact violence, self-harm material, or other age-restricted topics.
These measures also extend to AI companions and chatbots, which must prevent sexually explicit or self-harm-related conversations with minors.
The rules form part of Australia’s broader online safety framework overseen by the eSafety Commissioner, which will monitor compliance and enforce the codes.
Companies that fail to comply may face penalties of up to AU$49.5 million per breach.
The policy aims to shift responsibility toward technology companies by requiring them to build protections directly into their platforms.
Officials in Australia argue the measures mirror long-standing offline safeguards designed to prevent children from accessing adult environments or harmful material.
Courts across Europe are examining how copyright law applies to AI systems trained on large datasets, reviewing whether existing rules allow developers to use copyrighted books, music and journalism without permission.
One closely watched dispute involves a publisher challenging Google over summaries produced by its Gemini chatbot. The case, before the EU court in Luxembourg, could test how press publishers’ rights apply to AI-generated outputs.
Legal experts warn that the ruling may not resolve wider questions about AI training data, as many disputes centre on the EU copyright directive and its text and data mining exception.
Additional lawsuits, including one brought by music rights group GEMA against OpenAI, are expected to continue for years, and policymakers are considering updates to copyright rules as the technology expands.
A dispute between Anthropic and the Pentagon has raised questions about whether startups will hesitate to pursue defence contracts. Negotiations over the Pentagon’s use of Anthropic’s Claude AI collapsed, prompting the US administration to label the company a supply chain risk.
The situation escalated as OpenAI secured its own agreement with the Pentagon, a development that sparked backlash online, with reports of a surge in ChatGPT uninstalls after the announcement.
Technology analysts say the controversy highlights the unusual scrutiny facing high-profile AI firms: companies such as OpenAI and Anthropic attract intense public attention because their widely used products place any defence partnership in the spotlight.
Startup founders are now debating the risks of government contracts, particularly with the Pentagon, and industry observers warn that abrupt changes to defence contracts could make government collaboration more uncertain.
Cursor has launched a new tool called Automations, designed to help software engineers manage the growing complexity of overseeing multiple AI coding agents at once.
Rather than requiring a human to initiate each task, the system allows agents to launch automatically in response to events such as a new code addition, a Slack message, or a scheduled timer.
The shift is significant because it breaks the ‘prompt-and-monitor’ model that currently defines most AI-assisted engineering.
As Jonas Nelle, Cursor’s engineering lead for asynchronous agents, put it, humans are no longer always the ones initiating; they are called in at the right moments rather than tracking dozens of processes simultaneously.
Early applications include automated bug reviews, security audits, PagerDuty incident response, and weekly codebase summaries delivered to Slack.
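The underlying mechanism is ordinary event-driven dispatch: an event fires and a registered agent launches without a human prompt. Here is a minimal, hypothetical Python sketch of that pattern; the trigger names and handlers are invented for illustration and are not Cursor’s actual API.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Event:
    kind: str            # e.g. "push", "slack_message", "schedule"
    payload: dict = field(default_factory=dict)

# Hypothetical registry mapping event kinds to agent launchers.
AUTOMATIONS: dict[str, Callable[[Event], None]] = {}

def on(kind: str):
    """Decorator registering a handler so the matching event launches it."""
    def register(fn: Callable[[Event], None]):
        AUTOMATIONS[kind] = fn
        return fn
    return register

@on("push")
def review_new_code(event: Event):
    print(f"Launching bug-review agent for commit {event.payload['sha']}")

@on("schedule")
def weekly_summary(event: Event):
    print("Launching codebase-summary agent; posting results to Slack")

def dispatch(event: Event):
    handler = AUTOMATIONS.get(event.kind)
    if handler:
        handler(event)  # the agent starts with no human in the loop

dispatch(Event("push", {"sha": "abc123"}))   # triggered by a new commit
dispatch(Event("schedule"))                  # triggered by a timer
```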
The launch comes as competition in the agentic coding space intensifies, with both OpenAI and Anthropic releasing major updates to their tools in recent weeks. Cursor’s annual recurring revenue has nonetheless doubled over the past three months to more than $2 billion.
AI systems rely heavily on human labour to train and improve algorithms. Images and videos collected by AI-powered devices are often reviewed and labelled by human annotators so that systems can better recognise objects, environments, and context.
This work is frequently outsourced to data annotation companies such as Sama, which provides training data services for large technology firms, including Meta Platforms. Many of these tasks are carried out by contract workers in Nairobi, Kenya, where employees review large volumes of visual data under strict confidentiality agreements.
Recent investigations have raised concerns about privacy and data governance linked to AI wearables such as the Ray-Ban Meta smart glasses, developed in partnership with EssilorLuxottica. Some device features rely on cloud processing, meaning that captured images and voice inputs may be transmitted and analysed remotely.
Workers involved in the annotation process report regularly encountering sensitive material. Footage can include scenes recorded inside private homes, bedrooms, or bathrooms, as well as images that unintentionally reveal personal or financial information.
These practices raise broader questions about transparency and cross-border data transfers, particularly when data originating in Europe or the United States is processed in other countries. They also highlight the often-hidden human role behind AI systems that are frequently presented as fully automated technologies.
The US tech company Oracle has introduced a new AI platform to predict safety risks across construction projects.
The system, called Advisor for Safety, aims to shift industry practices from reactive incident response to predictive risk prevention.
The AI model was trained using safety information equivalent to more than 10,000 project-years across multiple project types and locations.
By analysing historical patterns, the platform generates weekly forecasts that identify projects statistically most likely to experience safety incidents.
The solution also integrates structured safety observation tools through systems such as Oracle Aconex and Oracle Primavera Unifier, allowing field teams to collect consistent data on mobile devices or web platforms.
These inputs improve predictive accuracy while enabling organisations to track potential hazards earlier in the project lifecycle.
According to Oracle, the system combines data streams ranging from incident reports and payroll records to project schedules and operational metrics.
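As a rough illustration of the forecasting step, the sketch below ranks projects by predicted incident probability from historical features. The feature names, data, and logistic-regression model are invented for illustration and say nothing about Oracle’s actual model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented weekly features per project: open safety observations,
# total overtime hours, schedule slippage in days.
X_history = np.array([
    [12, 340, 5],
    [ 2,  80, 0],
    [ 9, 290, 3],
    [ 1,  60, 1],
])
y_history = np.array([1, 0, 1, 0])  # 1 = incident occurred that week

model = LogisticRegression().fit(X_history, y_history)

# Rank this week's projects by predicted incident probability.
X_now = np.array([[10, 310, 4], [3, 90, 0]])
for project, p in zip(["Site A", "Site B"], model.predict_proba(X_now)[:, 1]):
    print(f"{project}: {p:.0%} predicted incident risk")
```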
Early adopters reportedly reduced workplace incidents by up to 50 percent and workers’ compensation costs by as much as 75 percent during the first year of use.
Meta is facing a new lawsuit in the US over privacy concerns tied to its AI smart glasses.
The legal complaint follows investigative reporting indicating that contractors working for a Kenya-based subcontractor reviewed footage captured by users’ devices, including sensitive personal scenes.
The lawsuit alleges that some of the reviewed material included nudity and other intimate activities recorded by the glasses’ cameras.
According to the complaint, the footage formed part of a data review process designed to improve the AI system integrated into the wearable device.
Plaintiffs claim Meta marketed the product as prioritising user privacy, citing advertisements suggesting that the glasses were ‘designed for privacy’ and that users remained in control of their personal data.
The complaint argues that such messaging could mislead consumers if the footage were subject to human review without clear disclosure.
The legal action also names eyewear manufacturer Luxottica, which partnered with Meta to produce the glasses.
Meanwhile, the UK’s Information Commissioner’s Office has begun examining the issue after reports that face-blurring safeguards may not have consistently protected individuals captured in the recordings.
A new round of the ORCA (Omni Research on Calculation in AI) benchmark reveals significant progress in how leading AI chatbots handle real-world mathematical problems, while also highlighting persistent limitations in reliability and consistency.
The latest results show Google’s Gemini 3 Flash moving clearly ahead of competing systems, correctly answering nearly three-quarters of the 500 practical questions used in the benchmark.
Our readers may recall that the platform previously analysed the first edition of the ORCA benchmark, examining how AI chatbots performed on everyday quantitative tasks rather than purely academic problems. The earlier analysis already showed notable gaps between systems and raised questions about the reliability of AI models for calculations people might encounter in daily life.
The second benchmark compares four widely accessible models: ChatGPT-5.2, Gemini 3 Flash, Grok-4.1 and DeepSeek V3.2. Gemini recorded the largest improvement, decisively outpacing the others. ChatGPT and DeepSeek posted smaller but steady gains, while Grok’s results declined slightly in several subject areas.
Performance improvements were uneven across domains, with Gemini showing particularly strong gains in fields such as biology, chemistry, physics and health-related calculations.
Closer examination of the errors reveals why AI still struggles with mathematical accuracy. Calculation mistakes have increased as a share of total errors, while rounding and formatting problems have decreased.
Researchers explain that large language models do not actually compute numbers in the same way that calculators do. Instead, they predict likely sequences of words and numbers, which can lead to small shortcuts during multi-step reasoning that eventually produce incorrect results.
The benchmark also highlights another challenge: instability. The same question can produce different answers when asked multiple times, even when the model initially responded correctly. Such variation reflects the probabilistic nature of AI systems.
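A toy sketch makes the instability concrete: if a model scores candidate answers and samples from a softmax at non-zero temperature, repeated runs of the same question occasionally pick a wrong answer even when the correct one is most likely. The candidate answers and scores below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_answer(logits, temperature=0.8):
    """Sample one answer from a softmax over candidate tokens."""
    z = np.asarray(logits) / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return rng.choice(len(p), p=p)

# Candidate completions for '17 * 24 =', with 408 (the correct answer)
# given the highest invented score.
candidates = ["408", "406", "418"]
logits = [4.0, 1.5, 1.0]

answers = [candidates[sample_answer(logits)] for _ in range(10)]
print(answers)  # mostly '408', but a wrong answer occasionally slips in
```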
As a result, the benchmark concludes that AI chatbots can assist with calculations but cannot yet match the consistency of traditional calculators, which always return the same answer for the same input.