According to a recent study, AI models have shown limitations in tackling high-level historical inquiries. Researchers tested three leading large language models (LLMs) — GPT-4, Llama, and Gemini — using a newly developed benchmark, Hist-LLM. The test, based on the Seshat Global History Databank, revealed disappointing results, with GPT-4 Turbo achieving only 46% accuracy, barely surpassing random guessing.
Researchers from Austria’s Complexity Science Hub presented the findings at the NeurIPS conference last month. Co-author Maria del Rio-Chanona highlighted that while LLMs excel at basic facts, they struggle with nuanced, PhD-level historical questions. Errors included incorrect claims about ancient Egypt’s military and armour development, often due to the models extrapolating from prominent but irrelevant data.
Biases in training data also emerged, with models underperforming on questions related to underrepresented regions like sub-Saharan Africa. Lead researcher Peter Turchin acknowledged these shortcomings but emphasised the potential of LLMs to support historians with future improvements.
Efforts are underway to refine the benchmark by incorporating more diverse data and crafting complex questions. Researchers remain optimistic about AI’s capacity to assist in historical research despite its current gaps.
Spain’s government has announced a new initiative to promote the adoption of AI technologies across the country’s businesses. Prime Minister Pedro Sanchez revealed on Monday that the government will provide an additional 150 million euros ($155 million) in subsidies aimed at supporting companies in their efforts to integrate AI into their operations.
The funding is designed to help businesses harness the potential of AI, which has become a critical driver of innovation and efficiency in various sectors, from manufacturing to healthcare and finance. The subsidies will be available to companies looking to develop or adopt AI-based solutions, to foster digital transformation and maintain Spain’s competitive edge in the global economy.
Sanchez emphasised that the funding will play a vital role in ensuring Spain remains at the forefront of the digital revolution, helping to build a robust, AI-powered economy. The move comes as part of Spain’s broader strategy to invest in technology and innovation, aiming to enhance productivity and create new opportunities for growth in both the public and private sectors.
Chinese AI company MiniMax has introduced three new models—MiniMax-Text-01, MiniMax-VL-01, and T2A-01-HD—designed to compete with leading systems developed by firms such as OpenAI and Google. Backed by Alibaba and Tencent, MiniMax has raised $850 million in funding and is valued at over $2.5 billion. The models include a text-only model, a multimodal model capable of processing text and images, and an audio generator capable of creating synthetic speech in multiple languages.
MiniMax-Text-01 boasts a 4-million-token context window, significantly larger than those of competing systems, allowing it to process extensive text inputs. Its performance rivals industry leaders like Google’s Gemini 2.0 Flash in benchmarks measuring problem-solving and comprehension skills. The multimodal MiniMax-VL-01 excels at image-text tasks but trails some competitors on specific evaluations. T2A-01-HD, the audio generator, delivers high-quality synthetic speech and can clone voices using just 10 seconds of recorded audio.
The models, mostly accessible via platforms like GitHub and Hugging Face, come with licensing restrictions that prevent their use in developing competing AI systems. MiniMax has faced controversies, including allegations of unauthorised use of copyrighted data for training and concerns about AI-generated content replicating logos and public figures. The releases coincide with new US restrictions on AI technology exports to China, potentially heightening challenges for Chinese AI firms aiming to compete globally.
The Pentagon is leveraging generative AI to accelerate critical defence operations, particularly the ‘kill chain’, a process of identifying, tracking, and neutralising threats. According to Dr Radha Plumb, the Pentagon’s Chief Digital and AI Officer, AI’s current role is limited to aiding planning and strategising phases, ensuring commanders can respond swiftly while maintaining human oversight over life-and-death decisions.
Major AI firms like OpenAI and Anthropic have relaxed their usage policies to allow collaboration with defence agencies, though within stated ethical boundaries. These partnerships aim to balance innovation with responsibility, ensuring AI systems are not used to cause direct harm. Meta, Anthropic, and Cohere are among the companies working with defence contractors, providing tools that optimise operational planning without breaching ethical standards.
Dr Plumb emphasised that the Pentagon’s AI systems operate as part of a human-machine collaboration, countering fears of fully autonomous weapons. Despite debates over AI’s role in defence, officials argue that working with the technology is vital to ensuring its ethical application. Critics, however, continue to question the transparency and long-term implications of such alliances.
As AI becomes central to defence strategies, the Pentagon’s commitment to integrating ethical safeguards highlights the delicate balance between technological advancement and human control.
The Federal Trade Commission (FTC) has raised concerns about the competitive risks posed by collaborations between major technology companies and developers of generative AI tools. In a staff report issued Friday, the agency pointed to partnerships such as Microsoft’s investment in OpenAI and similar alliances involving Amazon, Google, and Anthropic as potentially harmful to market competition, according to TechCrunch.
FTC Chair Lina Khan warned that these collaborations could create barriers for smaller startups, limit access to crucial AI tools, and expose sensitive information. ‘These partnerships by big tech firms can create lock-in, deprive start-ups of key AI inputs, and reveal sensitive information that undermines fair competition,’ Khan stated.
The report specifically highlights the role of cloud service providers like Microsoft, Amazon, and Google, which provide essential resources such as computing power and technical expertise to AI developers. These arrangements could restrict smaller firms’ access to these critical resources, raise business switching costs, and allow cloud providers to gain unique insights into sensitive data, potentially stifling competition.
Microsoft defended its partnership with OpenAI, emphasising its benefits to the industry. ‘This collaboration has enabled one of the most successful AI startups in the world and spurred unprecedented technology investment and innovation,’ said Rima Alaily, Microsoft’s deputy general counsel. The FTC report underscores the need to address the broader implications of big tech’s growing dominance in generative AI.
Mistral, a Paris-based AI company, has entered a groundbreaking partnership with Agence France-Presse (AFP) to enhance the accuracy of its chatbot, Le Chat. The deal signals Mistral’s determination to broaden its scope beyond foundational model development.
Through the agreement, Le Chat will gain access to AFP’s extensive archive: more than 2,300 stories a day in six languages, with records dating back to 1983. The multi-year arrangement covers text only; photos and videos are excluded. By incorporating AFP’s multilingual and multicultural resources, Mistral aims to deliver more accurate and reliable responses tailored to business needs.
The partnership bolsters Mistral’s standing against AI leaders like OpenAI and Anthropic, which have secured similar content agreements. Le Chat’s enhanced features align with Mistral’s broader strategy to develop user-friendly applications that rival popular tools such as ChatGPT and Claude.
Mistral’s co-founder and CEO, Arthur Mensch, emphasised the importance of the partnership, describing it as a step toward offering clients a unique and culturally diverse AI solution. The agreement reinforces Mistral’s commitment to innovation and its global relevance in the rapidly evolving AI landscape.
Nvidia has launched three new NIM microservices designed to help enterprises control and secure their AI agents. These services are part of Nvidia NeMo Guardrails, a collection of software tools aimed at improving AI applications. The new microservices focus on content safety, restricting conversations to approved topics, and preventing jailbreak attempts on AI agents.
The content safety service helps prevent AI agents from generating harmful or biased outputs, while the conversation filter ensures discussions remain on track. The third service works to block attempts to bypass AI software restrictions. Nvidia’s goal is to provide developers with more granular control over AI agent interactions, addressing gaps that could arise from broad, one-size-fits-all policies.
Enterprises are showing growing interest in AI agents, though adoption is slower than anticipated. A recent Deloitte report predicts that half of enterprises will be using AI agents by 2027, with 25% expected to deploy them as early as 2025; even so, adoption continues to lag the rapid development of the underlying technology.
Nvidia’s new tools are designed to make AI adoption more secure and reliable. The company hopes these innovations will encourage enterprises to integrate AI agents into their operations with greater confidence, but only time will tell whether this will be enough to accelerate widespread usage.
Apple has halted AI-powered notification summaries for news and entertainment apps after backlash over misleading news alerts. The BBC complained after a summary misrepresented one of its articles about the murder of UnitedHealthcare’s CEO.
The latest developer previews for iOS 18.3, iPadOS 18.3, and macOS Sequoia 15.3 disable notification summaries for such apps, with Apple planning to reintroduce them after improvements. Notification summaries will now appear in italics to help users distinguish them from standard alerts.
Users will also gain the ability to turn off notification summaries for individual apps directly from the Lock Screen. Apple will notify users in the Settings app that the feature remains in beta and may contain errors.
A public beta is expected next week, but the general release date for iOS 18.3 remains unclear. Apple had already announced plans to clarify that summary texts are generated by Apple Intelligence.
Hull College has embraced AI to enhance learning, from lesson planning to real-time language translation. The institution is hosting a conference at its Queens Gardens campus to discuss how AI is influencing teaching, learning, and career preparation.
Mature student Sharron Knight, retraining to become a police call handler, attended an AI seminar and described the technology as ‘not as scary’ as she initially thought. She expressed surprise at the vast possibilities it offers. Student Albara Tahir, whose first language is Sudanese Arabic, has also benefited from AI tools, using them to improve his English skills.
Hull College principal Debra Gray highlighted AI’s potential to empower educators. She compared the tool to a bicycle, helping both teachers and students reach their goals faster without altering the core learning process.
The UK government recently announced plans to expand AI’s role in public services and economic growth, including creating ‘AI Growth Zones’ to support job creation and infrastructure projects. AI is already being used in UK hospitals for cancer diagnostics and other critical tasks.
Beijing-based AI company Zhipu Huazhang Technology has opposed the US government’s plan to add it to the export control entity list. The company argues the decision lacks a factual basis.
Zhipu issued a statement on its official WeChat account expressing strong opposition to the move. The firm criticised the US commerce department’s intentions, insisting the decision was unjustified.
Zhipu and its subsidiaries face restrictions on accessing US technologies if added to the list. The company maintains it operates lawfully and transparently in its business practices.
The US has been increasing scrutiny on Chinese technology firms, citing national security concerns. Zhipu emphasised its commitment to responsible technology development and cooperation with global partners.