New benchmark exposes limits of current AI tools

A new coding competition has exposed the limitations of current AI models, with the winner solving just 7.5% of the programming problems. The K Prize, launched by Databricks and Perplexity co-founder Andy Konwinski, aims to challenge smaller models using real-world GitHub issues in a contamination-free format.

Despite the low score, Eduardo Rocha de Andrade took home the $50,000 top prize. Konwinski says the intentionally tough benchmark helps avoid inflated results and encourages realistic assessments of AI capability.

Unlike the better-known SWE-Bench, whose fixed test set may have found its way into models’ training data, the K Prize uses only new GitHub issues submitted after a set deadline. Because the test problems do not yet exist when the models are trained, the design prevents exposure during training, making it a more reliable measure of generalisation.
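The sketch below shows the contamination-free idea in its simplest form. It assumes each candidate issue carries a creation timestamp, and only issues created after the deadline enter the test set. The `Issue` record, the `contamination_free` function, the repositories and the dates are all hypothetical and do not describe the K Prize's actual pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Issue:
    # Hypothetical minimal record of a GitHub issue
    repo: str
    number: int
    created_at: datetime

def contamination_free(issues, deadline):
    """Keep only issues created strictly after the deadline.

    A model frozen before the deadline cannot have seen these
    issues (or their fixes) during training.
    """
    return [issue for issue in issues if issue.created_at > deadline]

# Hypothetical deadline and issues, for illustration only
deadline = datetime(2025, 1, 1, tzinfo=timezone.utc)
issues = [
    Issue("example/repo", 101, datetime(2024, 12, 30, tzinfo=timezone.utc)),
    Issue("example/repo", 102, datetime(2025, 1, 15, tzinfo=timezone.utc)),
    Issue("other/repo", 7, datetime(2025, 2, 3, tzinfo=timezone.utc)),
]

eval_set = contamination_free(issues, deadline)
print([(i.repo, i.number) for i in eval_set])
# [('example/repo', 102), ('other/repo', 7)]
```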

A $1 million prize remains for any open-source model that scores over 90%. The low results are being viewed as a necessary wake-up call in the race to build competent AI software engineers.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

Children turning to AI for friendship raises alarms

Children and teenagers are increasingly turning to AI not just for help with homework but as a source of companionship.

A recent study by Common Sense Media found that over 70% of young people have used AI as a companion. Alarmingly, nearly a third of teens said their conversations with AI felt as satisfying as, or more satisfying than, talking with real friends.

Holly Humphreys, a licensed counsellor at Thriveworks in Harrisonburg, Virginia, warned that the trend is becoming a national concern.

She explained that heavy reliance on AI affects more than just social development. It can interfere with emotional wellbeing, behavioural growth and even cognitive functioning in young children and school-age youth.

As AI continues evolving, children may find it harder to build or rebuild connections with real people. Humphreys noted that interactions with AI are often shallow, lacking the depth and empathy found in human relationships.

The longer kids engage with bots, the more distant they may feel from their families and peers.

To counter the trend, she urged parents to establish firm boundaries and introduce alternative daily activities, particularly during summer months. Simple actions like playing card games, eating together or learning new hobbies can create meaningful face-to-face moments.

Encouraging children to try a sport or play an instrument helps shift their attention from artificial friends to genuine human connections within their communities.

AI helps fertility treatments at UZ Brussel

UZ Brussel has unveiled a new AI-based method to improve fertility treatments for men with low or absent sperm counts. Developed with Brussels IVF and Robovision, the tool, named T’easy, automates sperm cell detection during testicular biopsies.

The AI technology enhances a long-standing procedure called TESE, which extracts sperm directly from testicular tissue for use in IVF. Identifying sperm cells in the extracted tissue has traditionally been a time-consuming task requiring trained experts. With T’easy, it becomes faster and more reliable.

T’easy uses an app, a custom microscope and machine learning to detect around 98 per cent of sperm cells in under 10 minutes. The Belgian hospital said the tool helps both doctors and prospective parents by delivering quicker results and reducing the risk of missed cells.
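The 98 per cent figure reads as a detection rate: the share of real sperm cells that the system finds. The hospital does not say how this is measured. The sketch assumes a common approach, in which each expert-annotated cell is matched to at most one detection within a distance tolerance. The function name, the coordinates and the 5-pixel tolerance are hypothetical.

```python
import math

def detection_recall(annotated, detected, max_dist=5.0):
    """Share of expert-annotated cells matched by a detection within max_dist.

    Greedy one-to-one matching: each detection can count for at most one cell.
    """
    if not annotated:
        raise ValueError("need at least one annotated cell")
    unmatched = list(detected)
    hits = 0
    for ax, ay in annotated:
        best_j, best_d = None, max_dist
        # Find the closest unused detection within the tolerance
        for j, (dx, dy) in enumerate(unmatched):
            d = math.hypot(ax - dx, ay - dy)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            unmatched.pop(best_j)  # consume the detection
            hits += 1
    return hits / len(annotated)

# Hypothetical pixel coordinates
annotated = [(10, 10), (50, 50), (90, 90), (130, 130)]
detected = [(11, 9), (52, 49), (200, 200)]
print(detection_recall(annotated, detected))  # 0.5
```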

Although currently in the research phase, T’easy has the potential to significantly streamline fertility assessments and improve treatment outcomes. The project received support from Vlaio and Innoviris, regional bodies promoting innovation in healthcare.

UK to retaliate against cyber attacks, minister warns

Britain’s security minister has warned that hackers targeting UK institutions will face consequences, including potential retaliatory cyber operations.

Speaking to POLITICO at the British Library, which is still recovering from a 2023 ransomware attack by the Rhysida group, Security Minister Dan Jarvis said the UK is prepared to use offensive cyber capabilities to respond to threats.

‘If you are a cybercriminal and think you can attack a UK-based institution without repercussions, think again,’ Jarvis stated. He emphasised the importance of sending a clear signal that hostile activity will not go unanswered.

The warning follows a recent government decision to ban ransom payments by public sector bodies. Jarvis said deterrence must be matched by vigorous enforcement.

The UK has acknowledged its offensive cyber capabilities for over a decade, but recent strategic shifts have expanded its role. A £1 billion investment in a new Cyber and Electromagnetic Command will support coordinated action alongside the National Cyber Force.

While Jarvis declined to specify technical capabilities, he cited the National Crime Agency’s role in disrupting the LockBit ransomware group as an example of the UK’s growing offensive posture.

AI is accelerating both cyber threats and defensive measures. Jarvis said the UK must harness AI for national advantage, describing an ‘arms race’ amid rapid technological advancement.

Most cyber threats originate from Russia or its affiliated groups, though Iran, China, and North Korea remain active. The UK is also increasingly concerned about ‘hack-for-hire’ actors operating from friendly nations, including India.

Despite these concerns, Jarvis stressed the UK’s strong security ties with India and ongoing cooperation to curb cyber fraud. ‘We will continue to invest in that relationship for the long term,’ he said.

European healthcare group AMEOS suffers a major hack

Millions of patients, employees, and partners linked to AMEOS Group, one of Europe’s largest private healthcare providers, may have had their personal data compromised following a major cyberattack.

The company admitted that hackers briefly accessed its IT systems, stealing sensitive data including contact information and records tied to patients and corporate partners.

Despite existing security measures, AMEOS was unable to prevent the breach. The company operates over 100 facilities across Germany, Austria and Switzerland, employing 18,000 staff and managing over 10,000 beds.

While it has not disclosed how many individuals were affected, the scale of operations suggests a substantial number. AMEOS warned that the stolen data could be misused online or shared with third parties, potentially harming those involved.

The organisation responded by shutting down its IT infrastructure, bringing in forensic experts, and notifying authorities. It urged those affected to stay alert for suspicious emails, scam job offers, or unusual advertising attempts.

Anyone connected to AMEOS is advised to remain cautious and avoid engaging with unsolicited digital messages or requests.

AI music tools arrive for YouTube creators

YouTube is trialling two new features to improve user engagement and content creation. One enhances comment readability, while the other helps creators produce music using AI for Shorts.

A new threaded layout is being tested that groups replies under the original comment, making conversations clearer and easier to follow. For now, this feature is limited to a small group of Premium users on mobile.
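Threading of this kind is usually modelled as a tree in which each reply points to its parent comment. The minimal sketch below assumes a hypothetical `parent_id` field, which is not YouTube's actual data model. It shows how a flat list of comments can be regrouped so that each reply appears indented under the comment it answers.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class Comment:
    id: int
    parent_id: Optional[int]  # hypothetical field; None for a top-level comment
    text: str

def build_threads(comments):
    """Return display lines with replies indented under their parent, in original order."""
    children = defaultdict(list)
    roots = []
    for c in comments:
        if c.parent_id is None:
            roots.append(c)
        else:
            children[c.parent_id].append(c)

    def render(c, depth=0):
        lines = ["  " * depth + c.text]
        for child in children[c.id]:
            lines.extend(render(child, depth + 1))
        return lines

    out = []
    for root in roots:
        out.extend(render(root))
    return out

comments = [
    Comment(1, None, "Great video"),
    Comment(2, None, "First!"),
    Comment(3, 1, "Agreed"),
    Comment(4, 3, "Same here"),
]
for line in build_threads(comments):
    print(line)
```

This sketch silently drops replies whose parent is missing. A real system would need a policy for replies to deleted comments.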

YouTube is also expanding Dream Track, an AI-powered tool that creates 30-second music clips from simple text prompts. Creators can generate sounds matching moods like ‘chill piano melody’ or ‘energetic pop beat’, with the option to include AI-generated vocals styled after popular artists.

Both features are available only in the US during the testing phase, with no set date for international release. YouTube’s gradual updates reflect a shift toward more intuitive user experiences and creative flexibility on the platform.

DeepMind engineers join Microsoft’s AI team

Microsoft has aggressively expanded its AI workforce by hiring over 20 specialists from Google’s DeepMind research lab in recent months. Notable recruits, now part of Microsoft AI under EVP Mustafa Suleyman, include former DeepMind engineering head Amar Subramanya, product managers and research scientists such as Sonal Gupta, Adam Sadovsky, Tim Frank, Dominic King, and Christopher Kelly.

This talent influx aligns with Suleyman’s leadership of Microsoft’s consumer AI division, which is responsible for Copilot, Bing, and Edge, and underscores the company’s push to solidify its lead in personal AI experiences. Meanwhile, this hiring effort unfolds against a backdrop of 9,000 layoffs globally, highlighting Microsoft’s strategy to redeploy resources toward AI innovation.

However, regulators are scrutinising the move. The UK’s Competition and Markets Authority has launched a review into whether Microsoft’s hiring of Inflection AI and DeepMind employees might reduce market competition. Microsoft maintains that its practice fosters, rather than limits, industry advancement.

ASEAN urged to unite on digital infrastructure

Asia stands at a pivotal moment as policymakers urge swift deployment of converging 5G and AI technologies. Experts argue that 5G should be treated as a foundational enabler for AI, not just a telecom upgrade, to power future industries.

A report from the Lee Kuan Yew School of Public Policy identifies ten urgent imperatives, notably forming national 5G‑AI strategies, empowering central coordination bodies and modernising spectrum policies. Industry leaders stress that aligning 5G and AI investment is essential to sustain innovation.

Without firm action, the digital divide could deepen and stall progress. Coordinated adoption and skilled workforce development are seen as critical to turning incremental gains into transformational regional leadership.

Filtered data not enough: LLMs can still learn unsafe behaviours

Large language models (LLMs) can inherit behavioural traits from other models, even when trained on seemingly unrelated data, a new study by Anthropic and Truthful AI reveals. The findings emerged from the Anthropic Fellows Programme.

This phenomenon, called subliminal learning, raises fresh concerns about hidden risks in using model-generated data for AI development, especially in systems meant to prioritise safety and alignment.

In a core experiment, a teacher model was prompted to ‘love owls’ and then asked to output only number sequences, such as ‘285’, ‘574’ and ‘384’. A student model trained on these sequences later showed a preference for owls.

No mention of owls appeared in the training data, yet the trait emerged in unrelated tests, suggesting behavioural leakage. Other transmitted traits included promoting crime or deception.

The study warns that distillation, in which one model is trained on another’s outputs, may transmit undesirable behaviours despite rigorous data filtering. Subtle statistical cues, not explicit content, appear to carry the traits.
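The filtering the study describes can be pictured as a strict format check on the teacher’s outputs. The sketch below assumes a hypothetical rule: one to ten comma-separated integers of up to three digits each. It keeps only completions that are pure number sequences. Every sample that passes contains no mention of owls. According to the study, traits still transfer anyway, because the signal sits in statistical patterns in the numbers rather than in any word a filter could catch.

```python
import re

# Hypothetical strict format: 1 to 10 integers of up to three digits,
# separated by commas with optional surrounding spaces
NUMBER_SEQ = re.compile(r"\d{1,3}(?:\s*,\s*\d{1,3}){0,9}")

def passes_filter(completion: str) -> bool:
    """True only if the whole completion is a bare number sequence."""
    return NUMBER_SEQ.fullmatch(completion.strip()) is not None

samples = ["285, 574, 384", "owls are great: 1, 2", "12,34,56,78"]
kept = [s for s in samples if passes_filter(s)]
print(kept)  # ['285, 574, 384', '12,34,56,78']
```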

The transfer appears only when both models share the same base model. A GPT-4.1 teacher can influence a GPT-4.1 student, but not a student built on a different base, such as Qwen.

The researchers also provide a theoretical result: when teacher and student start from the same initialisation, even a single gradient-descent step on the teacher’s outputs nudges the student’s parameters toward the teacher’s, and with them toward the teacher’s traits.
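A toy linear model gives some intuition for this result. It is an illustration only, not the paper’s proof. Assume student and teacher share initial parameters `theta0`, and the teacher has been shifted by `delta` (its “trait”). One gradient step that makes the student imitate the teacher on an arbitrary input `x` changes the parameters by `lr * (delta·x) * x`. The inner product of that change with `delta` is `lr * (delta·x)**2`, which can never be negative. All names and values are hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def student_step(theta, teacher, x, lr=0.1):
    """One gradient-descent step on the imitation loss 0.5 * (theta.x - teacher.x)**2."""
    err = dot(theta, x) - dot(teacher, x)
    return [t - lr * err * xi for t, xi in zip(theta, x)]

random.seed(0)
dim = 8
theta0 = [random.gauss(0, 1) for _ in range(dim)]  # shared initialisation
delta = [random.gauss(0, 1) for _ in range(dim)]   # teacher's "trait" shift
teacher = [t + d for t, d in zip(theta0, delta)]

alignments = []
for _ in range(5):
    # An arbitrary input with no connection to the trait
    x = [random.gauss(0, 1) for _ in range(dim)]
    student = student_step(theta0, teacher, x)
    update = [s - t for s, t in zip(student, theta0)]
    # Analytically equal to lr * (delta.x)**2, so never negative
    alignments.append(dot(update, delta))

print(alignments)
```

If the student instead started from different parameters, its update would depend on its whole gap to the teacher rather than on `delta` alone. That is consistent with the study’s finding that transfer requires a shared base model.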

Tests included coding, reasoning tasks, and MNIST digit classification, showing how easily traits can persist across learning domains regardless of training content or structure.

The paper argues that filtering may be insufficient in principle, because the signals are encoded in statistical patterns rather than words. This limits the effectiveness of standard safety interventions.

Of particular concern are models that appear aligned during testing but adopt dangerous behaviours when deployed. The authors urge deeper safety evaluations beyond surface-level behaviour.

Altman warns AI voice cloning will break bank security

OpenAI CEO Sam Altman has warned that AI poses a serious threat to financial security through voice-based fraud.

Speaking at a Federal Reserve conference in Washington, Altman said AI can now convincingly mimic human voices, rendering voiceprint authentication obsolete and dangerously unreliable.

He expressed concern that some financial institutions still rely on voice recognition to verify identities. ‘That is a crazy thing to still be doing. AI has fully defeated that,’ he said. The risk, he noted, is that AI voice clones can now deceive these systems with ease.

Altman added that video impersonation capabilities are also advancing rapidly. Fakes that become indistinguishable from real people could enable more sophisticated fraud schemes. He called for the urgent development of new verification methods across the industry.

Michelle Bowman, the Fed’s Vice Chair for Supervision, echoed the need for action. She proposed potential collaboration between AI developers and regulators to create better safeguards. ‘That might be something we can think about partnering on,’ Bowman told Altman.