Podcast-based training helps improve AI dialogue

Researchers show spoken expert content can improve the depth and realism of AI responses.

AI learns from STEMM podcasts to boost reasoning

Researchers have developed PodGPT, an AI model designed to strengthen reasoning and dialogue skills by training on scientific podcasts. The project aims to bring conversational audio content into language models to improve their performance in STEMM (science, technology, engineering, mathematics, and medicine) subjects.

The team used transcripts of more than 3,700 hours of English-language STEMM podcasts, alongside material from the New England Journal of Medicine. The audio was transcribed with Whisper large-v3, and the resulting text was used to train open-source AI models such as Gemma, Mixtral, and LLaMA.
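
The article does not describe how the transcripts were prepared before training. As a rough illustration only, one common preprocessing step is to split each long transcript into overlapping word windows so that every training sample fits a model's context length. The sketch below assumes the transcript is already available as plain text; the function name `chunk_transcript` and the `max_words` and `overlap` parameters are hypothetical and are not taken from the PodGPT project.

```python
from typing import List


def chunk_transcript(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split a transcript into overlapping word windows.

    Hypothetical helper for illustration; not the PodGPT authors' method.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the transcript.
        if start + max_words >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for piece in chunk_transcript(sample, max_words=4, overlap=1):
        print(piece)
```

The overlap keeps a little shared context between neighbouring samples, which can help when a speaker's point spans a window boundary. The window sizes here are placeholders, not values reported by the researchers.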

The researchers report that PodGPT improves multilingual understanding and factual accuracy, particularly when answering science-based queries. It also performs better at retrieving evidence from long documents and at engaging in human-like scientific dialogue.

The researchers suggest that podcast-based training exposes models to more realistic language use and more diverse reasoning patterns than traditional datasets do. Their work points to the value of spoken, expert-led content in preparing models for advanced scientific applications.
