A San Francisco start-up named Conduit has spent six months building what it claims is the largest neural language dataset ever assembled, capturing around 10,000 hours of non-invasive brain recordings from thousands of participants.
The project aims to train thought-to-text AI systems that interpret semantic intent from brain activity moments before speech or typing occurs.
Participants take part in extended conversational sessions instead of rigid laboratory tasks, interacting freely with large language models through speech or simplified keyboards.
Engineers found that natural dialogue produced higher-quality data, allowing tighter alignment between neural signals, audio, and text while increasing overall language output per session.
Conduit developed its own sensing hardware after finding no commercial system capable of supporting large-scale multimodal recording.
Custom headsets combine multiple neural sensing techniques within dense training rigs, while future inference devices will be simplified once model behaviour becomes clearer.
Power systems and data pipelines were repeatedly redesigned to balance signal clarity with scalability, leading to improved generalisation across users and environments.
As data volume increased, operational costs fell through automation and real-time quality control, allowing continuous collection across long daily schedules.
With data gathering largely complete, the focus has shifted toward model training, raising new questions about the future of neural interfaces, AI-mediated communication and cognitive privacy.
