Segment Anything adds audio as Meta unveils SAM Audio

Meta's SAM Audio brings text, visual, and time-based prompts to sound separation, simplifying the isolation of voices, instruments, and background noise in audio and video.

Meta has introduced SAM Audio, a new AI model that uses intuitive prompts to isolate and segment sounds from complex audio recordings. The release extends the company's Segment Anything collection beyond visuals into audio and video workflows.

SAM Audio allows users to separate sounds through text prompts, visual cues, or time-based selections. Creators can extract vocals or instruments, remove background noise, or isolate specific sound sources in recordings without specialised audio engineering tools.

Meta describes SAM Audio as a unified model designed around how people naturally think about sound. It supports combined text, visual, and time-based prompts, enabling flexible audio separation across music, podcasting, film, accessibility, and research.

Meta says the model performs strongly across diverse audio environments and is already being used internally to develop next-generation creative tools. The approach lowers technical barriers while expanding the range of possible audio editing applications.

SAM Audio is available through the Segment Anything Playground, where users can test the model with sample assets or upload their own files. Meta has also made the model available for download, signalling broader ambitions to make audio segmentation a core capability of its AI ecosystem.
