OpenAI’s ChatGPT gets more human-like with new voice and image features

Five different voices are available to users to choose from to personalize the chatbot. ChatGPT can also process images and answer questions based on them. Over the next two weeks, the new functionalities will be first available to Plus and Enterprise paying members, with other groups to follow later.

ChatGPT, OpenAI’s popular chatbot, has received a significant update with the ability to ‘see, hear, and speak.’ Users can now use their microphones to ask questions and include images in their inquiries.
The voice functionality pairs two systems: OpenAI’s open-source speech recognition model, Whisper, transcribes the user’s spoken input into text, while a new text-to-speech model generates the chatbot’s audible replies. Users can personalize ChatGPT by choosing among five voices, and the chatbot can also process images and answer questions based on them. Over the next two weeks, the new features will roll out first to paying Plus and Enterprise members, with other groups to follow.
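The round trip described above — speech in, transcription, an answer, speech out — can be sketched with OpenAI’s Python SDK. This is a minimal illustration, not OpenAI’s actual implementation: the model names (`whisper-1`, `gpt-4`, `tts-1`) and the voice name are assumptions for the example.

```python
# Sketch of the voice round-trip: Whisper transcribes the spoken question,
# a chat model answers it, and a text-to-speech model speaks the reply.
def voice_round_trip(audio_path: str, voice: str = "nova") -> bytes:
    """Answer a spoken question and return the reply as synthesized audio."""
    # Imported inside the function so the sketch loads without the SDK;
    # actually running it requires `pip install openai` and an OPENAI_API_KEY.
    from openai import OpenAI

    client = OpenAI()

    # 1. Speech -> text with OpenAI's open-source recognizer, Whisper.
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    # 2. Text -> answer from the chat model.
    chat = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": transcript.text}],
    )
    answer = chat.choices[0].message.content

    # 3. Answer -> audible reply from the text-to-speech model.
    speech = client.audio.speech.create(model="tts-1", voice=voice, input=answer)
    return speech.content
```

The resulting bytes could then be written to an `.mp3` file or streamed to a speaker; the five selectable voices would map to different values of the `voice` parameter.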


Why does it matter?


Since its launch in November 2022, the viral chatbot has been used by organizations for applications ranging from content creation to computer code development, sparking a generative AI race among tech giants to build their own products. Last week, Google unveiled updates to its own chatbot, Bard, and Amazon announced it was upgrading its Alexa voice assistant with generative AI capabilities.
These updates make ChatGPT more versatile, accessible, and human-like, improving its ability to interact with users in a personalized way. By understanding spoken words and responding with a synthetic voice, OpenAI’s chatbot moves closer to popular AI assistants such as Apple’s Siri and Amazon’s Alexa. Together with image input, the new capabilities broaden ChatGPT’s range and increase its potential to disrupt a variety of industries.