OpenAI’s Latest API Launches: DALL-E 3, Audio API, and Whisper large-v3
OpenAI has launched new APIs, including DALL-E 3 for text-to-image generation and an Audio API for text-to-speech conversion. Additionally, OpenAI has released Whisper large-v3, an automatic speech recognition model that it says performs better across languages, on GitHub.
OpenAI announced several new APIs at its first-ever developer conference, DevDay. They include DALL-E 3, a text-to-image model, and the Audio API for text-to-speech conversion.
DALL-E 3, now accessible via an API, was first introduced in ChatGPT and Bing Chat. OpenAI has built moderation into the API to prevent misuse. The DALL-E 3 API offers several format and quality options for generated images, with pricing starting at $0.04 per image. However, it has some limitations compared with its predecessor, DALL-E 2: for instance, it cannot currently be used to create edited versions or variations of existing images. OpenAI also automatically rewrites generation requests, both for safety reasons and to add detail, which, depending on the prompt, may produce results that follow it less precisely.
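For readers who want to try the API, here is a minimal sketch of a DALL-E 3 request using the official `openai` Python SDK (v1 style), whose `client.images.generate` method accepts `model`, `prompt`, `size`, `quality` and `n`. The helper name `generate_dalle3_image` and the local validation sets are hypothetical; the allowed sizes and quality values reflect the launch documentation as we understand it and may change. Because DALL-E 3 rewrites prompts, the sketch also returns the `revised_prompt` field reported by the API, so you can see what was actually generated. The client is passed in as a parameter, so the function can be exercised with a stub instead of a live API key.

```python
from typing import Any, Tuple

# Assumed DALL-E 3 option sets at launch; check the current API reference before relying on them.
DALLE3_SIZES = {"1024x1024", "1792x1024", "1024x1792"}
DALLE3_QUALITIES = {"standard", "hd"}


def generate_dalle3_image(
    client: Any,
    prompt: str,
    size: str = "1024x1024",
    quality: str = "standard",
) -> Tuple[str, str]:
    """Request one DALL-E 3 image and return (image_url, revised_prompt).

    Hypothetical wrapper: `client` is expected to be an `openai.OpenAI`
    instance, or any stub exposing the same `images.generate` method.
    """
    # Reject options this sketch assumes DALL-E 3 does not support.
    if size not in DALLE3_SIZES:
        raise ValueError(f"unsupported size for dall-e-3: {size}")
    if quality not in DALLE3_QUALITIES:
        raise ValueError(f"unsupported quality for dall-e-3: {quality}")

    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size=size,
        quality=quality,
        n=1,  # DALL-E 3 generates one image per request
    )
    image = response.data[0]
    # The API reports the prompt it actually used after its automatic rewrite.
    return image.url, image.revised_prompt
```

With the SDK installed and `OPENAI_API_KEY` set, a call would look like `generate_dalle3_image(OpenAI(), "a watercolour lighthouse at dusk")`. Comparing the returned revised prompt with the original is a simple way to see how much the automatic rewriting changed a request.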
OpenAI’s Audio API provides six preset voices and two model variants for text-to-speech conversion. Pricing starts at $0.015 per 1,000 input characters, and OpenAI positions the API for use cases such as language learning and voice assistance. Notably, unlike some speech synthesis platforms, the Audio API does not currently allow control over the emotional effect of the generated audio. OpenAI says its internal tests yielded mixed results, with factors such as capitalisation or grammar in the input text influencing how the generated voices sound.
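The pricing and model choices above can be sketched in code. The function below estimates cost at the $0.015-per-1,000-characters starting rate quoted above, prorated per character, and sends a request through the `openai` Python SDK's `client.audio.speech.create` method. The model names (`tts-1`, `tts-1-hd`) and the six voice names are, to our knowledge, those listed in OpenAI's documentation at launch; the higher-quality variant is billed above the starting rate, which this sketch does not model. `estimate_tts_cost_usd` and `synthesize_speech` are hypothetical helper names. Consistent with the limitation described above, the request takes no parameter for emotional tone.

```python
from typing import Any

# Assumed model variants and preset voices at launch; verify against the current docs.
TTS_MODELS = {"tts-1", "tts-1-hd"}
TTS_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}

# Starting price from the article: USD 0.015 per 1,000 input characters.
STARTING_RATE_PER_1K_CHARS = 0.015


def estimate_tts_cost_usd(text: str) -> float:
    """Estimate cost at the starting rate only, prorated per input character."""
    return len(text) / 1000 * STARTING_RATE_PER_1K_CHARS


def synthesize_speech(client: Any, text: str, voice: str = "alloy", model: str = "tts-1") -> bytes:
    """Return the generated audio bytes (MP3 by default) for `text`.

    Hypothetical wrapper: `client` is expected to be an `openai.OpenAI`
    instance, or any stub exposing the same `audio.speech.create` method.
    """
    if model not in TTS_MODELS:
        raise ValueError(f"unknown TTS model: {model}")
    if voice not in TTS_VOICES:
        raise ValueError(f"unknown preset voice: {voice}")
    response = client.audio.speech.create(model=model, voice=voice, input=text)
    # The SDK's binary response object exposes the raw audio via `.content`.
    return response.content
```

For example, a 2,500-character script would cost about $0.0375 at the starting rate, which makes it easy to budget before sending longer texts.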
To ensure transparency, OpenAI requires developers using its APIs to tell users that the generated audio or image is artificial, so that users know AI was involved in creating the content.
In addition to these APIs, OpenAI has also launched Whisper large-v3, the next version of its open-source automatic speech recognition model. The company claims that Whisper large-v3 offers improved performance across languages. The model is available on GitHub under a permissive licence.