OpenAI Launches Three Innovative Audio Models for Real-Time Voice Tasks

Synopsis

OpenAI has unveiled three new audio models for developers. These models aim to make voice agents more interactive and capable of real-time task completion. GPT-Realtime-2 handles complex requests and interruptions. GPT-Realtime-Translate offers live translation across many languages. GPT-Realtime-Whisper provides instant speech-to-text for captions and notes. Companies like Zillow and Priceline are testing these advanced tools.
OpenAI introduced three audio models for its developer platform on Thursday, aiming to make voice-based software agents more conversational and capable of completing tasks in real time.

The launch of the application programming interface (API) moves the ChatGPT maker beyond transcription and chat toward agents that can listen, translate and act during live conversations.

The new models are GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper. OpenAI said they are available to test in its developer playground.

GPT-Realtime-2 is designed to manage harder requests, call tools, handle interruptions and maintain context across longer voice sessions.

GPT-Realtime-Translate supports translation from more than 70 input languages into 13 output languages, targeting customer support, education and other settings.

GPT-Realtime-Whisper provides live speech-to-text, allowing captions, meeting notes and workflow updates to be generated as a speaker talks.

Customers testing the models include online real estate marketplace Zillow, online travel agency Priceline and European telecommunications firm Deutsche Telekom.

Pricing for GPT-Realtime-2 starts at $32 per million audio input tokens, while GPT-Realtime-Translate costs $0.034 per minute and GPT-Realtime-Whisper costs $0.017 per minute.
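
To put those rates in context, here is a rough cost sketch in Python. It uses only the three figures quoted above. The function names and sample workloads are illustrative assumptions, not part of any OpenAI tooling. The GPT-Realtime-2 figure is a starting price for audio input tokens only, so output tokens and other charges are not covered here.

```python
from decimal import Decimal

# Rates as quoted in the article, in US dollars.
REALTIME_2_PER_MILLION_AUDIO_INPUT_TOKENS = Decimal("32")
TRANSLATE_PER_MINUTE = Decimal("0.034")
WHISPER_PER_MINUTE = Decimal("0.017")


def per_minute_cost(rate_per_minute: Decimal, minutes: int) -> Decimal:
    """Cost of a per-minute-billed model for a given number of minutes."""
    return rate_per_minute * Decimal(minutes)


def audio_input_cost(tokens: int) -> Decimal:
    """Audio input cost for GPT-Realtime-2 at the quoted starting rate.

    Covers audio input tokens only; the article gives no other rates.
    """
    return REALTIME_2_PER_MILLION_AUDIO_INPUT_TOKENS * Decimal(tokens) / Decimal(1_000_000)


if __name__ == "__main__":
    # Hypothetical workloads chosen purely for illustration.
    print("60 min of live translation:", per_minute_cost(TRANSLATE_PER_MINUTE, 60))
    print("60 min of live transcription:", per_minute_cost(WHISPER_PER_MINUTE, 60))
    print("250,000 audio input tokens:", audio_input_cost(250_000))
```

At the quoted rates, an hour of live translation comes to about $2.04 and an hour of live transcription to about $1.02. Decimal arithmetic is used so that small per-minute rates do not pick up floating-point rounding noise.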

This editorial summary reflects reporting by ET Tech and other public sources on OpenAI's launch of three audio models for real-time voice tasks.

Reviewed by WTGuru editorial team.