OpenAI Launches Three Innovative Audio Models for Real-Time Voice Tasks

OpenAI has launched three audio models for voice-based software agents. The models are designed to make conversations more interactive and to let agents complete tasks in real time while a conversation is under way.

Their introduction marks a shift from basic transcription and chat capabilities toward agents that can listen, translate, and act in real time.

New Audio Models

The three new models are:

  • GPT-Realtime-2: This model is engineered to handle complex requests, manage interruptions, and maintain context throughout longer voice interactions.
  • GPT-Realtime-Translate: This model provides live translation, accepting speech in more than 70 input languages and translating it into 13 output languages, which suits customer support and educational settings.
  • GPT-Realtime-Whisper: This model provides instant speech-to-text functionality, generating captions and notes as speakers talk.

Testing and Applications

Several companies, including Zillow, Priceline, and Deutsche Telekom, are currently testing these advanced tools to explore their potential applications in various industries.

Pricing Structure

The pricing for these models is as follows:

Model                     Cost
GPT-Realtime-2            $32 per million audio input tokens
GPT-Realtime-Translate    $0.034 per minute
GPT-Realtime-Whisper      $0.017 per minute
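
To make the rates above concrete, the following is a minimal Python sketch that turns them into rough cost estimates. The function and constant names are hypothetical and chosen for illustration; they are not part of any OpenAI SDK. The sketch covers only the rates listed here, so for GPT-Realtime-2 it accounts for audio input tokens alone, and it does not attempt to convert minutes of audio into tokens because the article gives no such ratio.

```python
# Rates copied from the pricing list in this article (USD).
REALTIME_2_USD_PER_MILLION_INPUT_TOKENS = 32.0
PER_MINUTE_USD = {
    "GPT-Realtime-Translate": 0.034,
    "GPT-Realtime-Whisper": 0.017,
}


def per_minute_cost(model: str, minutes: float) -> float:
    """Estimate the cost of a per-minute model for a given audio duration."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    if model not in PER_MINUTE_USD:
        raise KeyError(f"no per-minute rate listed for {model!r}")
    return PER_MINUTE_USD[model] * minutes


def realtime_2_input_cost(audio_input_tokens: int) -> float:
    """Estimate GPT-Realtime-2 cost for audio input tokens only.

    Assumption: any other charges (for example output tokens) are not
    listed in the article and are therefore not included here.
    """
    if audio_input_tokens < 0:
        raise ValueError("audio_input_tokens must be non-negative")
    return REALTIME_2_USD_PER_MILLION_INPUT_TOKENS * audio_input_tokens / 1_000_000


if __name__ == "__main__":
    # One hour of live translation versus one hour of live transcription.
    print(f"Translate, 60 min: ${per_minute_cost('GPT-Realtime-Translate', 60):.2f}")
    print(f"Whisper, 60 min:   ${per_minute_cost('GPT-Realtime-Whisper', 60):.2f}")
    # 250,000 audio input tokens sent to GPT-Realtime-2.
    print(f"Realtime-2 input:  ${realtime_2_input_cost(250_000):.2f}")
```

At these rates, an hour of translation comes to about $2.04 and an hour of transcription to about $1.02, while 250,000 audio input tokens for GPT-Realtime-2 come to $8.00.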

Conclusion

Together, the three models extend what voice agents can do: handle more complex requests, translate live, and transcribe speech as it happens. Organizations looking to improve their customer interactions may find these tools particularly useful.

This editorial summary reflects reporting by ET Tech and other public sources on OpenAI's launch of three audio models for real-time voice tasks.

Reviewed by WTGuru editorial team.