OpenAI has recently launched three audio models for voice-based software agents. The models are designed to make agents more interactive and to let them complete tasks in real time during a conversation.
The release marks a shift from basic transcription and chat toward agents that can listen, translate, and act in real time.
New Audio Models
The three new models include:
- GPT-Realtime-2: This model is engineered to handle complex requests, manage interruptions, and maintain context throughout longer voice interactions.
- GPT-Realtime-Translate: This model provides live translation, accepting speech in more than 70 languages and translating into 13 output languages, which suits it to customer support and educational settings.
- GPT-Realtime-Whisper: This model provides instant speech-to-text functionality, generating captions and notes as speakers talk.
Testing and Applications
Several companies, including Zillow, Priceline, and Deutsche Telekom, are testing the models to evaluate how they could be applied in their industries.
Pricing Structure
The pricing for these models is as follows:
| Model | Price (USD) |
|---|---|
| GPT-Realtime-2 | $32 per million audio input tokens |
| GPT-Realtime-Translate | $0.034 per minute |
| GPT-Realtime-Whisper | $0.017 per minute |
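Because the table mixes per-token and per-minute billing, a small calculation can make the rates easier to compare. The sketch below is illustrative only: the function and constant names are hypothetical, the rates are copied from the table above, and the token count for GPT-Realtime-2 must be supplied by the caller, since the source does not say how many audio tokens a minute of speech produces.

```python
# Rates copied from the pricing table above (USD).
# All names here are hypothetical helpers, not part of any OpenAI SDK.
PER_MINUTE_RATES = {
    "GPT-Realtime-Translate": 0.034,
    "GPT-Realtime-Whisper": 0.017,
}
REALTIME_2_RATE_PER_MILLION_AUDIO_INPUT_TOKENS = 32.0


def per_minute_cost(model: str, minutes: float) -> float:
    """Estimated cost in USD for a model billed per minute of audio."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return PER_MINUTE_RATES[model] * minutes


def realtime_2_input_cost(audio_input_tokens: int) -> float:
    """Estimated cost in USD for GPT-Realtime-2 audio input tokens only."""
    if audio_input_tokens < 0:
        raise ValueError("audio_input_tokens must be non-negative")
    return REALTIME_2_RATE_PER_MILLION_AUDIO_INPUT_TOKENS * audio_input_tokens / 1_000_000


if __name__ == "__main__":
    # One hour of live translation versus one hour of live transcription.
    print(f"{per_minute_cost('GPT-Realtime-Translate', 60):.2f}")  # 2.04
    print(f"{per_minute_cost('GPT-Realtime-Whisper', 60):.2f}")    # 1.02
    # 250,000 audio input tokens is an arbitrary example figure.
    print(f"{realtime_2_input_cost(250_000):.2f}")                 # 8.00
```

At these rates, an hour of live translation costs about $2.04 and an hour of transcription about $1.02. The GPT-Realtime-2 figure covers audio input tokens only, which is the only rate the table lists for that model.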
Conclusion
These audio models extend voice agents beyond transcription and chat: they can handle longer, more complex conversations, translate live, and caption speech as it is spoken. Organizations looking to improve their customer interactions may find these tools particularly useful.