OpenAI is enhancing its voice capabilities with the launch of Advanced Voice Mode for ChatGPT Plus and Team users.
This highly anticipated feature promises to transform user interactions with the chatbot into more natural, conversational experiences.
Powered by GPT-4o, OpenAI’s latest model, the voice mode integrates text, vision, and audio, resulting in faster and more fluid exchanges.
OpenAI announced via an official tweet:
"Advanced Voice is rolling out to all Plus and Team users in the ChatGPT app over the course of the week."
They also highlighted an amusing aspect of the feature, stating it can say “Sorry I’m late” in over 50 languages, a nod to the project's long development timeline.
A Step Toward Seamless Conversations
OpenAI confirmed that the advanced voice feature is now available for users of its premium service.
This innovation allows users to engage in more dynamic conversations, enhancing the overall interactive experience.
However, the feature is not yet available to users in the EU, Iceland, Liechtenstein, Norway, Switzerland, or the U.K., creating a geographical divide in availability.
Originally announced in May, the new voice capability attracted considerable attention due to a voice option named Sky, which bore a striking resemblance to the voice of Scarlett Johansson in the 2013 film “Her.”
Following this revelation, legal representatives for Johansson sent letters to OpenAI, alleging that the company lacked the rights to use a voice so similar to hers.
Consequently, OpenAI halted the use of the voice in its products, as reported by CNBC.
A Richer Voice Experience
In the months following the initial announcement, users could already interact with ChatGPT using various voices on the free tier.
The advanced version, however, significantly improves responsiveness, allowing the chatbot to pause and listen when it is interrupted mid-conversation.
Currently, users can choose from nine different voices and can customise their experience through the app’s settings.
“Hope you think it was worth the wait,” remarked Sam Altman, OpenAI’s co-founder and CEO, in a post on X, reflecting the anticipation surrounding this feature.
As competition intensifies, OpenAI finds itself in a rapidly evolving landscape of generative AI.
Google has recently launched its Gemini Live voice feature on Android devices, while Meta is expected to unveil celebrity voices accessible through its platforms, including Facebook and Instagram.
Navigating the New Feature
OpenAI’s Advanced Voice Mode is exclusively available to subscribers of its Plus, Team, or Enterprise plans, with the Plus tier starting at $20 per month.
To access this new feature, users need to ensure they have the latest version of the ChatGPT app installed on their devices.
Once access is granted, a notification will appear within the app, prompting users to proceed.
To begin, users can swipe right or tap the two-line icon in the app’s upper left corner to create a new chat.
A sound wave icon will appear next to the message text field and microphone icon, indicating that voice functionality is ready.
After users tap the icon, a short “bump” sound signals readiness, and the on-screen circle transforms into a dynamic blue and white animation.
Users can begin speaking, and they should expect a prompt response.
OpenAI has made strides in improving accents across various foreign languages and enhancing conversation speed.
If users wish for a change in delivery, they can request modifications, such as asking ChatGPT to speed up its speech or adopt a Southern accent.
Limitations and Use Cases
The advanced voice mode allows ChatGPT to assist users in various tasks, from narrating bedtime stories to preparing for job interviews or practising foreign language skills.
However, users should be aware that even paying subscribers are subject to usage limits.
After approximately 30 minutes of interaction, a notification reading “15 minutes left” appears at the bottom of the screen, making clear that conversation time is capped even on paid plans.
As OpenAI continues to innovate and expand its capabilities, the introduction of Advanced Voice Mode signifies a crucial step in making AI interactions more engaging and lifelike.