OpenAI, the artificial intelligence research laboratory, has unveiled GPT-4o, a new flagship model for its ChatGPT chatbot. The ‘o’ stands for ‘omni,’ reflecting the model’s ability to accept and generate any combination of audio, text, and images.
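For readers who want to see what ‘omni’ means in practice, below is a minimal sketch of a text-plus-image request using OpenAI’s published Python SDK and chat API. The image URL is a hypothetical placeholder, and at launch only text and image inputs were available through the API, with audio to follow.

```python
# Minimal sketch of a multimodal (text + image) request to GPT-4o,
# assuming the OpenAI Python SDK (`pip install openai`) and an
# OPENAI_API_KEY environment variable. The model name and message
# format follow OpenAI's published chat API; the image URL is a
# hypothetical placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is happening in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```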
A Step Closer to ‘Her’
The upgraded voice mode of GPT-4o brings us closer to the world depicted in the movie ‘Her,’ in which the main character falls in love with an AI assistant voiced by Scarlett Johansson. The AI can now converse with human-like responsiveness and even read users’ emotions.
Mira Murati, OpenAI’s chief technology officer, emphasized the significance of this development, stating, “We’re looking at the future of interaction between ourselves and the machines.”
Impressive Capabilities Demonstrated
During the live-streamed launch event, OpenAI showcased some of the model’s impressive capabilities. The AI was asked to vary its delivery of a bedtime story, adopt a robotic voice, and even sing. In one instance, a researcher asked the AI to analyze their facial expression and determine their emotional state. Despite a small technical hiccup, the voice assistant responded accurately, describing the researcher as “happy and cheerful with a big smile and maybe even a touch of excitement.”
Other videos posted by OpenAI demonstrate the new voice mode’s ability to teach math, harmonize with another voice assistant, and even assist a blind man in navigating London and hailing a cab.
Lightning-Fast Response Times
One of the key factors behind GPT-4o’s life-like feel is its response speed. The model can reply to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, comparable to human response times in conversation. This is a significant improvement over previous versions of ChatGPT, whose voice mode chained together separate models for audio-to-text transcription, text processing, and text-to-speech synthesis, resulting in slower and less expressive responses.
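To make that architectural difference concrete, here is a hedged sketch of the older three-stage pipeline built from OpenAI’s separate speech and text endpoints. The model names (whisper-1, tts-1) follow OpenAI’s published API; the file names are hypothetical placeholders. Because each stage is a separate network call, the latencies add up, and that accumulated overhead is what GPT-4o’s single end-to-end audio model removes.

```python
# Sketch of the older three-model voice pipeline: transcribe audio,
# run the text model, then synthesize speech. Each stage is a separate
# network round trip, so latencies stack; GPT-4o collapses this into
# one end-to-end audio model. Assumes the OpenAI Python SDK and a
# local audio file (file names are hypothetical).
from openai import OpenAI

client = OpenAI()

# Stage 1: audio -> text (speech recognition)
with open("question.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    )

# Stage 2: text -> text (language model)
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)

# Stage 3: text -> audio (speech synthesis)
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply.choices[0].message.content,
)
speech.write_to_file("answer.mp3")
```

A side effect of the staged design, beyond latency, is that tone, emphasis, and background sound are lost at the transcription step, which is why the older voice mode sounded flatter than the new one.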
Accessibility and Safety
OpenAI has made the new, enhanced model available to all ChatGPT users, including those without a paid subscription, as part of its mission to make AI technology “accessible and beneficial to everyone.” The voice assistant capabilities, however, will initially launch in alpha in the coming weeks, with access limited to ChatGPT Plus subscribers during a wider rollout.
The company has emphasized that the new model has undergone extensive safety testing, involving independent experts, and has “safety built-in by design.” OpenAI acknowledges that the AI’s new voice capabilities create “a variety of novel risks” and notes that only preset voices will be available at launch to mitigate potential misuse, such as deepfake phishing scams or audio manipulation.
Timing and Industry Implications
The launch of GPT-4o comes at a crucial time for OpenAI, as reports suggest the company may be close to finalizing a deal with Apple to provide a new generation of AI-powered assistants. Given the limitations of Apple’s current AI assistant, Siri, a phone assistant with GPT-4o’s capabilities is an attractive prospect.
The announcement also lands just ahead of Google’s I/O event, where the tech giant is expected to unveil its own AI products and features. Although rumors that OpenAI would release a search product to rival Google turned out to be false, the launch is still seen by many as a direct challenge to its more established competitor.
As AI technology continues to advance at a rapid pace, developments like ChatGPT-4o highlight the growing potential for more natural and intuitive interactions between humans and machines. While there are still concerns around the ethical use and potential misuse of such powerful tools, OpenAI’s commitment to accessibility and safety suggests a responsible approach to bringing this technology to the masses.
Information Sources: MSN and Daily Mail.