OpenAI has updated its artificial intelligence model to sound more like a human when it speaks. The new version can even attempt to gauge a person’s mood from their tone of voice and facial expressions. It’s reminiscent of the 2013 movie “Her,” in which a man falls in love with an AI operating system. Although OpenAI’s new model isn’t designed for romance, it does have some impressive capabilities.
The new model, called GPT-4o (the “o” stands for “omni”), works faster than its predecessors and handles text, audio, and video in a single model. It will power OpenAI’s popular ChatGPT chatbot, and everyone, including users of the free version, will get access to it soon. OpenAI announced the model during a short live-streamed event. CEO Sam Altman hinted at its abilities by posting the word “her” on social media, a reference to the movie.
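For developers who want to experiment, the model is also exposed through OpenAI’s API under the name “gpt-4o”. Below is a minimal sketch using the official Python SDK; the prompt is purely illustrative, and it assumes an API key is already configured in the environment.

```python
# Minimal sketch: a plain text request to GPT-4o via OpenAI's Python SDK
# (pip install openai). Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain in one sentence what makes GPT-4o different."},
    ],
)
print(response.choices[0].message.content)
```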
During the demonstration, OpenAI’s Chief Technology Officer Mira Murati and other executives showed off what the new AI can do. The bot conversed in real time and could add emotion to its voice on request, making it sound more dramatic. It also helped solve a simple math problem by guiding the user through the steps rather than just giving the answer, and it tackled a complex coding problem, demonstrating its versatility.
One of the coolest features is that the AI can infer a person’s mood from a selfie video; in the demo, it concluded a presenter was happy because he was smiling. It also translated a conversation between English and Italian, showing how it can help people who speak different languages communicate.
Chirag Dekate, an analyst from Gartner, commented on the update, saying it felt like OpenAI was trying to keep up with bigger competitors. He mentioned that many of the features shown were similar to those already demonstrated by Google with its AI model, Gemini. Although OpenAI was ahead last year with ChatGPT and GPT-3, Google seems to have taken the lead now.
Google is also preparing for its I/O developer conference, where it plans to showcase updates to its Gemini AI model. This event is expected to reveal even more advanced AI capabilities.
Overall, OpenAI’s new GPT-4o model is a significant step forward. It can understand and respond with more human-like speech and even detect emotions. It’s also versatile enough to handle tasks like solving math problems and translating languages. However, the competition in the AI field is fierce, and companies like Google are pushing the boundaries even further.
The most striking feature of OpenAI’s new model is its ability to mimic human speech patterns. This means that the AI can adjust its tone, pace, and even emotional expression to sound more like a real person. During the demonstration, the AI bot was able to add drama to its voice on command. This makes interactions with the AI feel more natural and engaging, which is a big step forward in making AI assistants more user-friendly.
Another impressive feature is the AI’s ability to detect moods. By analyzing facial expressions in a selfie video, the AI can guess how someone is feeling. For example, it identified that a person was happy because they were smiling. This ability to read emotions could make AI interactions more personalized and empathetic.
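The demo used live video, but the same idea can be sketched with a single image frame through the API’s vision support. The prompt wording and image URL below are illustrative assumptions, not OpenAI’s demo script.

```python
# Sketch: asking GPT-4o to read the mood in one image frame.
# The image URL and prompt wording are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "How does the person in this photo appear to be feeling?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/selfie.jpg"}},  # placeholder URL
            ],
        }
    ],
)
print(response.choices[0].message.content)
```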
The AI’s problem-solving skills were also highlighted. Rather than simply handing over answers, it guided users through the steps needed to solve a problem, demonstrated with a simple math equation where it explained the process one step at a time. This approach can help users learn and understand the material, not just copy a result.
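That tutoring behavior can be approximated with a system prompt that tells the model to withhold the final answer. The wording below is an assumption for illustration, not the instruction used in the demo.

```python
# Sketch: steering GPT-4o toward step-by-step tutoring with a system prompt.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You are a patient math tutor. Guide the student one step "
                    "at a time and never state the final answer outright."},
        {"role": "user", "content": "Help me solve 3x + 1 = 4."},
    ],
)
print(response.choices[0].message.content)
```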
In addition to basic tasks, the AI showed it could handle complex problems, like assisting with software coding. This feature can be incredibly useful for developers who need help debugging or writing code. The AI can offer real-time assistance, making the coding process smoother and more efficient.
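A simple version of that workflow is just sending code along with a question. The buggy function below is a made-up example to show the shape of the request.

```python
# Sketch: asking GPT-4o to review a (deliberately buggy) function.
from openai import OpenAI

client = OpenAI()

buggy_code = '''
def average(xs):
    return sum(xs) / len(xs)  # raises ZeroDivisionError on an empty list
'''

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user",
         "content": f"Find and fix the bug in this function:\n{buggy_code}"},
    ],
)
print(response.choices[0].message.content)
```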
The AI also demonstrated its ability to translate languages. It translated between English and Italian, showing how it can facilitate conversations between people who speak different languages. This feature could be useful in many scenarios, from travel to international business meetings.
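The event showed this with live speech; a text-only approximation is a system prompt that translates in both directions. The prompt below is illustrative, not the one used on stage.

```python
# Sketch: two-way English/Italian translation driven by a system prompt.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "Translate every message: English input becomes Italian, "
                    "Italian input becomes English."},
        {"role": "user", "content": "Dove si trova la stazione?"},
    ],
)
print(response.choices[0].message.content)
```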
Despite these advancements, some analysts believe OpenAI is playing catch-up with its competitors. Google, for example, has already showcased advanced AI capabilities with its Gemini model. OpenAI had a head start with its ChatGPT and GPT-3 models, but now it seems Google might be leading the way. This competition drives innovation, pushing companies to continually improve their AI technologies.
Google’s upcoming I/O developer conference is highly anticipated. The company is expected to unveil updates to its Gemini AI model, potentially introducing even more advanced features. This ongoing competition between AI companies is beneficial for users, as it leads to more powerful and versatile AI tools.
The advancements in AI, as demonstrated by OpenAI’s new model, have significant implications. AI that can understand and mimic human speech, detect emotions, and solve complex problems can be used in various fields. Customer service, education, healthcare, and software development are just a few areas that could benefit from these technologies.
OpenAI’s latest update marks a significant step forward. With its ability to mimic human speech, detect moods, and solve problems in real time, the AI is becoming more sophisticated and useful. As these technologies continue to evolve amid fierce competition, they will become increasingly integrated into our daily lives, offering new ways to interact with and benefit from AI.