GPT-4o, the next big thing?

Pranav Tiwari
2 min read · May 14, 2024


Day 135 / 366

It’s as if the universe saw my post yesterday and decided to make me look like an idiot. Just hours after I wrote the blog saying that AI innovation has slowed down, OpenAI announced their new model — GPT-4o. I guess the AI winter has not arrived just yet.

So what’s new with this model? Not that much to be honest.

OpenAI claims the following benefits for GPT-4o compared to GPT-4 Turbo:

  • It is 50% cheaper
  • It is twice as fast
  • It offers 5x higher rate limits. This is not true right now, but OpenAI says it will gradually ramp up the limits.

But the biggest talking point is that GPT-4o has text, speech, and image capabilities built into a single model. ChatGPT already had text-to-speech and speech-to-text features, but it relied on separate models to achieve that (Whisper for speech recognition, and an unnamed model for speech synthesis). GPT-4o claims to handle audio, images, and even video natively, without using any other models.
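To make "natively multimodal" concrete: at the API level it means a single chat request can mix text and image parts in one user message, instead of routing through separate models. Here is a minimal sketch, assuming the OpenAI Python client's chat-completions message format; the question and image URL are just placeholders.

```python
# Build a single user message that carries both text and an image,
# following the OpenAI chat-completions multimodal message shape.
def build_multimodal_message(text: str, image_url: str) -> dict:
    """Return one user message containing a text part and an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

message = build_multimodal_message(
    "What is in this image?",
    "https://example.com/photo.jpg",  # placeholder URL
)

# With an API key, this payload could then be sent in one call, e.g.:
# client.chat.completions.create(model="gpt-4o", messages=[message])
print(message["content"][0]["text"])
```

The point of the sketch is the payload shape: text and image travel in the same message to the same model, rather than being handed off to Whisper or a separate vision pipeline.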

The demo involved talking to the model with video on and asking it to describe what it sees. They also demonstrated how it can act as a live translator. To be honest, it was nothing we have not seen before. We cannot try the video features ourselves, as they have not been released to the general public, so I will take that demo with a pinch of salt.

I tried using GPT-4o’s text-to-speech feature but couldn’t, as their servers were constantly overloaded. The text model is performing well on benchmarks and beating GPT-4. How well that translates into real-world usage, I guess we will find out in a couple of days.
