Gemini 1.5 Pro and the 1M Token Context Length
Day 51 / 366
So far, Google has not been able to come anywhere close to OpenAI when it comes to releasing amazing AI products. Despite having more data and more money than any other company, Google just couldn't deliver.
That was until the recently released Gemini 1.5 Pro, Google's strongest AI model yet. Gemini 1.5 Pro is multimodal: it can work with text, video, images, and audio. But the standout feature is its context length of 1 million tokens!
To understand more about LLMs and context windows, you can read my previous post.
In short, the context window defines how much information, measured in tokens, an LLM can consume at any given time.
The latest version of GPT-4 by OpenAI has a context length of 128k tokens, meaning Gemini 1.5 Pro's context is roughly 8 times larger.
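To make those limits concrete, here is a minimal sketch of how you might check whether a given text fits in each model's context window. It uses OpenAI's tiktoken library as a stand-in tokenizer (Gemini uses its own tokenizer, so the counts are only approximate), and the book file path is hypothetical:

```python
# Rough check of whether a long text fits in a model's context window.
# tiktoken is OpenAI's tokenizer; Gemini tokenizes differently, so
# treat these counts as ballpark figures for comparison only.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

with open("harry_potter.txt") as f:  # hypothetical file
    text = f.read()

num_tokens = len(enc.encode(text))
print(f"Text length: {num_tokens:,} tokens")

for model, limit in [("GPT-4 Turbo", 128_000), ("Gemini 1.5 Pro", 1_000_000)]:
    verdict = "fits" if num_tokens <= limit else "does not fit"
    print(f"{model} ({limit:,}-token window): the text {verdict}")
```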
But does it work? Or is Google just using a RAG-like mechanism behind the scenes? Access to Gemini 1.5 Pro is limited, so I have not been able to test it myself, but I found someone on Twitter who ran some extreme tests where it performed quite well.
For instance, he gave it the video file of a full movie and asked it to summarize it.
He also ran needle-in-a-haystack tests (finding a specific piece of information buried in a large amount of data) by giving the model an entire Harry Potter book and asking questions about it.
While this worked well, the model failed at even a slightly harder variant. One Twitter user suggested inserting a random 'iPhone 15' into the book text and then asking the model whether anything in the book seems out of place. The model failed to locate it.
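If you want to reproduce this kind of test yourself, here is a minimal sketch of the 'out of place' variant. The `query_model` function is a hypothetical stand-in for whatever client you use to send a long-context prompt:

```python
# Minimal needle-in-a-haystack test: hide an anachronism in a long
# text and ask the model to spot it. `query_model` is hypothetical.
import random

def build_haystack_prompt(book_text: str, needle: str = "iPhone 15") -> str:
    # Insert the needle at a random word boundary inside the book.
    words = book_text.split()
    words.insert(random.randrange(len(words) + 1), needle)
    return (
        " ".join(words)
        + "\n\nQuestion: Is there anything in the text above that seems "
        "out of place for the story's period and setting?"
    )

# Usage (hypothetical client):
# prompt = build_haystack_prompt(open("harry_potter.txt").read())
# print(query_model(prompt))  # a passing model should flag the 'iPhone 15'
```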
The same was the case when the model was asked to summarize a 30-minute Mr. Beast video (over 300k tokens). It generated a summary, but many people who had watched the video pointed out that it was mostly incorrect.
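As a sanity check on that token figure: Google has described Gemini 1.5 sampling video at roughly one frame per second, at about 258 tokens per frame. Treat those numbers as assumptions taken from its API documentation at the time, but they put a 30-minute video comfortably above 300k tokens:

```python
# Back-of-envelope token count for a 30-minute video, assuming ~1 frame
# per second at ~258 tokens per frame (approximate figures from Google's
# Gemini API docs; they may change).
seconds = 30 * 60
tokens_per_second = 258
print(f"~{seconds * tokens_per_second:,} tokens")  # ~464,400, well over 300k
```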
So while on paper this looked like a huge leap forward for Google, it seems that in practice it's not performing as well as they might have hoped.