Running an LLM on Google Chrome
Day 185 / 366
I have been trying to write this blog for the past week, ever since I found out that Google Chrome will soon ship with Gemini Nano, Google’s own LLM. In short, this means you can build websites with GenAI capabilities that run right in the user’s browser. No need to call any APIs or pay any token fees.
I saw a post on Twitter where someone had shared a screenshot of this working on a dev build of Chrome. Ever since then, I had been trying to set it up on my laptop so that I could review it. I failed consistently, until today.
How to get access
You need to fill out this form in order to get access to this feature —
It usually takes about 24 hours to get access. Once you have it, you will need to install the dev version of Google Chrome from this link —
Setting up dev Google Chrome
Once you have downloaded Google Chrome for developers, you need to turn on a few flags in order to enable the on-device LLM feature.
- Open a new tab in Chrome and go to chrome://flags/#optimization-guide-on-device-model
- Select Enabled BypassPerfRequirement
- This bypasses performance checks that might otherwise block Gemini Nano from being downloaded to your device.
- Go to chrome://flags/#prompt-api-for-gemini-nano
- Select Enabled
- Relaunch Chrome.
Once you are done with this, you need to download the Gemini Nano model:
- Go to chrome://components/
- Search for Optimization Guide On Device Model and click on Check for update. This will start the model download.
- You might need to refresh the page a few times before the status updates.
After some time, depending on how fast your internet connection is, the model should be ready to use.
Testing it out
To test this, open a new tab and then open DevTools. Type the following code into the console:
await window.ai.canCreateTextSession();
If you see the output “readily”, you are good to go!
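If the model is still downloading, the call may report a different status. Here is a minimal, more defensive sketch for the console; the “after-download” and “no” statuses are my assumption about these early dev builds and may change:

// A defensive availability check, runnable in the DevTools console.
// Assumption: besides "readily", canCreateTextSession() can report
// other statuses such as "after-download" or "no" in these dev builds.
if (!window.ai) {
  console.log("window.ai is missing -- check your Chrome build and flags.");
} else {
  const status = await window.ai.canCreateTextSession();
  if (status === "readily") {
    console.log("Gemini Nano is ready to use.");
  } else {
    console.log("Model not ready yet, status: " + status);
  }
}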
Now, this is how you ask the LLM a question:
const session = await window.ai.createTextSession();
const stream = session.promptStreaming("Write me an extra-long poem");
for await (const chunk of stream) {
  console.log(chunk);
}
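If you do not need the response streamed, you can reuse the same session for a one-shot call. This assumes the session object in these dev builds also exposes a prompt() method that resolves with the complete response (an assumption on my part, so treat it as such):

// Assumption: the session also has a one-shot prompt() method that
// resolves with the full response as a single string.
const answer = await session.prompt("Write me a two-line poem instead");
console.log(answer);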
That is it. You just got an AI model running locally in your browser!