Running an LLM locally
Day 24 / 366
A few days ago I wrote about how tough it was to run LLMs locally, and how you needed a decent GPU to do it. Well, today I came across this video, which proved how wrong I was.
Now, I know Raspberry Pis are awesome. I have built a lot of projects on the Pi 3 and the 4 GB Pi 4. And while the Pi 5 is a big improvement, with up to 8 GB of RAM, I still couldn't believe someone had managed to run an LLM on it locally!
So I watched the video and found that he was using a tool called ollama, which lets you run a range of AI models locally. A GPU is preferred, but even without one you can still run some models on CPU and RAM alone. Apparently it's just a wrapper over llama.cpp, but it makes the whole process way easier.
I decided to try it out on my Intel MacBook with 16 GB of RAM.
The installation was pretty straightforward. Once that was done, I could choose from a list of 60+ LLM models to run. I chose the most popular one: llama2 by Meta.
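For the record, I used the macOS app from the ollama website. If you prefer the command line, Homebrew should also carry it, though the exact formula name below is my assumption, so double-check before running it:

# Assumes Homebrew ships an "ollama" formula
brew install ollama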
All I had to do to run it was open a terminal and run the command:
ollama run llama2
It took a few seconds to download the model, and then I was dropped into an interactive prompt where I could enter my question.
This is amazing. Remember that the model is not fetching any information from the internet to give its answer; it's doing everything locally! And while it was not as fast as ChatGPT, it was still pretty usable.
Why would anyone want to use this? Well, there are several reasons:
- There are loads of open-source models being developed each day, with a wide variety of features.
- Running an LLM this way means your data never leaves your machine; nothing is shared with any other company.
- It’s free!
ollama also exposes the locally running LLM through an HTTP API, which means you can build your own apps on top of it for personal use.
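As a rough sketch, this is what a request looks like. The default port (11434) and the /api/generate endpoint are what the ollama docs describe as far as I remember, so verify them against the current documentation:

# Ask the locally running llama2 model a question over the local API
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

With "stream" set to false, the server returns a single JSON object with the generated text in a response field, which is easy to consume from any language.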
I think this is pretty exciting, and with LLMs getting smaller and phones getting computationally more powerful, we will soon have LLMs running on our phones as well.