Self-host LLM in 10 minutes
Okay, I might be exaggerating, but it's definitely much easier than you think.
As someone in the software industry, I have of course heard all about GPT. But not until today did I decide to approach LLMs (Large Language Models) from an engineer's perspective rather than an end-user's.
As I said in Journey of Self-Hosting (1): Inspirations, I have a great passion for self-hosting software. I know I haven't written about everything in my self-hosting setup yet, but that series will go on, I promise!
Okay so, is it possible to self-host an LLM? It might seem daunting at first glance, but it's totally possible (and much easier than you might think!)
Requirements
- Memory: 8GB+ would be great, because these models take a lot of memory.
- OS: my VPS runs Ubuntu, but this should also work on Windows, macOS, or Linux in general.
- CPU: my VPS has an ARM chip, so arm64 works! Of course, Intel/x86 chips work too.
- GPU: better if you have one, but it's totally okay if you have none!
- Docker (optional): you could also install everything natively, but below we will cover the Docker way (see the quick check after this list).
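Before diving in, a quick sanity check on a Linux host (my Ubuntu VPS, in this case) might look like the following; the exact commands differ a bit on Windows and macOS:
# CPU architecture: arm64 and x86_64 both work
uname -m
# available memory: 8GB+ recommended
free -h
# confirm Docker is installed and the daemon is running
docker --version
docker info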
Ollama

This is a tool that makes running LLMs incredibly easy. Basically, it downloads an LLM to your disk, loads it into memory, and starts a prompt for you so that you can begin typing to your model right away.
What you have to do is:
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama run llama3
And there you have it!
You will be greeted by a prompt, where you can type and start the conversation.
>>> Who are you?
I'm an artificial intelligence designed to provide information and answer questions. How may I assist you today?
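By the way, the first docker command published port 11434, which is where Ollama serves its HTTP API. So you can also talk to the model programmatically; here is a rough curl sketch (the response comes back as streamed JSON, and the exact fields may vary between versions):
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Who are you?"
}'
This API is also what Open WebUI, described below, talks to under the hood.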
Open WebUI
This is already very cool, but we can spice things up with much more usability (and styling!)
This is a web UI that works seamlessly with Ollama. I am sure you can tell that from its former name (Ollama WebUI).
Compared to some other software that I have been trying to self-host recently, I can guarantee you this one is as easy as a breeze:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
Then access Open WebUI at http://localhost:3000.
And done!
You will be able to interact with the LLM (that you just self-hosted!) in a much nicer user interface.
(GIF borrowed from their official repo)
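One note: the --add-host=host.docker.internal:host-gateway flag is what lets the Open WebUI container reach the Ollama API running on the same host. If your Ollama lives somewhere else (say, on a remote VPS), Open WebUI can be pointed at it via the OLLAMA_BASE_URL environment variable; the URL below is just a placeholder:
# point Open WebUI at a remote Ollama instance (placeholder URL)
docker run -d -p 3000:8080 -e OLLAMA_BASE_URL=http://your-ollama-host:11434 -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main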
Conclusion
Of course, I have been hand-wavy on some details here, such as how to host this on a VPS rather than localhost, or how to enable NVIDIA GPU support for either Ollama or Open WebUI. However, those tweaks won't change the overall process much, as they mostly come down to a few extra arguments or environment variables on your docker commands.
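For example, the GPU case for Ollama looks roughly like this, assuming an NVIDIA card with the NVIDIA Container Toolkit installed (the only real change is the --gpus flag):
# same run command as before, but give the container access to all GPUs
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama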
So there you have it! Your very own self-hosted LLM. Enjoy!