AI - running your own chatGPT


Normally I do a single post a week, but I thought I'd do a second this week, to cover running a large language model (LLM) on your own kit.

You may have read my last post about building your own chatGPT and be thinking to yourself, that was a bit of a con. After all, we haven't really built a bespoke chatGPT. What we have done is not much more than putting a wrapper on the existing chatGPT. We have used a web interface to introduce a new layer that helps tweak a user prompt to turn it into a more specific prompt that will hopefully get us better results.

Firstly, to be able to do that with just a few minutes' work is pretty amazing. To be able to tweak chatGPT so easily should not be taken for granted. It is incredible if you stop and think about it. If you don't think it is incredible, give your head a shake, you're taking too much of this shit for granted.

I mentioned that there are a couple of ways of building your own chatGPT: the one I demonstrated in the previous post, https://drbry.hashnode.dev/ai-prompts-building-your-own-chatgpt, and a second using another tool within OpenAI. In the second case, you create assistants that you interact with using an API. This gives you even more freedom than the earlier version, and I will do a post on this in the coming weeks.

The reality is that you cannot build an LLM from scratch unless you have a shitload of money, and I am talking tens of millions of dollars. This is not an option for anyone reading this.

For me, when I think about my own chatGPT, I am thinking about an LLM that I can run on my machine or intranet. Something completely detached from chatGPT, Gemini, Anthropic, Copilot, or any other online offering. Something I can still use when chatGPT goes mental, like it did this morning. How difficult is it to get an LLM running on my, or your, local machine?

Again, this turns out to be easy, though as always in development there are some caveats and trade-offs. But the reward is that you have one of these LLMs running locally, controlled by you and your team. Can we do that with ease, for real? The answer is yes, although the more control and flexibility you require, the more work you have to do. Though to be fair, it is not that much work.

Ok, let's start with the lowest amount of work you need to do to have an LLM running on your machine. There are two easy solutions.

The first I am going to mention is from NVIDIA, Chat with RTX; here is the link: https://www.nvidia.com/en-gb/ai-on-rtx/chat-with-rtx-generative-ai/ . Download and run the executable. It takes a little while to run the installation depending on your bandwidth, but then you are good to go. There are system requirements, and if you don't meet these, the executable will stop installing. Spoiler alert: you need 16GB of RAM and an RTX 30 or 40 series GPU with at least 8GB of VRAM. I'll do a write-up on this another day; it is not perfect, but it is running on your machine. For those interested, the default LLM is "Mistral 7B int4", more on this another day.

The second method for running LLMs on your own machine is a product called LM Studio, https://lmstudio.ai/ , again a simple exe that you download and run. Once you run LM Studio you have the option to search through and download one, or more, of the many thousands of LLMs hosted on Hugging Face. Don't worry, I'll cover Hugging Face another day also. I won't go into the details, at this point, of the functionality available with LM Studio, other than to say that you are running the LLM on your local machine.
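
To give a flavour of what "running locally" means in practice: LM Studio can also serve whatever model you have loaded through a local, OpenAI-compatible endpoint, so your own code can talk to it without anything leaving your machine. Here is a minimal sketch, assuming you have started LM Studio's local server on its default port 1234 and loaded a model; the model name and prompt are just placeholders.

```python
# Minimal sketch: chat with a model served locally by LM Studio's
# OpenAI-compatible server (assumed to be listening on localhost:1234).
# Requires the openai package: pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # local endpoint, nothing is sent to OpenAI
    api_key="not-needed",                 # the local server does not check the key
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio serves whichever model you have loaded
    messages=[{"role": "user", "content": "Explain, in one sentence, what an LLM is."}],
    temperature=0.7,
)

print(response.choices[0].message.content)
```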

To run an LLM locally you do need a reasonable amount of power; you can't run this on some old server box you've had sitting around for the last 10 years. You want a minimum of 32GB of RAM, although 64GB, or even 128GB, is better, particularly for the larger LLMs. Same with the graphics card: the bigger the better. An RTX 4080 with 16GB would be handy, although you can make do with an RTX 3060 with 12GB, which will cost you about £300. The more RAM and the better the GPU, the faster the LLM runs. Any high-end laptop will also suffice.
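
If you're not sure what is actually in your machine, `nvidia-smi` on the command line will tell you, or here is a quick Python check, assuming you have an NVIDIA card and PyTorch installed.

```python
# Quick check of GPU name and VRAM (assumes an NVIDIA GPU and PyTorch installed).
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / (1024 ** 3)
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
else:
    print("No CUDA-capable GPU detected.")
```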

That is all you need to do though. If your machine has the power, you can run "Chat with RTX" and "LM Studio" at the same time. "Chat" comes with a web interface and gives you the option to do RAG (retrieval-augmented generation) over your local documents; I'll cover this in a separate post. "LM Studio" comes with a chatbot and a bunch of options that I will save for another post.

Both of these methods result in an LLM that is running on your machine and is completely controlled by you.

The LLMs you download are all open-source and are better than you would think. I have done some work using Mistral (Mistral AI), Llama (Meta), and Phi-2 (Microsoft), and they are surprisingly good. What is weird, though, is that with a small amount of work, and it is a small amount of work, you can get one of these open-source LLMs to outperform chatGPT 4 in a specific area. More on this in another post.

So, if you were feeling a little cheated by the post about "building your own chatGPT", in this post we have covered how to run an LLM locally. For the techies out there reading this, if you have a machine, download the exes and have a play. For anyone else reading this, get your techies a machine capable of running LLMs and get them to download the exes and have a play.

Before anyone gets stressed, there are other products similar to the two I have mentioned here, but these are two that I have played with so I can talk about them with a little confidence. Feel free to check out other solutions for yourself.

That is enough for today and this week.

Bryan
PS: this is human-generated content, though I do use Grammarly to check my spelling, which is often piss poor. Also, if anyone has two, or better still four, RTX 4090 24GB graphics cards they have no further use for, send them over.