Marvin and Small Tuned LLMs

If you haven't read the Hitchhiker's Guide then this post is going to make no sense to you whatsoever. For those who don't know, Marvin, the Paranoid Android from Douglas Adams' "The Hitchhiker's Guide to the Galaxy," is a character known for his distinct brand of gloomy, deadpan humour; he is a miserable bugger, to put it bluntly. Part of the reason he is miserable is that he has a "brain the size of a planet": Marvin's intelligence is vastly superior to everyone around him, a fact that contributes to his perennial state of boredom and depression. He is also the ultimate LLM, LLM++ if you like.

Marvin, despite his brain the size of a planet, often found himself performing menial tasks. This got me wondering: are we treating ChatGPT and similar-sized LLMs a little like Marvin? Are we using them for menial tasks that a far smaller LLM could handle? Not all tasks require the power of colossal LLMs. For specific, routine applications, smaller LLMs are not only more efficient but also practical: they consume fewer computational resources, making them more accessible and cost-effective.

The beauty of smaller LLMs lies in their ability to be fine-tuned for specific tasks. Unlike larger, general-purpose models, smaller LLMs can be customized to excel in particular domains, and that specialization gives higher accuracy and relevance in their outputs. Imagine asking Marvin to solve a simple arithmetic problem. His response would likely highlight the overkill of using his vast intellect for such a trivial task. Similarly, deploying large LLMs for straightforward tasks is often unnecessary. It's like using a sledgehammer to crack a nut: a waste of potential, resources, and money.

Training large LLMs is a resource-intensive and time-consuming process, and it costs a fortune: training a large model runs into the millions. Can smaller models be trained and updated much more quickly? I don't know for certain, but I would assume so. Either way, we already have cases where LLMs have been shrunk using quantization, the trick that QLoRA is built on, down to under 1GB in size. I can run these on a regular PC at home on a 3060 GPU with no problem.
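To give you an idea of how little ceremony is involved, here is a minimal sketch of loading a model with 4-bit weights using Hugging Face transformers and bitsandbytes (the same style of quantization that QLoRA is built on). The model name and settings are just examples, not a recommendation or my exact setup.

```python
# Minimal sketch: load a model with 4-bit weights so it fits on a modest GPU.
# The model name is just an example - swap in whatever small model you fancy.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit weights, as used by QLoRA
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # do the maths in fp16 on the GPU
)

model_name = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",  # puts it on the 3060 if there is room
)

prompt = "Explain what a vector database is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```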

Not all organizations have the luxury of the computational power needed to run large LLMs; the graphics cards alone can cost a small fortune. So I suppose that smaller models democratize AI in a way, making advanced technology accessible to smaller businesses and startups. They have certainly given me LLMs that I can run on my local machines. With lower computational demands, these models are also more sustainable, aligning with the growing need for environmentally friendly technology solutions (I read that somewhere). And they let me have a mess around and see what I can do for very little cost.

There are some truly amazing LLMs out there besides the well-known "heavyweights". Phi-2 comes in at about 5.6GB, but there is a quantized version that is only 1.6GB, and you can run it on a modest GPU. There are several versions of TinyLlama out there that are under 1GB. These are not as good as ChatGPT, but they are still amazing.
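If you fancy trying one of these yourself, the simplest route I know of is a quantized GGUF file and llama-cpp-python. A rough sketch, assuming you have already downloaded a TinyLlama quant somewhere (the file path below is a placeholder):

```python
# Rough sketch: run a quantized GGUF model locally with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/tinyllama-1.1b-chat.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,       # context window
    n_gpu_layers=-1,  # offload every layer to the GPU if it fits
)

result = llm(
    "Q: What is retrieval augmented generation? A:",
    max_tokens=80,
    stop=["Q:"],
)
print(result["choices"][0]["text"].strip())
```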

Now you may be wondering if I am going anywhere with this post, and the truth is that I am, sort of. I spent a lot of years working with a product called SharePoint, a content management system from Microsoft, and I got to thinking about how I would use LLMs with SharePoint on an internal intranet, not one based in the cloud. SharePoint has a whole lot of security features for restricting access to data. This means we can't simply build an LLM that has access to all of it; if we did, there would be security leaks all over the place. With the right prompt, you could find yourself staring at highly restricted HR information, for example.

For a system using SharePoint, or a legal firm employing a Chinese Wall (I did a post on that last year), I reckon we could combine fine-tuning or RAG with a bunch of smaller LLMs to produce a solution that would be incredibly powerful. It would run on relatively modest hardware and, on the firm's own content, would produce better results than any large out-of-the-box (OOTB) LLM currently out there, while respecting who is allowed to see what, something like the sketch below. I think this is the immediate future of LLMs for many firms in the coming year or two.
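To make that concrete, here is a hypothetical sketch of the security side of the idea; the data structures and group names are made up for illustration, not taken from any real SharePoint API. Every retrieved chunk carries the groups that are allowed to read the source item, and anything the current user cannot see is thrown away before it gets anywhere near the LLM.

```python
# Hypothetical sketch: filter retrieved chunks by the user's groups *before*
# building the prompt, so restricted content never reaches the model.
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    allowed_groups: set[str]  # groups copied from the source item's permissions


def visible_chunks(retrieved: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    """Keep only the chunks the current user is entitled to see."""
    return [c for c in retrieved if c.allowed_groups & user_groups]


retrieved = [
    Chunk("Q3 sales figures ...", {"Sales", "Directors"}),
    Chunk("Disciplinary record for ...", {"HR"}),
]

user_groups = {"Sales"}
context = "\n\n".join(c.text for c in visible_chunks(retrieved, user_groups))
# 'context' now only contains text this user could have opened in SharePoint,
# so the prompt sent to the small LLM cannot leak the HR document.
print(context)
```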

For $20 a month per head, we can have access to Marvin, but we don't need him for most of what we want to do. I have an old machine that I run VMs on, and I have installed SharePoint on it. I've been working my way through the DeepLearning courses, which are great by the way, and I am going to have a go at using Chroma and doing some RAG over the content I have stored in SharePoint. The aim was to produce an AI tutor/mentor that teaches physics. Then yesterday (this has been edited since I first published it) I had a bit of a coding breakthrough and have decided on something totally different that has the potential to be way cooler; more on this in the next post.
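For the Chroma-and-SharePoint part, the general shape of the thing looks something like the sketch below. This is not my actual code: the site URL, list name, field names and credentials are placeholders, and I am assuming an on-prem SharePoint reached over its REST API with NTLM authentication.

```python
# Sketch only: pull items from a SharePoint list over the REST API and drop
# the text into a Chroma collection. URLs, credentials and field names are
# placeholders.
import chromadb
import requests
from requests_ntlm import HttpNtlmAuth  # on-prem SharePoint auth

site = "http://intranet/sites/physics"
resp = requests.get(
    f"{site}/_api/web/lists/getbytitle('Course Notes')/items",
    auth=HttpNtlmAuth("DOMAIN\\user", "password"),
    headers={"Accept": "application/json;odata=verbose"},
)
items = resp.json()["d"]["results"]

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("sharepoint_notes")

collection.add(
    documents=[item["Body"] for item in items],              # whichever field holds the text
    metadatas=[{"title": item["Title"]} for item in items],
    ids=[str(item["Id"]) for item in items],
)

# Quick sanity check: pull back the three closest chunks to a question.
hits = collection.query(query_texts=["What is Newton's second law?"], n_results=3)
print(hits["documents"][0])
```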

The solution could also be used in numerous other situations, including industry, so if and when I do get it working I'll tell you about it and put my code out there for people to use; it is sweet. I managed to create my first vector database from data stored in SharePoint, which pleased me more than it probably should have. I don't know if it is any good yet; I'll be testing it later this week. That is probably enough for now, so I shall wrap it up.

That is the first post of '24, hopefully not the last!

Bryan.
This is completely written by a human, me, so if it is shit, don't be blaming the AI.