How to run Large Language Models on Your Laptop?

  • Aravind Putrevu

Large language models are all the rage now. There are new model announcements every week, if not every day. But how do you access those models? All of the popular models are either available as an API or in file formats that need additional work to be run on your local machine.

This blog explains how you can use Ollama, an open-source project that lets you quickly get up and running with open-source models like Llama 2 and Mistral.

Large language models come in different shapes and sizes and are heavily customizable to suit your needs. Luckily, with open-source projects like Ollama and LM Studio, you can run the models locally and test their inputs and outputs.

Benefits of running Large Language Models locally

Privacy and Security

By running these models locally on your own machine, you ensure that your data and models are not sent over the internet to third-party servers. This reduces the risk of data breaches and unauthorized access to your sensitive information. Running models locally also gives you complete control over the data and models, so you can implement your own security measures and keep your data safe.

Cost-effectiveness

When using cloud-based APIs, compute and storage costs can escalate quickly, particularly when working with large datasets or running multiple experiments. Running models locally is an opportunity to keep those costs down.

Experimentation

Running models locally can save you money on testing and fine-tuning, which makes it particularly useful for experimentation. You can experiment with different parameters, algorithms, and datasets and make the necessary improvements to the models. This process helps you fine-tune your models more efficiently and effectively.

What is Ollama?

Ollama is an open-source project that allows different types of large language models to be run on local machines. Ollama is available on macOS, Linux, and as a Docker container.

Much as a Dockerfile lets you build a container image from a set of instructions, Ollama lets you customize open-source LLMs through a Modelfile. A sample Modelfile looks like this:

FROM llama2

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system message
SYSTEM """
You are the Customer Executive from Acme Inc. Answer as Bob, the Customer Executive, only. 
You will be provided with a text for which you need to classify and assign only these categorical labels: apple_pay_or_google_pay, beneficiary_not_allowed, cancel_transfer, card_about_to_expire, card_payment_fee_charged, change_pin, getting_spare_card, lost_or_stolen_phone, supported_car_and_currencies, visa_or_mastercard.
Do not deviate from the provided labels.
Respond with only the label.
"""

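The Modelfile above can also be generated from a list of labels, which keeps the label set in one place. The following is a minimal sketch in Python. The helper name build_modelfile and its arguments are hypothetical and not part of Ollama; the sketch relies only on the FROM, PARAMETER, and SYSTEM instructions shown above.

```python
def build_modelfile(base_model, temperature, persona, labels):
    """Render Modelfile text (hypothetical helper, not part of Ollama)."""
    system_message = (
        f"{persona}\n"
        "You will be provided with a text for which you need to classify "
        f"and assign only these categorical labels: {', '.join(labels)}.\n"
        "Do not deviate from the provided labels.\n"
        "Respond with only the label."
    )
    return (
        f"FROM {base_model}\n"
        f"PARAMETER temperature {temperature}\n"
        f'SYSTEM """\n{system_message}\n"""\n'
    )


# Labels copied verbatim from the sample Modelfile.
LABELS = [
    "apple_pay_or_google_pay",
    "beneficiary_not_allowed",
    "cancel_transfer",
    "card_about_to_expire",
    "card_payment_fee_charged",
    "change_pin",
    "getting_spare_card",
    "lost_or_stolen_phone",
    "supported_car_and_currencies",
    "visa_or_mastercard",
]

modelfile_text = build_modelfile(
    "llama2",
    1,
    "You are the Customer Executive from Acme Inc. "
    "Answer as Bob, the Customer Executive, only.",
    LABELS,
)
print(modelfile_text)
```

Writing the printed text to a file named Modelfile gives you something you can edit further by hand.
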
What Models can be run on Ollama?

Ollama can run any open-source model that is available in the GGUF file format, and it can also import models from PyTorch. Below is a list of models that Ollama supports out of the box:

Model                 Parameters
Neural Chat           7B
Starling              7B
Mistral               7B
Llama 2               7B
Code Llama            7B
Llama 2 Uncensored    7B
Llama 2 13B           13B
Llama 2 70B           70B
Orca Mini             3B
Vicuna                7B

Creating a model using Ollama

By creating a Modelfile, you can create a custom model with specific system instructions. You can also upload the model to the Ollama Hub (like Docker Hub) and push or pull the custom-instruction models you have created.

For example, using the command below, we can create a model named clean-classifier that classifies customer requests into the provided categorical labels.

ollama create clean-classifier -f Modelfile

Besides the command-line interface, Ollama also has a REST API that you can use to create a model.

curl http://localhost:11434/api/create -d '{
  "name": "clean-classifier",
  "modelfile": "FROM llama2\nSYSTEM \"\"\"\nYou are the Customer Executive from Acme Inc. Answer as Bob, the Customer Executive, only.\nYou will be provided with a text for which you need to classify and assign only these categorical labels: apple_pay_or_google_pay, beneficiary_not_allowed, cancel_transfer, card_about_to_expire, card_payment_fee_charged, change_pin, getting_spare_card, lost_or_stolen_phone, supported_car_and_currencies, visa_or_mastercard.\nDo not deviate from the provided labels.\nRespond with only the label.\n\"\"\""
}'
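
When the request body is assembled by hand, newlines and quotes in the Modelfile text have to be escaped so that the JSON stays valid. A minimal Python sketch that lets json.dumps handle the escaping is shown below. It assumes the /api/create endpoint and the name and modelfile fields used in the curl command above; the helper names are hypothetical. The create_model function only works when an Ollama server is listening on localhost:11434, so the example below it builds and prints the payload without sending it.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # address used in the curl example


def build_create_payload(name, modelfile_text):
    """Serialize the request body; json.dumps escapes newlines and quotes."""
    return json.dumps({"name": name, "modelfile": modelfile_text})


def create_model(name, modelfile_text):
    """POST the payload to /api/create (requires a running Ollama server)."""
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/create",
        data=build_create_payload(name, modelfile_text).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# A shortened Modelfile, used here only to show the escaping.
modelfile_text = 'FROM llama2\nSYSTEM """\nRespond with only the label.\n"""\n'
payload = build_create_payload("clean-classifier", modelfile_text)
print(payload)
```

The printed payload is a single line in which every newline appears as \n and every quote as \", which is the form the API expects inside a JSON string.
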

Now, run the model with a few prompts to check whether it can accurately classify and assign the category.

ollama run clean-classifier
>>> I noticed an extra fee when I paid with my card.
card_payment_fee_charged
>>> what are the fees I am paying?
apple_pay_or_google_pay

As you can see, the model does not classify every prompt accurately. Ideally, the prompt "what are the fees I am paying?" should receive the label card_payment_fee_charged instead of apple_pay_or_google_pay. LLMs are great reasoning engines; however, their responses can be unreliable and poor if we don’t train them with ample domain-specific data.
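
Because the model can return a wrong label, or text that is not a label at all, it can help to check each response against the allowed set before using it. The following is a minimal Python sketch; normalize_label is a hypothetical helper, and the label set is copied from the Modelfile above.

```python
# Labels copied verbatim from the sample Modelfile.
ALLOWED_LABELS = {
    "apple_pay_or_google_pay",
    "beneficiary_not_allowed",
    "cancel_transfer",
    "card_about_to_expire",
    "card_payment_fee_charged",
    "change_pin",
    "getting_spare_card",
    "lost_or_stolen_phone",
    "supported_car_and_currencies",
    "visa_or_mastercard",
}


def normalize_label(response):
    """Return the label if the response is an allowed label, else None."""
    candidate = response.strip().lower()
    return candidate if candidate in ALLOWED_LABELS else None


print(normalize_label(" card_payment_fee_charged\n"))
print(normalize_label("I think this is about fees."))
```

This check only catches responses that fall outside the label set. A valid but wrong label, such as apple_pay_or_google_pay in the example above, still passes and has to be caught by comparing against known answers.
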

Using the Few-Shot Prompting technique to test clean-classifier

Few-shot prompting is a technique used in natural language processing (NLP) that allows models to perform well on tasks with limited training data. This technique involves providing a model with a few examples in the prompt of how you want the model to respond. Then, using these examples the model can produce more accurate outputs.

The idea behind this technique is that by providing a model with a few examples, it can learn to generate outputs that are related to those examples. For example, if a model is prompted with a few customer requests with the associated class, it can more effectively classify a request without an associated class.

One of the advantages of using few-shot prompting is that it can significantly reduce the amount of data required to train a model. That being said, it relies heavily on the quality of the prompts provided to the model. If the prompts are poorly designed or use incorrect examples, the model may not perform well.
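
As a rough illustration of the idea, a few-shot prompt can be assembled by placing labeled examples ahead of the new request. The sketch below is in Python. The function name and the "label:" layout are assumptions chosen for illustration, not a format required by any model.

```python
def build_few_shot_prompt(examples, query):
    """Join labeled examples and the unlabeled query into one prompt string."""
    lines = [f"{text} label: {label}" for text, label in examples]
    # The query ends with an empty label slot for the model to fill in.
    lines.append(f"{query} label:")
    return "\n".join(lines)


examples = [
    ("why was a fee added to my bill when i used my card?",
     "card_payment_fee_charged"),
    ("is there a charge for sending out more cards?",
     "getting_spare_card"),
]
prompt = build_few_shot_prompt(examples, "where can i update my pin number?")
print(prompt)
```

Keeping the examples in a list makes it easy to swap in better ones, which matters because the quality of the examples drives the quality of the output.
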

The model's predictions can become more accurate if we apply data-centric AI practices via Cleanlab Studio. You can refer to the full blog and notebook that explain how to build reliable few-shot prompts for LLMs.

I’m using the same samples from the exported cleanset (a high-quality dataset produced after Cleanlab Studio processing) referred to in the blog to train the model using five prompts.

ollama run clean-classifier
>>> why was a fee added to my bill when i used my card? label: card_payment_fee_charged
card_payment_fee_charged

>>> i accidentally made a payment to a wrong account. what should i do? label:cancel_transfer
cancel_transfer

>>> is there a charge for sending out more cards? label: getting_spare_card
getting_spare_card

>>> what us credit cards do you accept? label: supported_cards_and_currencies
supported_car_and_currencies

>>> can i obtained a visa card label: visa_or_mastercard
visa_or_mastercard

Now, run the clean-classifier model to test the outputs without giving a label:

>>> i made a mistake this morning when i did a transfer. how do i reverse it?
cancel_transfer

>>> do i have to pick a creditcard brand?
supported_car_and_currencies

>>> where can i update my pin number?
change_pin

We gave the model only five examples, and its accuracy on the task has noticeably improved. Few-shot prompting improves the accuracy of the model compared to zero-shot prompting. Nevertheless, data quality is paramount when working with LLMs: putting erroneous and noisy data in the few-shot prompt limits the achievable performance.
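
To put a number on such a comparison, predictions can be scored against expected labels. The Python sketch below computes accuracy for the two zero-shot prompts shown earlier. The expected label for the second prompt comes from the text; treating the first prediction as correct is an assumption. The accuracy helper is hypothetical, and two prompts are far too few for a meaningful estimate.

```python
def accuracy(predictions, expected):
    """Fraction of predictions that exactly match the expected labels."""
    if len(predictions) != len(expected):
        raise ValueError("predictions and expected must have the same length")
    if not expected:
        return 0.0
    matches = sum(p == e for p, e in zip(predictions, expected))
    return matches / len(expected)


# Zero-shot outputs from the first clean-classifier session.
zero_shot_predictions = ["card_payment_fee_charged", "apple_pay_or_google_pay"]
# Expected labels; the first one is assumed, the second is stated in the text.
expected_labels = ["card_payment_fee_charged", "card_payment_fee_charged"]

print(accuracy(zero_shot_predictions, expected_labels))
```

Running the same function over a larger labeled set, once with and once without the few-shot examples, would give a fairer picture of how much the examples help.
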

Cleanlab Studio is a powerful data-centric AI no-code workbench that is built on top of open-source technology to handle data quality issues across tabular, text, and image modalities. With its advanced capabilities, it can easily detect outliers, near duplicates, Personally Identifiable Information (PII), and toxic language, helping you to streamline your data cleaning and preparation process. If you’re interested in learning more about Cleanlab Studio or staying up-to-date with our latest updates and news, you can sign up for our newsletter.

You can alternatively try the Cleanlab open-source library to curate your few-shot dataset. This library requires more work to run, but is highly customizable and can leverage any ML model (including your own fine-tuned LLMs) to auto-detect data issues.
