Ollama is an open-source framework designed to facilitate the local execution of Large Language Models (LLMs) such as LLaMa and others. It allows you to run these models directly on your machines, providing a secure and customizable environment without relying on cloud services.
In this article, we’ll cover how to use Ollama. In the next one, we’ll dig into the intricacies of Ollama to a depth that no other blog has gone before.
Building your own Chatbot with Ollama
Step 1 - Get the Server running
You will first need to install the Ollama server. The process varies a bit depending on the operating system you are using, so I’ll list all three here.
MacOS
There are essentially two different ways to get started with Ollama on Mac. We’ll of course cover the easiest one: installation with brew. Running the following command will install the Ollama server on your system.
brew install ollama
I told you it would be really easy.
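If you’d prefer Ollama to run as a background service that starts automatically, Homebrew can also manage that for you (this is optional; in the next step we’ll start the server manually instead, so we can see its logs):

brew services start ollama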
Linux
To install Ollama on Linux, just run the following command.
curl -fsSL https://ollama.com/install.sh | sh
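On most systemd-based distros, this install script also sets Ollama up as a systemd service, so the server may already be running once the script finishes. You can check with:

sudo systemctl status ollama

If it shows as active, you can skip the ollama serve step in the next section.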
Windows
On Windows, Ollama is still in preview, but you can download the Ollama .exe from https://ollama.com/download/windows and run it to install Ollama on your system.
Step 2 - Testing if it works
On Windows, you just have to open the .exe you downloaded and it will start the server. On macOS and Linux, we can start the server with the command
ollama serve
This will start the server on the machine itself.
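A quick way to check that the server is actually up is to hit its root endpoint (Ollama listens on port 11434 by default):

curl http://localhost:11434

If everything is fine, it simply replies with “Ollama is running”.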
Note that at this stage, the server doesn’t have access to any LLM model. To download the model onto the server, you’ll need to use a different command.
Step 3 - Downloading the Model
Downloading a model is pretty straightforward. Knowing which one to download is not. Ollama supports 116 different LLMs at the time of writing, and that is excluding the different variants of each model.
You can explore the entire repository of models that Ollama supports at https://ollama.com/library
For the purpose of this blog, I’ll stick to the Qwen2 model. Not because it is better than the others, but because it has a variant that is just about 350 MB, making it one of the few models I can run on my laptop itself without renting a VM.
To download, open a separate terminal while keeping the server running (pro-tip: you can also keep the server running in the background by adding & to the end of the command and keep using the same terminal for the next steps) and execute the following command.
ollama pull qwen2:0.5b-instruct
This will download the Qwen2 variant that has 0.5 billion parameters. In the same way, you can download llama3.1, phi3, or any other model that suits your needs.
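Once the pull finishes, you can confirm which models are available locally with:

ollama list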
Step 4 - Interact with the model
We can interact with the model directly from the command line. To prompt the model with “Who is the Prime Minister of India?”, you will need to run:
ollama run qwen2:0.5b-instruct "Who is the Prime Minister of India?"
This command will first load the model onto the GPU or the CPU, whichever is available, and then prompt it with your query. Internally, it makes an API call to the “/api/generate” endpoint, which you can see in the logs the server prints to its terminal alongside the response.
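In fact, you can call that same endpoint yourself with curl, which is handy once you start building on top of the server rather than the CLI:

curl http://localhost:11434/api/generate -d '{
  "model": "qwen2:0.5b-instruct",
  "prompt": "Who is the Prime Minister of India?",
  "stream": false
}'

Setting "stream": false returns the whole answer in a single JSON object instead of streaming it token by token.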
Step 5 - Building your Chat app with this
To do so, we’ll use Ollama’s Python client (there is also a JavaScript one) and Streamlit to build the interface for the chat application.
Installing the dependencies.
pip install ollama streamlit
Next, just copy-paste this code into a file named main.py.
import streamlit as st
from ollama import Client

client = Client(host="http://localhost:11434")

st.title("Hey, it's just like ChatGPT, but free!")

# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat messages from history on app rerun
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Accept user input
if prompt := st.chat_input("What is up?"):
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": prompt})

    # Display user message in chat message container
    with st.chat_message("user"):
        st.markdown(prompt)

    # Generate and display assistant response
    response = client.chat(model="qwen2:0.5b-instruct", messages=st.session_state.messages)["message"]["content"]
    with st.chat_message("assistant"):
        st.markdown(response)

    # Add assistant response to chat history
    st.session_state.messages.append({"role": "assistant", "content": response})
I too generated this code with Perplexity and just added the model name 😂 If you are new to Streamlit, there is a wonderful playlist you can check out.
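One thing worth noticing in the code above: every turn sends the whole st.session_state.messages history to client.chat, which is how the model keeps track of the conversation. If you’d rather see the reply appear word by word instead of waiting for the full response, the Python client also supports streaming. Here is a minimal sketch of what the assistant part inside the if block could look like instead (assuming a reasonably recent Streamlit version that has st.write_stream):

    # Generate and display assistant response, streamed chunk by chunk
    stream = client.chat(
        model="qwen2:0.5b-instruct",
        messages=st.session_state.messages,
        stream=True,  # yields partial responses as they are generated
    )
    with st.chat_message("assistant"):
        # st.write_stream renders each chunk as it arrives and returns the full text
        response = st.write_stream(chunk["message"]["content"] for chunk in stream)

    # Add assistant response to chat history
    st.session_state.messages.append({"role": "assistant", "content": response})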
Now, just execute the script with
streamlit run main.py
And you’ll instantly have a chat interface like the one below.
And there we have it. Our own AI chatbot!
In next week’s article, we will discuss Ollama in much more detail. I’ll share everything I learnt during my last two weeks reading the Ollama codebase (details which, trust me, I have spent hours scouring the internet for and you’ll not find anywhere else). So for that, please follow and subscribe to the newsletter.
Until then, have a healthy and happy week.