Overview

Ollama makes it easy to run large language models locally. No cloud API is required, your prompts never leave your machine, and it runs on macOS, Linux, and Windows.

Installation

# Linux (the install script targets Linux)
curl -fsSL https://ollama.com/install.sh | sh

# macOS / Windows: download the installer from https://ollama.com/download
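Ollama serves a local HTTP API on localhost:11434 (started automatically by the installer on most setups). A quick sanity check from Python, assuming the requests package and the default port:

import requests

# The Ollama server listens on localhost:11434 by default;
# the root endpoint replies "Ollama is running" when the service is up.
r = requests.get('http://localhost:11434')
print(r.status_code, r.text)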

Running Your First Model

# Pull and run Llama 3.2
ollama run llama3.2

# Pull and run Mistral
ollama run mistral

# Pull and run a coding model
ollama run codellama
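The first run of each command downloads the model, then drops you into an interactive chat. To see what is already on disk, use ollama list on the CLI, or query the /api/tags endpoint; a minimal sketch:

import requests

# /api/tags lists the models stored locally, including name and size on disk.
tags = requests.get('http://localhost:11434/api/tags').json()
for model in tags['models']:
    print(model['name'])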

Available Models

| Model     | Size   | Use Case                  |
|-----------|--------|---------------------------|
| llama3.2  | 1B/3B  | General purpose           |
| mistral   | 7B     | Fast, high quality        |
| codellama | 7B/13B | Code generation           |
| phi3      | 3.8B   | Efficient, from Microsoft |
| gemma2    | 9B     | Google’s open model       |
| qwen2.5   | 7B     | Multilingual              |

Specific sizes are selected with a tag, e.g. ollama run llama3.2:3b.

API Usage

import requests

# Non-streaming request: with 'stream': False the endpoint returns a
# single JSON object whose 'response' field holds the full completion.
response = requests.post('http://localhost:11434/api/generate', json={
    'model': 'mistral',
    'prompt': 'Explain machine learning in one paragraph',
    'stream': False
})
print(response.json()['response'])
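For incremental output, set 'stream': True; the endpoint then emits one JSON object per line until a final object with "done": true. A minimal sketch of consuming the stream:

import json
import requests

# Each streamed line is a JSON object carrying the next chunk of text
# in 'response'; the final object has 'done': true.
with requests.post('http://localhost:11434/api/generate', json={
    'model': 'mistral',
    'prompt': 'Explain machine learning in one paragraph',
    'stream': True
}, stream=True) as resp:
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get('response', ''), end='', flush=True)
        if chunk.get('done'):
            print()
            break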

Using with LangChain

# Requires the langchain-ollama package; the older
# langchain_community.llms.Ollama import is deprecated.
from langchain_ollama import OllamaLLM

llm = OllamaLLM(model="mistral")
response = llm.invoke("What is the capital of France?")
print(response)
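Because the wrapper implements LangChain's standard Runnable interface, token streaming works the same way as with any other LLM class; a minimal sketch, again assuming the langchain-ollama package:

from langchain_ollama import OllamaLLM

llm = OllamaLLM(model="mistral")

# .stream() yields text chunks as the model generates them.
for chunk in llm.stream("What is the capital of France?"):
    print(chunk, end="", flush=True)
print()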

Custom Models (Modelfile)

# Modelfile
FROM mistral
SYSTEM You are a helpful coding assistant specialized in Python.
PARAMETER temperature 0.7
PARAMETER num_ctx 4096

# Build the model from the Modelfile, then run it
ollama create my-coder -f Modelfile
ollama run my-coder
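Once created, the custom model is served like any other and can be called by name through the REST API; for example, reusing the pattern from the API Usage section:

import requests

# The Modelfile's SYSTEM prompt and parameters are baked into 'my-coder',
# so requests only need the model name and a prompt.
response = requests.post('http://localhost:11434/api/generate', json={
    'model': 'my-coder',
    'prompt': 'Write a Python function that reverses a string.',
    'stream': False
})
print(response.json()['response'])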

Hardware Requirements

| Model Size | RAM Required | GPU VRAM |
|------------|--------------|----------|
| 3B         | 4 GB         | 4 GB     |
| 7B         | 8 GB         | 8 GB     |
| 13B        | 16 GB        | 16 GB    |
| 70B        | 64 GB        | 48 GB    |

Key Resources

- Website and downloads: https://ollama.com
- Model library: https://ollama.com/library
- GitHub repository: https://github.com/ollama/ollama
- REST API reference: https://github.com/ollama/ollama/blob/main/docs/api.md