## Overview

Ollama makes it easy to run large language models locally: no cloud API, no data leaving your machine, and support for macOS, Linux, and Windows.
## Installation

```bash
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Or download the installer from https://ollama.com/download
```
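To confirm the install, the local server (started automatically by the desktop app, or manually with `ollama serve`) answers a plain GET on its default port. A minimal check in Python, assuming the `requests` package that the API examples below also use:

```python
import requests

# The local Ollama server listens on port 11434 by default and replies to
# GET / with the plain-text string "Ollama is running".
try:
    r = requests.get("http://localhost:11434", timeout=2)
    print(r.text)  # expected: "Ollama is running"
except requests.ConnectionError:
    print("Server not reachable - start it with `ollama serve`")
```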
## Running Your First Model

`ollama run` pulls a model on first use, then drops you into an interactive chat session:

```bash
# Pull and run Llama 3.2
ollama run llama3.2

# Pull and run Mistral
ollama run mistral

# Pull and run a coding model
ollama run codellama
```
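If you'd rather call a model from Python than the terminal, the official `ollama` client package mirrors the CLI. A minimal sketch, assuming `pip install ollama` and that the model has already been pulled:

```python
import ollama

# Rough Python equivalent of `ollama run llama3.2` with a single prompt.
# Unlike the CLI, the client does not auto-pull; run `ollama pull llama3.2`
# first if the model is missing.
reply = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(reply["message"]["content"])
```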
## Available Models

| Model | Size | Use Case |
|---|---|---|
| llama3.2 | 1B/3B | General purpose |
| mistral | 7B | Fast, high quality |
| codellama | 7B/13B | Code generation |
| phi3 | 3.8B | Microsoft's efficient small model |
| gemma2 | 9B | Google's open model |
| qwen2.5 | 7B | Multilingual |
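To check which of these are already on disk, the REST API's `/api/tags` endpoint lists pulled models with their sizes. A short sketch using `requests`:

```python
import requests

# /api/tags returns {"models": [{"name": ..., "size": ..., ...}, ...]}
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(f"{model['name']}: {model['size'] / 1e9:.1f} GB")
```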
## API Usage

Ollama exposes a REST API on `http://localhost:11434`. With `'stream': False` the server returns the whole completion as a single JSON object:

```python
import requests

# Ask the local server for a one-shot (non-streaming) completion.
response = requests.post('http://localhost:11434/api/generate', json={
    'model': 'mistral',
    'prompt': 'Explain machine learning in one paragraph',
    'stream': False
})
print(response.json()['response'])
```
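Leaving out `'stream': False` gives the default streaming mode, where the server sends one JSON object per line, each carrying a fragment of the answer until `"done"` is true. A sketch of consuming that stream:

```python
import json
import requests

# Stream tokens as they are generated instead of waiting for the full reply.
with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral", "prompt": "Explain machine learning in one paragraph"},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            print()
```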
## Using with LangChain

Install the integration first (`pip install langchain-community`), then wrap the local model like any other LangChain LLM:

```python
from langchain_community.llms import Ollama

llm = Ollama(model="mistral")
response = llm.invoke("What is the capital of France?")
print(response)
```
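Because the Ollama wrapper is a standard LangChain Runnable, it composes with prompt templates via the `|` operator. A sketch (the template and its `topic` variable are my own illustration):

```python
from langchain_core.prompts import PromptTemplate
from langchain_community.llms import Ollama

# Pipe a prompt template into the local model (LCEL composition).
prompt = PromptTemplate.from_template("Explain {topic} in two sentences.")
llm = Ollama(model="mistral")
chain = prompt | llm
print(chain.invoke({"topic": "gradient descent"}))
```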
## Custom Models (Modelfile)

A Modelfile bakes a base model together with a system prompt and generation parameters (`temperature` controls randomness; `num_ctx` sets the context window in tokens):

```
# Modelfile
FROM mistral
SYSTEM You are a helpful coding assistant specialized in Python.
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
```

Build and run it:

```bash
ollama create my-coder -f Modelfile
ollama run my-coder
```
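Once created, the custom model is addressable by name through the same REST API as any stock model. A sketch reusing the `/api/generate` call from above, with the `my-coder` name defined by `ollama create`:

```python
import requests

# Query the custom model exactly like a built-in one.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "my-coder",
        "prompt": "Write a Python function that reverses a string.",
        "stream": False,
    },
)
print(resp.json()["response"])
```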
## Hardware Requirements

| Model Size | RAM Required | GPU VRAM |
|---|---|---|
| 3B | 4 GB | 4 GB |
| 7B | 8 GB | 8 GB |
| 13B | 16 GB | 16 GB |
| 70B | 64 GB | 48 GB |
## Key Resources

- Official site and downloads: https://ollama.com
- GitHub repository: https://github.com/ollama/ollama
- Model library: https://ollama.com/library
- REST API reference: https://github.com/ollama/ollama/blob/main/docs/api.md