# Running LLMs Locally with Ollama
## Overview

Ollama makes it easy to run large language models locally. No cloud API is needed, your data stays private, and it runs on macOS, Linux, and Windows.

## Installation

```bash
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Or download from https://ollama.com/download
```

## Running Your First Model

```bash
# Pull and run Llama 3.2
ollama run llama3.2

# Pull and run Mistral
ollama run mistral

# Pull and run a coding model
ollama run codellama
```

## Available Models

| Model | Size | Use Case |
|----------|--------|----------------------|
| llama3.2 | 3B/8B | General purpose |
| mistral | 7B | Fast, high quality |
| codellama | 7B/13B | Code generation |
| phi3 | 3.8B | Efficient, Microsoft |
| gemma2 | 9B | Google's open model |
| qwen2.5 | 7B | Multilingual |

## API Usage

```python
import requests

response = requests.post('http://localhost:11434/api/generate', json={
    'model': 'mistral',
    'prompt': 'Explain machine learning in one paragraph',
    'stream': False
})
print(response.json()['response'])
```

Setting `'stream': True` returns tokens as they are generated; sketches of streaming and of the chat endpoint appear at the end of this guide.

## Using with LangChain

```python
from langchain_community.llms import Ollama

llm = Ollama(model="mistral")
response = llm.invoke("What is the capital of France?")
print(response)
```

## Custom Models (Modelfile)

```
# Modelfile
FROM mistral
SYSTEM You are a helpful coding assistant specialized in Python.
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
```

```bash
ollama create my-coder -f Modelfile
ollama run my-coder
```

## Hardware Requirements

| Model Size | RAM Required | GPU VRAM |
|------------|--------------|----------|
| 3B | 4 GB | 4 GB |
| 7B | 8 GB | 8 GB |
| 13B | 16 GB | 16 GB |
| 70B | 64 GB | 48 GB |

## Key Resources

- Ollama Website: https://ollama.com
- Model Library: https://ollama.com/library
- GitHub: https://github.com/ollama/ollama
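## Streaming Responses

The API example above disables streaming for simplicity. With `'stream': True`, `/api/generate` returns one JSON object per line as tokens are produced. The following is a minimal sketch, assuming the default local server at `http://localhost:11434` and that the `mistral` model has already been pulled:

```python
import json
import requests

# Stream tokens from the local Ollama server as they are generated.
# Assumes Ollama is running on the default port and `mistral` is pulled.
with requests.post(
    'http://localhost:11434/api/generate',
    json={
        'model': 'mistral',
        'prompt': 'Explain machine learning in one paragraph',
        'stream': True,
    },
    stream=True,
) as response:
    for line in response.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # Each chunk carries a fragment of the reply; the final chunk sets "done".
        print(chunk.get('response', ''), end='', flush=True)
        if chunk.get('done'):
            print()
            break
```

Streaming keeps latency low for long answers, since you see output immediately instead of waiting for the full response.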
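## Chat Endpoint

Ollama also exposes a chat-style endpoint at `/api/chat` that takes a list of messages rather than a single prompt, which is convenient for multi-turn conversations and system prompts. A minimal sketch, again assuming the default local server and the `mistral` model:

```python
import requests

# Multi-turn chat against the local Ollama server via /api/chat.
# Assumes Ollama is running on the default port and `mistral` is pulled.
response = requests.post('http://localhost:11434/api/chat', json={
    'model': 'mistral',
    'messages': [
        {'role': 'system', 'content': 'You are a concise assistant.'},
        {'role': 'user', 'content': 'What is the capital of France?'},
    ],
    'stream': False,
})
print(response.json()['message']['content'])
```

To continue the conversation, append the assistant's reply and the next user message to the `messages` list and post again.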