
Running LLMs Locally with Ollama

## Overview

Ollama makes it easy to run large language models locally. No cloud API needed, full privacy, and it works on macOS, Linux, and Windows.

## Installation

```bash
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Or download from https://ollama.com/download
```

## Running Your First Model

```bash
# Pull and run Llama 3.2
ollama run llama3.2

# Pull and run Mistral
ollama run mistral

# Pull and run a coding model
ollama run codellama
```

## Available Models

| Model | Size | Use Case |
|----------|--------|----------------------|
| llama3.2 | 3B/8B | General purpose |
| mistral | 7B | Fast, high quality |
| codellama | 7B/13B | Code generation |
| phi3 | 3.8B | Efficient, Microsoft |
| gemma2 | 9B | Google's open model |
| qwen2.5 | 7B | Multilingual |

## API Usage

```python
import requests

response = requests.post('http://localhost:11434/api/generate', json={
    'model': 'mistral',
    'prompt': 'Explain machine learning in one paragraph',
    'stream': False
})
print(response.json()['response'])
```

## Using with LangChain

```python
from langchain_community.llms import Ollama

llm = Ollama(model="mistral")
response = llm.invoke("What is the capital of France?")
print(response)
```

## Custom Models (Modelfile)

```
# Modelfile
FROM mistral
SYSTEM You are a helpful coding assistant specialized in Python.
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
```

```bash
ollama create my-coder -f Modelfile
ollama run my-coder
```

## Hardware Requirements

| Model Size | RAM Required | GPU VRAM |
|------------|--------------|----------|
| 3B | 4 GB | 4 GB |
| 7B | 8 GB | 8 GB |
| 13B | 16 GB | 16 GB |
| 70B | 64 GB | 48 GB |

## Key Resources

- Ollama Website: https://ollama.com
- Model Library
- GitHub
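
The API Usage example above makes a single non-streaming request. As a supplementary sketch that is not part of the original post, the same `/api/generate` endpoint can also stream tokens as they are produced. This assumes a local Ollama server on the default port 11434, the `mistral` model already pulled, and the streaming response arriving as newline-delimited JSON chunks.

```python
import json

import requests

# Stream a completion from a local Ollama server (default port 11434).
# Assumes `ollama pull mistral` has already been run.
response = requests.post(
    'http://localhost:11434/api/generate',
    json={
        'model': 'mistral',
        'prompt': 'Explain machine learning in one paragraph',
        'stream': True,
    },
    stream=True,
)

# With streaming enabled, Ollama emits one JSON object per line; each chunk
# carries a partial 'response' string and the final chunk sets 'done': true.
for line in response.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)
    print(chunk.get('response', ''), end='', flush=True)
    if chunk.get('done'):
        print()
        break
```

Streaming keeps the terminal responsive on slower hardware, since tokens appear as they are generated instead of after the full completion finishes.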

December 1, 2025 · 1 min · 207 words · BlogIA Team

Review: Ollama - Run any model locally

Review of Ollama: Run any model locally. Score: 7.2/10

January 19, 2026 · 3 min · 539 words · BlogIA Reviews