Overview

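RAG (Retrieval-Augmented Generation) combines document retrieval with LLM generation. Instead of relying solely on the model's training data, RAG fetches relevant passages from your own documents at query time and adds them to the prompt, so answers can draw on information the model was never trained on.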

Architecture

Indexing: Load Documents → Split into Chunks → Embed Chunks → Store in Vector DB
Query:    Question → Embed → Vector Search → Retrieve Top-k Chunks → LLM + Context → Response
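A toy, dependency-free sketch of the query-time retrieval stages. It assumes a bag-of-words word counter as a stand-in for a real embedding model and brute-force scoring in place of a vector database; `embed`, `cosine`, and `retrieve` are hypothetical names. The LLM step would then receive the retrieved text as context.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # "Vector search": embed the query, score every stored doc, keep the top-k
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


docs = [
    "Machine learning lets computers learn patterns from data.",
    "The office closes on public holidays.",
    "Neural networks are a family of machine learning models.",
]
context = retrieve("What is machine learning?", docs, k=2)
print(context)  # these chunks would be passed to the LLM with the question
```

A real pipeline swaps `embed` for a neural embedding model and the linear scan for an approximate nearest-neighbour index, but the flow is the same.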

Installation

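pip install langchain langchain-community chromadb sentence-transformers pypdf

PyPDFLoader depends on pypdf. The RAG chain section also expects a local Ollama server with the model already pulled (ollama pull mistral).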

Loading Documents

from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader

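# Single PDF (returns one Document per page, with "source" and "page" metadata)
loader = PyPDFLoader("document.pdf")
docs = loader.load()

# Directory of files: "**/*.pdf" matches PDFs in all subdirectories.
# loader_cls=PyPDFLoader is needed; otherwise DirectoryLoader uses
# UnstructuredFileLoader, which requires the unstructured package.
loader = DirectoryLoader("./docs", glob="**/*.pdf", loader_cls=PyPDFLoader)
docs = loader.load()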

Splitting Documents

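from langchain_text_splitters import RecursiveCharacterTextSplitter

# chunk_size and chunk_overlap are measured in characters (length_function=len by default)
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", " ", ""]  # try paragraph, then line, then word breaks
)
chunks = splitter.split_documents(docs)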
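To see how chunk_size and chunk_overlap interact, here is a simplified sliding-window chunker. It is not RecursiveCharacterTextSplitter's algorithm: the real splitter also tries the separators in order so chunks break at paragraph, line, or word boundaries, while this sketch cuts at fixed character offsets. The `chunk_text` name is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Fixed-size character windows; each window repeats the last
    chunk_overlap characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
    return chunks


print(chunk_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3))
```

With chunk_size=10 and chunk_overlap=3 each window advances 7 characters, so consecutive chunks share 3 characters ("hij", "opq", "vwx").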

Creating Embeddings

from langchain_community.embeddings import HuggingFaceEmbeddings

# Runs locally; the model is downloaded on first use and maps text to 384-dimensional vectors
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

Vector Store

from langchain_community.vectorstores import Chroma

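# Embeds every chunk and writes the index to ./chroma_db so it can be reused
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

# Later runs can reopen the saved index instead of re-embedding:
# vectorstore = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)

# Search: returns the 3 chunks whose embeddings are closest to the query's
results = vectorstore.similarity_search("What is machine learning?", k=3)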
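A note on what "closest" means here: by default Chroma ranks by squared L2 distance rather than cosine similarity. When embeddings are unit-length, ||a - b||² = 2 - 2·cos(a, b), so both metrics produce the same ranking. The all-MiniLM-L6-v2 model is set up to emit normalized vectors; with other models you can request this via HuggingFaceEmbeddings(encode_kwargs={"normalize_embeddings": True}). The sketch below checks the identity on made-up 3-dimensional vectors (real embeddings from this model have 384 dimensions); all names in it are hypothetical.

```python
import math


def normalize(v: list[float]) -> list[float]:
    # Scale a vector to unit length
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(a: list[float], b: list[float]) -> float:
    # Plain dot product equals cosine similarity because a and b are unit vectors
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Made-up vectors standing in for stored chunk embeddings
query = normalize([0.9, 0.1, 0.3])
stored = {
    "doc_a": normalize([0.8, 0.2, 0.4]),
    "doc_b": normalize([0.1, 0.9, 0.2]),
    "doc_c": normalize([0.5, 0.5, 0.5]),
}

by_l2 = sorted(stored, key=lambda name: squared_l2(query, stored[name]))               # smaller = closer
by_cos = sorted(stored, key=lambda name: cosine(query, stored[name]), reverse=True)  # larger = closer
print(by_l2, by_cos)
```

If you need the raw distances, similarity_search_with_score returns (Document, score) pairs, where a lower score means a closer match under L2.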

RAG Chain

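from langchain_community.llms import Ollama
from langchain.chains import RetrievalQA

# Requires a running Ollama server with the model pulled (ollama pull mistral)
llm = Ollama(model="mistral")
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# chain_type="stuff" places all k retrieved chunks into a single prompt
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)

response = qa_chain.invoke({"query": "Summarize the main findings"})
print(response["result"])

# return_source_documents=True also returns the chunks the answer was based on
for doc in response["source_documents"]:
    print(doc.metadata.get("source"), doc.metadata.get("page"))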
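The "stuff" strategy is simply concatenation: every retrieved chunk goes into one prompt, which is why k times chunk_size must fit in the model's context window. A minimal sketch of the idea follows; the template wording and the max_chars budget are hypothetical (LangChain's built-in prompt text differs), and `stuff_prompt` is an illustrative name.

```python
def stuff_prompt(question: str, contexts: list[str], max_chars: int = 4000) -> str:
    """Concatenate retrieved chunks (assumed best-first) into one prompt,
    stopping before the total context would exceed max_chars."""
    kept, used = [], 0
    for chunk in contexts:
        if used + len(chunk) > max_chars:
            break  # later chunks are lower-ranked, so drop them
        kept.append(chunk)
        used += len(chunk)
    context = "\n\n".join(kept)
    return (
        "Use the following context to answer the question.\n"
        "If the context does not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


print(stuff_prompt("Summarize the main findings", ["First retrieved chunk.", "Second retrieved chunk."]))
```

When the retrieved text regularly overflows the budget, smaller chunks or a lower k are the usual first adjustments.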

Production Tips

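  1. Chunk size: 500-1000 tokens is a common starting point; tune it against your own queries. Note that RecursiveCharacterTextSplitter's chunk_size counts characters by default, so chunk_size=1000 is roughly 250 tokens of English text.
  2. Overlap: 10-20% of the chunk size (the example above uses 200/1000 = 20%) reduces the chance that a relevant passage is cut in half at a chunk boundary.
  3. Reranking: retrieve more candidates than you need, rescore them with a cross-encoder (which reads the query and each chunk together), and keep only the top few.
  4. Hybrid search: combine vector search with keyword search (e.g. BM25) so exact terms such as names, IDs, or error codes are not missed.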
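For hybrid search you need a way to merge two ranked lists. One common option is Reciprocal Rank Fusion (RRF), which scores each document as the sum of 1 / (k + rank) over the lists it appears in, with k = 60 as the usual constant. LangChain's EnsembleRetriever is built on a weighted form of this idea. The document IDs below are made up for illustration.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several best-first rankings of document IDs into one."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents ranked high in any list get a larger contribution
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


vector_hits = ["doc3", "doc1", "doc7"]  # e.g. from vectorstore similarity search
bm25_hits = ["doc1", "doc9", "doc3"]    # e.g. from a BM25 keyword index
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
```

Because RRF only uses ranks, it sidesteps the fact that BM25 scores and embedding distances live on incompatible scales.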

Key Resources