Overview
RAG (Retrieval-Augmented Generation) combines document retrieval with LLM generation. Instead of relying solely on the model’s training data, RAG fetches relevant context from your documents.
Architecture
Query → Embed → Vector Search → Retrieve Docs → LLM + Context → Response
Installation
pip install langchain langchain-community chromadb sentence-transformers pypdf
Loading Documents
from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
# Single PDF
loader = PyPDFLoader("document.pdf")
docs = loader.load()
# Directory of files
loader = DirectoryLoader("./docs", glob="**/*.pdf", loader_cls=PyPDFLoader)
docs = loader.load()
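A quick sanity check after loading helps catch empty or unreadable files early. The lines below are only an illustrative check, assuming the loaders above ran successfully: each loaded Document exposes page_content plus a metadata dict.
# Inspect what was loaded
print(f"Loaded {len(docs)} documents")
print(docs[0].metadata)  # typically includes the source path and, for PDFs, a page number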
Splitting Documents
from langchain.text_splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", " ", ""]
)
chunks = splitter.split_documents(docs)
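Before indexing, it is worth confirming how the splitter actually divided the text. This is a minimal check, not a required step; note that chunk_size counts characters by default, not tokens, unless you pass a custom length_function.
# Verify the chunking result
print(f"{len(docs)} documents -> {len(chunks)} chunks")
print(max(len(c.page_content) for c in chunks))  # longest chunk, should not exceed 1000 characters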
Creating Embeddings
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
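The embedding model can be exercised on its own to confirm it downloads and loads correctly. A small sketch; all-MiniLM-L6-v2 produces 384-dimensional vectors.
# Embed a single query string; the result is a plain list of floats
vector = embeddings.embed_query("What is machine learning?")
print(len(vector))  # 384 for this model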
Vector Store
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)
# Search
results = vectorstore.similarity_search("What is machine learning?", k=3)
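Because persist_directory is set, the index survives on disk and can be reopened later without re-embedding. Below is a sketch of reloading the store in a new session, assuming the same embedding model is constructed as above; similarity_search_with_score additionally returns a distance, where lower means more similar under Chroma's default metric.
# Reopen the persisted index without rebuilding it
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings
)
results_with_scores = vectorstore.similarity_search_with_score("What is machine learning?", k=3)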
RAG Chain
from langchain_community.llms import Ollama
from langchain.chains import RetrievalQA
llm = Ollama(model="mistral")
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)
response = qa_chain.invoke({"query": "Summarize the main findings"})
print(response["result"])
Production Tips
- Chunk size: 500-1000 tokens works well for most use cases
- Overlap: 10-20% overlap prevents context loss at boundaries
- Reranking: Use a cross-encoder to rerank retrieved documents
- Hybrid search: Combine vector search with keyword search (BM25); one way to wire this up is sketched after this list
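The hybrid-search tip can be sketched with LangChain's EnsembleRetriever, which fuses a BM25 keyword retriever with the vector retriever via reciprocal rank fusion. This is one possible setup, not the only one, and it additionally requires the rank_bm25 package.
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever

# Keyword retriever built over the same chunks that feed the vector index
bm25_retriever = BM25Retriever.from_documents(chunks)
bm25_retriever.k = 3

# Weighted fusion of keyword and vector results
hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, retriever],
    weights=[0.5, 0.5]
)
The resulting hybrid_retriever can be passed to RetrievalQA.from_chain_type in place of the plain vector retriever above.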