Does this RAG project require OpenAI or cloud APIs?

No. It uses FAISS for vector search and GPT-2 for local text generation. The entire pipeline runs offline after initial model download.

How is this different from DocuMind?

This is a lightweight learning/demo stack with FAISS and Streamlit. DocuMind is a production reference with ChromaDB, Ollama embeddings, FastAPI, citation grounding, and dual library collections — documented at draketalley.ai/blog/documind-local-first-rag-platform.

What documents can I index?

Plain text and common document formats supported by LangChain loaders. Chunk size and overlap are configurable in the notebook/app configuration.
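Here is a minimal sketch of overlapping character windows that shows what the chunk size and overlap settings control. `chunk_text` is a hypothetical helper, not the repo's code. Its parameter names mirror the `chunk_size` and `chunk_overlap` arguments of LangChain's text splitters, which also try to break on separators such as paragraph boundaries rather than at fixed offsets.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical, simplified stand-in for a LangChain text splitter:
    it cuts at fixed offsets instead of preferring separators.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
    # prints ['abcd', 'cdef', 'efgh', 'ghij']
```

Overlap repeats the tail of each chunk at the start of the next one. Text near a boundary is therefore less likely to be cut off from its surrounding context, at the cost of a slightly larger index.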

Can I swap GPT-2 for a larger model?

Yes — replace the LangChain LLM wrapper with any Hugging Face or local model compatible with your hardware. DocuMind demonstrates the Ollama-based production pattern.

Local RAG with Streamlit, LangChain, and FAISS — No API Keys Required

Project Summary

Fully local RAG with FAISS indexing, LangChain chunking, GPT-2 generation, and Streamlit UI — no cloud API keys. Conceptual ancestor of the DocuMind production stack.

This repository implements a fully local Retrieval-Augmented Generation (RAG) pipeline using LangChain, FAISS vector search, and a GPT-2 language model, with no external API keys required. It predates my production DocuMind stack but captures the same core insight: for factual tasks, grounding answers in retrieved context beats unconstrained generation. The Streamlit UI makes the retrieval-and-generation loop visible, so the project works well for demos and teaching and as a hands-on entry point to local RAG pipelines, FAISS with LangChain, and offline LLM question answering.

Architecture overview

Local RAG loop: retrieve relevant chunks, then condition generation on retrieved text only.
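Here is a minimal, dependency-free sketch of that loop. It assumes a toy word-count embedding in place of a real embedding model and brute-force cosine search in place of the FAISS index. `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical names, and the prompt wording is only illustrative, not the repo's template.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Brute-force top-k search: score every chunk, keep the k most similar."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Condition generation on the retrieved text only."""
    joined = "\n\n".join(context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {query}\nAnswer:"
    )


if __name__ == "__main__":
    chunks = [
        "FAISS performs fast vector similarity search.",
        "Streamlit builds simple web apps in Python.",
        "GPT-2 is a small language model that runs locally.",
    ]
    question = "What does FAISS do?"
    # The resulting prompt would then be passed to the local GPT-2 model.
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

Replacing the toy pieces with a real embedding model and a FAISS index changes how the nearest chunks are found. The loop itself stays the same: retrieve, build a grounded prompt, generate.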

Key design decisions

FAISS in-memory vector index for fast cosine similarity without a vector database server
LangChain document loaders and text splitters for chunking with configurable overlap
GPT-2 as a lightweight local LLM — no OpenAI or cloud inference dependency
Streamlit frontend exposing query input, retrieved sources, and generated response
Runs end to end on a laptop, using CPU inference or a modest GPU
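Choosing GPT-2 has one practical consequence. Its context window is 1024 tokens, and the prompt and the generated answer share that window, so only a few retrieved chunks fit. The sketch below keeps chunks in ranked order until a token budget runs out. It assumes a crude whitespace token count and hypothetical budget values: `max_new_tokens` for the answer and `overhead_tokens` for the prompt template. A real implementation should count tokens with GPT-2's own tokenizer.

```python
GPT2_CONTEXT_TOKENS = 1024  # GPT-2's maximum sequence length (prompt + generated tokens)


def count_tokens(text: str) -> int:
    """Rough token estimate: whitespace-separated words.

    Assumption for illustration only: GPT-2's BPE tokenizer usually
    produces more tokens than words, so real code should use the tokenizer.
    """
    return len(text.split())


def fit_context(chunks: list[str], question: str,
                max_new_tokens: int = 200, overhead_tokens: int = 50) -> list[str]:
    """Keep ranked chunks, in order, until the prompt's token budget is used up."""
    budget = (GPT2_CONTEXT_TOKENS - max_new_tokens - overhead_tokens
              - count_tokens(question))
    kept = []
    for chunk in chunks:
        cost = count_tokens(chunk)
        if cost > budget:
            break  # stop at the first chunk that no longer fits
        kept.append(chunk)
        budget -= cost
    return kept
```

This sketch stops at the first chunk that does not fit instead of skipping ahead to smaller, lower-ranked ones. That choice guarantees the prompt always contains the top-ranked chunks in order. Skipping ahead would fill the window more fully, and either policy is a reasonable choice.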

When to use this vs DocuMind

Criteria	RAG Streamlit (this repo)	DocuMind (production)
Vector store	FAISS in-memory	ChromaDB with persistent collections
LLM backend	GPT-2 local	Ollama (llama3, embeddings)
API	Streamlit only	FastAPI + Next.js UI
Citations	Basic source display	Structured SourceCitation objects
Best for	Learning, quick demos	Production RAG reference

Setup

git clone https://github.com/cdtalley/rag-streamlit-langchain
cd rag-streamlit-langchain
pip install -r requirements.txt
streamlit run app.py

Key Features & Capabilities

FAISS in-memory vector index for cosine similarity search
LangChain document loaders and configurable text splitters
GPT-2 local generation conditioned on retrieved context
Streamlit UI exposing query, sources, and generated answers

Tech Stack & Components

Python, LangChain, FAISS, GPT-2, Streamlit, Hugging Face

Getting Started

1. Run locally

Install dependencies and launch Streamlit.

git clone https://github.com/cdtalley/rag-streamlit-langchain
cd rag-streamlit-langchain
pip install -r requirements.txt
streamlit run app.py
