RAG

Retrieval-Augmented Generation (RAG) is a technique where you give the LLM extra context from external data sources (like PDFs, websites, or text files) so it can answer questions more accurately, especially when the information is not present in the model's training data.

Informally, imagine the LLM is like a student answering questions.

Traditional LLM: Answers from memory

RAG LLM: First opens a textbook, goes to the right chapter, then answers.

We will use OpenAI and LangChain to demonstrate RAG.

openai_api_key = '<your_api_key>'
!pip install langchain-openai
Collecting langchain-openai

  Downloading langchain_openai-0.3.30-py3-none-any.whl (74 kB)

Installing collected packages: langchain-openai

Successfully installed langchain-openai-0.3.30
from langchain_openai import ChatOpenAI
import os

os.environ["OPENAI_API_KEY"] = openai_api_key

llm = ChatOpenAI(model="gpt-3.5-turbo",  # or gpt-4
                 temperature=0.7,  # optional to pass
                # openai_api_key=openai_api_key  # could also be passed here if you do not want to set the environment variable
    )

# Now you can use it in a chain, or call it directly as below
response = llm.invoke("When was Acme Inc. founded?")
print(response.content)
There have been several companies with the name Acme Inc. founded throughout history, so it depends on which specific company you are referring to. Can you please provide more context or details?

Sample outputs obtained from the above:

  • "It is unclear which specific company named Acme Inc. you are referring to, as there are many companies with similar names. Can you please provide more information or context so I can accurately answer your question?"

If it is a very specific or new company, the LLM may or may not be able to answer correctly from memory alone. Hence, we will use RAG.

RAG steps

  1. Read the data for retrieval
  2. Split the text
  3. Produce embeddings for the splits
  4. Store the embeddings in a vector DB
  5. Create a retriever to fetch relevant chunks from the vector DB
  6. Combine the LLM and the retriever, and produce results.

1. Read the data for retrieval

Before doing RAG, we need data. There are different ways to read the data:

1.1. Using simple text
1.2. Using a text file
1.3. Using a pdf file

1.1 Using a simple text

Document class

LangChain wraps all content in Document objects, which hold both text and optional metadata.

from langchain.schema import Document

doc1 = Document(page_content="This is the first document", metadata={"source": "file1.txt"})
doc1
Document(metadata={'source': 'file1.txt'}, page_content='This is the first document')
doc1.page_content
'This is the first document'
doc1.metadata
{'source': 'file1.txt'}

Each doc is a Document object with attributes: page_content and metadata

1.2 Using a txt file

document_loaders

Loaders help you load documents from .txt, .pdf, .csv, URLs, etc.

!pip install langchain langchain-community
Requirement already satisfied: langchain in /usr/local/lib/python3.11/dist-packages (0.3.27)

Collecting langchain-community

  Downloading langchain_community-0.3.27-py3-none-any.whl.metadata (2.9 kB)

Requirement already satisfied: langchain-core<1.0.0,>=0.3.72 in /usr/local/lib/python3.11/dist-packages (from langchain) (0.3.74)

Requirement already satisfied: langchain-text-splitters<1.0.0,>=0.3.9 in /usr/local/lib/python3.11/dist-packages (from langchain) (0.3.9)

Requirement already satisfied: langsmith>=0.1.17 in /usr/local/lib/python3.11/dist-packages (from langchain) (0.4.14)

Requirement already satisfied: pydantic<3.0.0,>=2.7.4 in /usr/local/lib/python3.11/dist-packages (from langchain) (2.11.7)

Requirement already satisfied: SQLAlchemy<3,>=1.4 in /usr/local/lib/python3.11/dist-packages (from langchain) (2.0.43)

Requirement already satisfied: requests<3,>=2 in /usr/local/lib/python3.11/dist-packages (from langchain) (2.32.3)

Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.11/dist-packages (from langchain) (6.0.2)

Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /usr/local/lib/python3.11/dist-packages (from langchain-community) (3.12.15)

Requirement already satisfied: tenacity!=8.4.0,<10,>=8.1.0 in /usr/local/lib/python3.11/dist-packages (from langchain-community) (9.1.2)

Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)

  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)

Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)

  Downloading pydantic_settings-2.10.1-py3-none-any.whl.metadata (3.4 kB)

Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain-community)

  Downloading httpx_sse-0.4.1-py3-none-any.whl.metadata (9.4 kB)

Requirement already satisfied: numpy>=1.26.2 in /usr/local/lib/python3.11/dist-packages (from langchain-community) (2.0.2)

Requirement already satisfied: aiohappyeyeballs>=2.5.0 in /usr/local/lib/python3.11/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (2.6.1)

Requirement already satisfied: aiosignal>=1.4.0 in /usr/local/lib/python3.11/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (1.4.0)

Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.11/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (25.3.0)

Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.11/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (1.7.0)

Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.11/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (6.6.4)

Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.11/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (0.3.2)

Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/lib/python3.11/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (1.20.1)

Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)

  Downloading marshmallow-3.26.1-py3-none-any.whl.metadata (7.3 kB)

Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)

  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)

Requirement already satisfied: jsonpatch<2.0,>=1.33 in /usr/local/lib/python3.11/dist-packages (from langchain-core<1.0.0,>=0.3.72->langchain) (1.33)

Requirement already satisfied: typing-extensions>=4.7 in /usr/local/lib/python3.11/dist-packages (from langchain-core<1.0.0,>=0.3.72->langchain) (4.14.1)

Requirement already satisfied: packaging>=23.2 in /usr/local/lib/python3.11/dist-packages (from langchain-core<1.0.0,>=0.3.72->langchain) (25.0)

Requirement already satisfied: httpx<1,>=0.23.0 in /usr/local/lib/python3.11/dist-packages (from langsmith>=0.1.17->langchain) (0.28.1)

Requirement already satisfied: orjson>=3.9.14 in /usr/local/lib/python3.11/dist-packages (from langsmith>=0.1.17->langchain) (3.11.2)

Requirement already satisfied: requests-toolbelt>=1.0.0 in /usr/local/lib/python3.11/dist-packages (from langsmith>=0.1.17->langchain) (1.0.0)

Requirement already satisfied: zstandard>=0.23.0 in /usr/local/lib/python3.11/dist-packages (from langsmith>=0.1.17->langchain) (0.23.0)

Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.11/dist-packages (from pydantic<3.0.0,>=2.7.4->langchain) (0.7.0)

Requirement already satisfied: pydantic-core==2.33.2 in /usr/local/lib/python3.11/dist-packages (from pydantic<3.0.0,>=2.7.4->langchain) (2.33.2)

Requirement already satisfied: typing-inspection>=0.4.0 in /usr/local/lib/python3.11/dist-packages (from pydantic<3.0.0,>=2.7.4->langchain) (0.4.1)

Collecting python-dotenv>=0.21.0 (from pydantic-settings<3.0.0,>=2.4.0->langchain-community)

  Downloading python_dotenv-1.1.1-py3-none-any.whl.metadata (24 kB)

Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests<3,>=2->langchain) (3.4.3)

Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests<3,>=2->langchain) (3.10)

Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests<3,>=2->langchain) (2.5.0)

Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests<3,>=2->langchain) (2025.8.3)

Requirement already satisfied: greenlet>=1 in /usr/local/lib/python3.11/dist-packages (from SQLAlchemy<3,>=1.4->langchain) (3.2.4)

Requirement already satisfied: anyio in /usr/local/lib/python3.11/dist-packages (from httpx<1,>=0.23.0->langsmith>=0.1.17->langchain) (4.10.0)

Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.11/dist-packages (from httpx<1,>=0.23.0->langsmith>=0.1.17->langchain) (1.0.9)

Requirement already satisfied: h11>=0.16 in /usr/local/lib/python3.11/dist-packages (from httpcore==1.*->httpx<1,>=0.23.0->langsmith>=0.1.17->langchain) (0.16.0)

Requirement already satisfied: jsonpointer>=1.9 in /usr/local/lib/python3.11/dist-packages (from jsonpatch<2.0,>=1.33->langchain-core<1.0.0,>=0.3.72->langchain) (3.0.0)

Collecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain-community)

  Downloading mypy_extensions-1.1.0-py3-none-any.whl.metadata (1.1 kB)

Requirement already satisfied: sniffio>=1.1 in /usr/local/lib/python3.11/dist-packages (from anyio->httpx<1,>=0.23.0->langsmith>=0.1.17->langchain) (1.3.1)

Downloading langchain_community-0.3.27-py3-none-any.whl (2.5 MB)

   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.5/2.5 MB 33.4 MB/s eta 0:00:00

Downloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB)

Downloading httpx_sse-0.4.1-py3-none-any.whl (8.1 kB)

Downloading pydantic_settings-2.10.1-py3-none-any.whl (45 kB)

   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.2/45.2 kB 2.8 MB/s eta 0:00:00

Downloading marshmallow-3.26.1-py3-none-any.whl (50 kB)

   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50.9/50.9 kB 3.4 MB/s eta 0:00:00

Downloading python_dotenv-1.1.1-py3-none-any.whl (20 kB)

Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)

Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB)

Installing collected packages: python-dotenv, mypy-extensions, marshmallow, httpx-sse, typing-inspect, pydantic-settings, dataclasses-json, langchain-community

Successfully installed dataclasses-json-0.6.7 httpx-sse-0.4.1 langchain-community-0.3.27 marshmallow-3.26.1 mypy-extensions-1.1.0 pydantic-settings-2.10.1 python-dotenv-1.1.1 typing-inspect-0.9.0
from langchain.document_loaders import TextLoader

# If the above doesn't work, use
# from langchain_community.document_loaders import TextLoader

loader = TextLoader("RAG_file.txt")
docs1 = loader.load()
# docs1 is a list of Document objects
for doc in docs1:
    print(doc.page_content)   # The text
    print(doc.metadata)       # File name, etc.
Acme Inc. was founded in 1987 in Helsinki, Finland. It specializes in anti-gravity footwear and rocket-powered pogo sticks.

In 2024, Acme released a new product line: "Jet Sneakers", designed for low-orbit recreational use.

As per internal policy, Acme's HR team meets every 2 weeks to assess wellness metrics of staff based on holographic surveys.

{'source': 'RAG_file.txt'}

Even when loading a single .txt file, TextLoader.load() returns a list containing one Document object, for consistency across all loaders in LangChain.

LangChain is designed to treat everything as a list of documents, whether you load:

one .txt file or multiple files at once.
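
For instance, a whole folder can be loaded in one call. A minimal sketch, assuming a hypothetical data/ folder of .txt files:

from langchain_community.document_loaders import DirectoryLoader, TextLoader

# "data/" is a hypothetical folder for illustration
loader = DirectoryLoader("data/", glob="*.txt", loader_cls=TextLoader)
docs = loader.load()  # one flat list of Document objects across all matched files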

1.3 Using a pdf file

!pip install pypdf
Collecting pypdf

  Downloading pypdf-6.0.0-py3-none-any.whl (310 kB)

Installing collected packages: pypdf

Successfully installed pypdf-6.0.0
from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("RAG_file.pdf")
docs2 = loader.load()
# again docs2 is a list of Document objects
for doc in docs2:
    print(doc.page_content)   # The text
    print(doc.metadata)       # File name, etc.
Acme Inc. was founded in 1987 in Helsinki, Finland. It specializes in anti-gravity footwear and
rocket-powered pogo sticks. In 2024, Acme released a new product line: "Jet Sneakers",
designed for low-orbit recreational use. As per internal policy, Acme's HR team meets every 2
weeks to assess wellness metrics of staff based on holographic surveys.
{'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250730135006', 'source': 'RAG_file.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}

Note:

  • We can also read CSV files using CSVLoader from langchain.document_loaders or langchain_community.document_loaders.
  • We can also read HTML files using UnstructuredHTMLLoader from langchain.document_loaders or langchain_community.document_loaders.
  • We can also read online PDF files using OnlinePDFLoader from langchain.document_loaders or langchain_community.document_loaders.
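
As a quick illustration of one of these, a CSVLoader sketch (the file name products.csv is hypothetical; CSVLoader yields one Document per CSV row):

from langchain_community.document_loaders import CSVLoader

csv_docs = CSVLoader("products.csv").load()  # hypothetical file; one Document per row
print(len(csv_docs))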

1.4 Mixing Different File Types

You can load different formats separately and then combine them:

# Assuming file1, file2, file3, file4 are available
# from langchain.document_loaders import TextLoader, PyPDFLoader, CSVLoader, UnstructuredHTMLLoader

# # Loaders for different file types
# txt_docs = TextLoader("file1.txt").load()
# pdf_docs = PyPDFLoader("file2.pdf").load()
# csv_docs = CSVLoader("file3.csv").load()
# html_docs = UnstructuredHTMLLoader("file4.html").load()

# # Merge all into one list
# all_docs = txt_docs + pdf_docs + csv_docs + html_docs

all_docs is just a list of Document objects → ready for splitting, embedding, and vector storage.

2. Split the text

LLMs have context-window (token) limits, so long documents need to be split into smaller chunks.

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=10) #or 300 and 50 resp
chunks = splitter.split_documents(docs2)
chunks
[Document(metadata={'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250730135006', 'source': 'RAG_file.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}, page_content='Acme Inc. was founded in 1987 in Helsinki, Finland. It specializes in anti-gravity footwear and'),
 Document(metadata={'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250730135006', 'source': 'RAG_file.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}, page_content='rocket-powered pogo sticks. In 2024, Acme released a new product line: "Jet Sneakers",'),
 Document(metadata={'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250730135006', 'source': 'RAG_file.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}, page_content="designed for low-orbit recreational use. As per internal policy, Acme's HR team meets every 2"),
 Document(metadata={'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250730135006', 'source': 'RAG_file.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}, page_content='weeks to assess wellness metrics of staff based on holographic surveys.')]

Besides split_documents(), there is also a split_text() method, which splits a raw string directly. Let me use it below to show the effect of the parameters chunk_size and chunk_overlap.

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks1 = splitter.split_text("Very long document text here...")
chunks1
['Very long document text here...']
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=3, chunk_overlap=1)
chunks2 = splitter.split_text("Very long document text here...")
chunks2
['Ver',
 'ry',
 'lo',
 'ong',
 'do',
 'ocu',
 'ume',
 'ent',
 'te',
 'ext',
 'he',
 'ere',
 'e..',
 '..']
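
The character-level chunks above are hard to read. Here is a small sketch at a slightly more readable scale (parameters chosen purely for illustration); the tail of each chunk should reappear at the start of the next:

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=25, chunk_overlap=10)
for chunk in splitter.split_text("Acme Inc. was founded in 1987 in Helsinki, Finland."):
    print(repr(chunk))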

3. Produce embeddings for the splits

For each chunk, we generate embeddings: numerical vectors that capture semantic meaning, such that:

  • for similar texts, the embeddings are close together, and

  • for dissimilar texts, the embeddings are far apart.

from langchain_openai import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings()

# The following extracts the text from each Document chunk, and then converts each chunk into a vector (list of floats).

vectors = embedding_model.embed_documents([doc.page_content for doc in chunks])
len(vectors)
4
len(vectors[0])
1536
# vectors[0]
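
To make the close/far claim concrete, here is a small sketch comparing three made-up sentences with cosine similarity (higher means more similar):

import numpy as np

def cosine(a, b):
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Sentences invented purely for illustration
e1 = embedding_model.embed_query("Acme makes anti-gravity footwear.")
e2 = embedding_model.embed_query("Acme manufactures shoes that defy gravity.")
e3 = embedding_model.embed_query("The weather in Paris is rainy today.")
print(cosine(e1, e2))  # expected to be relatively high (similar meaning)
print(cosine(e1, e3))  # expected to be lower (unrelated meaning)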

4. Store the embeddings in a vector DB

The embeddings (and chunks) are stored in a vector database (FAISS, Pinecone, Weaviate, Chroma, etc.).

This lets us search semantically, not just by keywords.

(Later, when a user asks a query, the query itself is embedded. DB finds the nearest chunk embeddings and retrieves relevant chunks.)
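
Conceptually, that lookup is just nearest-neighbor search over vectors. A plain-NumPy sketch of the idea, reusing vectors and chunks from above (a real vector DB does this far more efficiently at scale):

import numpy as np

query_vec = np.array(embedding_model.embed_query("When was Acme Inc. founded?"))
doc_vecs = np.array(vectors)                           # chunk embeddings from step 3
dists = np.linalg.norm(doc_vecs - query_vec, axis=1)   # L2 distance to each chunk
print(chunks[int(np.argmin(dists))].page_content)      # nearest chunk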

In this demo, we will use FAISS as the vector DB.

FAISS stands for Facebook AI Similarity Search

Open-source library from Meta (Facebook AI).

It is optimized for fast similarity search in high-dimensional vectors (like embeddings).

It is used for:

  • Nearest neighbor search
  • Clustering
  • Efficient retrieval in RAG pipelines
!pip install faiss-cpu
Collecting faiss-cpu

  Downloading faiss_cpu-1.12.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (31.4 MB)

Installing collected packages: faiss-cpu

Successfully installed faiss-cpu-1.12.0
from langchain.vectorstores import FAISS

vectorstore = FAISS.from_documents(chunks, embedding_model)

# This stores both embeddings and data together.
vectorstore
<langchain_community.vectorstores.faiss.FAISS at 0x7d69e899fd10>

Access stored documents (Optional)

# Get all documents back (but embeddings are inside the FAISS index)
all_docs = vectorstore.docstore._dict

for doc_id, doc in all_docs.items():
    print("ID:", doc_id)
    print("Content:", doc.page_content)
    print("Metadata:", doc.metadata)
    print("-" * 40)
ID: 3f6cd2c9-42c5-4e95-a56e-5e5c5fa1da1e
Content: Acme Inc. was founded in 1987 in Helsinki, Finland. It specializes in anti-gravity footwear and
Metadata: {'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250730135006', 'source': 'RAG_file.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}
----------------------------------------
ID: 3e61c826-0d6c-4c82-a4f3-e42efde3bf43
Content: rocket-powered pogo sticks. In 2024, Acme released a new product line: "Jet Sneakers",
Metadata: {'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250730135006', 'source': 'RAG_file.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}
----------------------------------------
ID: 499a4a3b-9497-4a44-acf0-78b2a09eea04
Content: designed for low-orbit recreational use. As per internal policy, Acme's HR team meets every 2
Metadata: {'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250730135006', 'source': 'RAG_file.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}
----------------------------------------
ID: 0770a2d5-f717-424a-a917-b5c63bcd0dfc
Content: weeks to assess wellness metrics of staff based on holographic surveys.
Metadata: {'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250730135006', 'source': 'RAG_file.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}
----------------------------------------

Save & reload FAISS (Optional)

# Save locally
vectorstore.save_local("faiss_index")

# Reload later
new_store = FAISS.load_local("faiss_index", OpenAIEmbeddings(), allow_dangerous_deserialization=True)

After creating a vector store, we can do similarity search using similarity_search(), which works as follows:

  1. The query is converted into an embedding vector using the same embedding model you used for your documents.

  2. FAISS computes similarity between the query embedding and all stored embeddings.

  3. It returns the top-k most similar chunks (Document objects).

Output = a list of Documents (List[Document]).

# extra
retrieved_docs = vectorstore.similarity_search('What was the launch date?', k = 2) # k is the number of documents to retrieve
retrieved_docs
[Document(id='3e61c826-0d6c-4c82-a4f3-e42efde3bf43', metadata={'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250730135006', 'source': 'RAG_file.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}, page_content='rocket-powered pogo sticks. In 2024, Acme released a new product line: "Jet Sneakers",'),
 Document(id='0770a2d5-f717-424a-a917-b5c63bcd0dfc', metadata={'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250730135006', 'source': 'RAG_file.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}, page_content='weeks to assess wellness metrics of staff based on holographic surveys.')]

How is similarity measured?

FAISS uses vector distance metrics (such as L2 distance; cosine similarity is also common).

The embeddings of the query and the chunks are compared.

Smaller distance = higher similarity.

Another variant:

similarity_search_with_score: also returns a score along with each document (here the score is a distance, so lower means more similar).

results = vectorstore.similarity_search_with_score("What was the launch date?", k=2)
for doc, score in results:
    print(doc.page_content, score)
rocket-powered pogo sticks. In 2024, Acme released a new product line: "Jet Sneakers", 0.44364873
weeks to assess wellness metrics of staff based on holographic surveys. 0.52889407
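
As a rough sanity check, the score can be approximated by hand. A sketch assuming the LangChain FAISS default of (squared) L2 distance; the exact convention can vary by version, so treat this as illustrative:

import numpy as np

# Re-embed the query and the top-ranked chunk, then compute squared L2 distance
q = np.array(embedding_model.embed_query("What was the launch date?"))
d = np.array(embedding_model.embed_documents([results[0][0].page_content])[0])
print(np.sum((q - d) ** 2))  # should land near the first score printed above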

The previous four steps were to prepare the data for RAG. Now we can do the retrieval.

5. Create a retriever to fetch relevant chunks from the vector DB

A retriever is a wrapper around the vector store that defines how to fetch documents given a query.

retriever = vectorstore.as_retriever()
# or
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
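
as_retriever() also accepts other search strategies. A short sketch of two common options (see the LangChain docs for details):

# Maximal Marginal Relevance: trades relevance off against diversity of results
retriever_mmr = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 3})

# Only return chunks whose normalized similarity clears a threshold
retriever_thresh = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.5},
)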

(Optional) The retriever is a standardized interface that implements get_relevant_documents(query); in newer LangChain versions, retriever.invoke(query) is the preferred equivalent.

docs = retriever.get_relevant_documents("What was the launch date?")
for d in docs:
    print(d.page_content, d.metadata)
    print()
rocket-powered pogo sticks. In 2024, Acme released a new product line: "Jet Sneakers", {'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250730135006', 'source': 'RAG_file.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}

weeks to assess wellness metrics of staff based on holographic surveys. {'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250730135006', 'source': 'RAG_file.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}

designed for low-orbit recreational use. As per internal policy, Acme's HR team meets every 2 {'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250730135006', 'source': 'RAG_file.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}

6. Combine the LLM and the retriever, and produce results

Finally, the LLM and the retriever are combined into a RetrievalQA chain.

RetrievalQA

It is a LangChain chain designed specifically for retrieval-augmented generation (RAG).

It combines the following:

  1. Retriever: pulls back the most relevant documents from your vector database.
  2. LLM: takes the retrieved documents + user’s query, and generates an answer.

So instead of the LLM hallucinating, it grounds its answers on your documents.
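
Under the hood, the default ("stuff") chain does something like the following; a simplified sketch, not the chain's exact internal prompt:

# What RetrievalQA does conceptually: retrieve, stuff into a prompt, generate
question = "When and where was Acme Inc. founded?"
relevant = retriever.get_relevant_documents(question)
context = "\n\n".join(d.page_content for d in relevant)
answer = llm.invoke(
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
print(answer.content)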

from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever, return_source_documents=True)

.invoke({"query": query})

Runs the chain.

  • Input: a dictionary where the key is "query".

  • Output: a dictionary with keys like:

    • "result": the final LLM-generated answer.
    • "source_documents" (if enabled): the list of retrieved docs.
query = "When and where was Acme Inc. founded?"
response = qa_chain.invoke({"query": query})
print(response["result"])
Acme Inc. was founded in 1987 in Helsinki, Finland.
print(response["source_documents"])
[Document(id='3f6cd2c9-42c5-4e95-a56e-5e5c5fa1da1e', metadata={'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250730135006', 'source': 'RAG_file.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}, page_content='Acme Inc. was founded in 1987 in Helsinki, Finland. It specializes in anti-gravity footwear and'), Document(id='3e61c826-0d6c-4c82-a4f3-e42efde3bf43', metadata={'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250730135006', 'source': 'RAG_file.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}, page_content='rocket-powered pogo sticks. In 2024, Acme released a new product line: "Jet Sneakers",'), Document(id='499a4a3b-9497-4a44-acf0-78b2a09eea04', metadata={'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250730135006', 'source': 'RAG_file.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}, page_content="designed for low-orbit recreational use. As per internal policy, Acme's HR team meets every 2")]
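
To close, the six steps condense into a single short script; a recap sketch using the same pieces introduced above:

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA

docs = PyPDFLoader("RAG_file.pdf").load()                  # 1. read
splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=10)
chunks = splitter.split_documents(docs)                    # 2. split
store = FAISS.from_documents(chunks, OpenAIEmbeddings())   # 3-4. embed + store
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo"),
    retriever=store.as_retriever(),                        # 5. retriever
    return_source_documents=True,
)                                                          # 6. combine
print(qa.invoke({"query": "When and where was Acme Inc. founded?"})["result"])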