How to Save and Retrieve a Vector Database using LangChain, FAISS, and Gemini Embeddings
Efficient storage and retrieval of vector databases is foundational for building retrieval-augmented generation (RAG) systems on top of large language models (LLMs). In this guide, we'll walk through a Python implementation that uses LangChain with FAISS and Google Gemini embeddings to store document embeddings and retrieve semantically similar information. The setup is well suited to machine learning (ML) and deep learning (DL) engineers working on semantic search and retrieval pipelines.
Why Vector Databases Matter in LLM Applications
Traditional keyword-based search systems fall short when it comes to understanding semantic meaning. Vector databases store high-dimensional embeddings of text, enabling approximate nearest-neighbor (ANN) searches based on semantic similarity (a toy similarity sketch follows the list below). These capabilities are critical in applications such as:
- Question Answering Systems
- Enterprise Knowledge Retrieval
- Legal or Medical Document Search
- LLM-powered Chat Assistants with Context Memory
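To make "semantic similarity" concrete, here is a tiny, self-contained sketch that ranks two hand-made vectors against a query vector by cosine similarity; FAISS performs this same kind of comparison, approximately and at scale, over real Gemini embeddings. The vectors and names below are purely illustrative, not outputs of any embedding model.
import numpy as np
# Toy 3-dimensional vectors; real Gemini embeddings have hundreds of dimensions
query_vec = np.array([0.9, 0.1, 0.3])
doc_vecs = {
    "doc_a": np.array([0.8, 0.2, 0.25]),   # semantically close to the query
    "doc_b": np.array([-0.5, 0.9, 0.1]),   # semantically distant
}
def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
# Rank documents by similarity to the query, highest first
ranked = sorted(doc_vecs, key=lambda name: cosine_similarity(query_vec, doc_vecs[name]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b']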
Benefits of Using LangChain in this Workflow
LangChain offers abstraction layers and integrations that simplify the orchestration of complex pipelines involving document loading, chunking, embedding, storing, and retrieval. Specifically, in this setup:
- It abstracts various document loaders (PDF, Excel, text).
- Offers robust text-splitting strategies via RecursiveCharacterTextSplitter (a standalone splitting example follows this list).
- Seamlessly connects to FAISS for fast similarity search.
- Integrates with Google's Gemini embedding models for powerful semantic understanding.
- Supports retrieval interfaces that can be used directly with LLM chains for contextual question answering.
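As a quick, standalone illustration of the splitter mentioned above, the sketch below uses a deliberately small chunk_size so the chunking effect is visible; the sample sentence is arbitrary.
from langchain_text_splitters import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(
    separators=['\n\n', '\n', ' ', '\t'],
    chunk_size=100,        # tiny on purpose, just for the demo
    chunk_overlap=20,
    is_separator_regex=False,
)
sample_text = "LangChain splits long documents into overlapping chunks for retrieval. " * 10
chunks = splitter.split_text(sample_text)
print(f"{len(chunks)} chunks, first chunk: {chunks[0][:60]}...")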
Implementation Strategy
The provided Python implementation performs the following steps (a quick sanity check for the first step is sketched after the list):
- Environment Setup: Loads the Gemini embedding model using credentials from environment variables.
- Database Initialization: Loads an existing FAISS vector database if it exists or creates a new one.
- Document Loading: Supports different formats (PDF, Excel, text) via LangChain's loaders.
- Text Splitting: Splits large documents into manageable chunks with overlap for better context preservation.
- Batch Embedding: Embeds and adds chunks to the FAISS database in batches to manage memory efficiently.
- Persistent Storage: Saves the updated FAISS vector store locally to disk.
- Semantic Retrieval: Reloads the database and performs a similarity search against the embedded vectors.
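Before running the full pipeline, it can help to verify step 1 in isolation. The following sketch assumes a .env file containing a GOOGLE_API_KEY entry and embeds a throwaway probe string to confirm that the credentials and model name work; the probe text is arbitrary.
import os
from dotenv import load_dotenv
from langchain_google_genai import GoogleGenerativeAIEmbeddings

load_dotenv(".env")
assert os.getenv("GOOGLE_API_KEY"), "GOOGLE_API_KEY is missing from the environment / .env file"

embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")
probe = embeddings.embed_query("sanity check")   # single network call to Gemini
print(f"Embedding dimension: {len(probe)}")      # text-embedding-004 typically returns 768 values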
Scalability Considerations
This implementation adds documents to the index in batches (controlled by IDX_DELTA) and sizes an ingestion buffer to handle large corpora efficiently; the short calculation below shows how the batch size is derived. By dividing document embeddings into manageable batches, it avoids memory overflows and speeds up ingestion for large datasets, making it practical for enterprise-scale settings.
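To see the batching arithmetic concretely, here is a small worked example; the constants match the script below, while the 250-chunk corpus size is an invented figure for illustration.
MAX_BUFFER_SIZE = 100000
chunk_size = 1000
IDX_DELTA = MAX_BUFFER_SIZE // chunk_size     # 100 chunks per add_documents() call

doc_size = 250                                # hypothetical number of chunks in a corpus
remainder = doc_size % IDX_DELTA              # 50
last_idx = doc_size - remainder               # 200
batches = [(i, i + IDX_DELTA) for i in range(0, last_idx, IDX_DELTA)]
if last_idx < doc_size:
    batches.append((last_idx, doc_size))      # final partial batch
print(batches)                                # [(0, 100), (100, 200), (200, 250)]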
Use Case: Retrieval-Augmented Generation (RAG)
With the saved vector database, you can now enhance LLM responses by injecting semantically retrieved chunks into prompts. This is a core pattern in modern RAG systems, allowing you to ground model outputs with trusted context.
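A minimal sketch of that pattern is shown below. It reuses the retrieve() function defined in the full script that follows; the prompt wording and the "gemini-1.5-flash" chat model are assumptions for illustration, not part of the original implementation.
from langchain_google_genai import ChatGoogleGenerativeAI

def answer_with_context(question: str) -> str:
    # Pull semantically similar chunks out of the FAISS store (retrieve() is defined below)
    context, _ = retrieve(question)
    prompt = (
        "Answer the question using only the context provided.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
    return llm.invoke(prompt).content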
Full Python Code
You can download the 'reciprocam.pdf' file used below from the arXiv page of the paper 'Recipro-CAM: Fast gradient-free visual explanations for convolutional neural networks'.
import os
from langchain_community.document_loaders import TextLoader, UnstructuredExcelLoader, PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain.schema import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from dotenv import load_dotenv
load_dotenv(".env")
# Setup embedding model as GoogleGenerativeAIEmbeddings
# 'google_api_key' parameter will be assigned by 'GOOGLE_API_KEY' environment variable
embeddings = GoogleGenerativeAIEmbeddings(model = "models/text-embedding-004")
db_path = "./faiss-doc-db"
# Load the existing FAISS DB if one is already on disk; otherwise create a new one, then add the new document's chunks
def create_vector_database(db_path, txt_path, type="text"):
    if os.path.exists(db_path):
        db = FAISS.load_local(db_path, embeddings=embeddings, allow_dangerous_deserialization=True)
    else:
        # Seed a brand-new index with a single placeholder document
        documents = [Document(page_content='RAG Document')]
        db = FAISS.from_documents(documents, embeddings)
    separators = ['\n\n', '\n', ' ', '\t']
    chunk_size = 1000
    chunk_overlap = 100
    # Pick the loader that matches the source format
    if type == "excel":
        loader = UnstructuredExcelLoader(txt_path)
    elif type == "pdf":
        loader = PyPDFLoader(txt_path)
    else:
        loader = TextLoader(txt_path)
    docs = loader.load()
    # Split the loaded documents into overlapping chunks
    documents = RecursiveCharacterTextSplitter(
        separators=separators,
        chunk_size=chunk_size,
        is_separator_regex=False,
        chunk_overlap=chunk_overlap
    ).split_documents(docs)
    # Add chunks to the index in batches of IDX_DELTA to keep memory usage bounded
    MAX_BUFFER_SIZE = 100000
    IDX_DELTA = MAX_BUFFER_SIZE // chunk_size
    doc_size = len(documents)
    remainder = doc_size % IDX_DELTA
    last_idx = doc_size - remainder
    print(f"Total documents: {doc_size}")
    print(f"Last index: {last_idx}")
    print(f"Remainder: {remainder}")
    for idx in range(0, last_idx, IDX_DELTA):
        db.add_documents(documents=documents[idx:idx + IDX_DELTA])
    if last_idx < doc_size:
        # Add the final partial batch
        db.add_documents(documents=documents[last_idx:])
    # Persist the updated index to disk
    db.save_local(db_path)
# Save custom PDF document as vector database
#create_vector_database(db_path, "./reciprocam.pdf", type="pdf")
# Retrieve the documents most related to a given query from the vector DB
def retrieve(query: str):
    """Retrieve information related to a query."""
    vectorstore_faiss = FAISS.load_local(db_path, embeddings, allow_dangerous_deserialization=True)
    faiss_retriever = vectorstore_faiss.as_retriever(search_type="similarity", search_kwargs={"k": 2})
    print(f"Query: {query}")
    retrieved_docs = faiss_retriever.invoke(query)
    serialized = "\n\n".join(
        f"Source: {doc.metadata}\nContent: {doc.page_content}"
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs
serial_doc, ret_doc = retrieve("Let me know what is a CAM.")
print(f"Result: {serial_doc}.")
Result:
Query: Let me know what is a CAM.
Result: Source: {'source': './reciprocam.pdf', 'page': 1}
Content: The first solution suggested to address this issue is CAM Zhou et al. [2016]. This method produces a map that highlights
the important regions of an image for a particular class by multiplying a global average pooling activation vector with a
fully connected weight vector specific to the class. Essentially, the saliency map S_c for a given class c is obtained by
S_c = Σ_k w_{k,c} Σ_{u,v} f_k(u,v)    (1)
where w_{k,c} is the last FC layer's weight between channel k and class c and f_k(u,v) is the activation at (u,v) of
channel k. CAM allows AI practitioners not only to analyze the capacity of their neural network architecture but also
to understand how the network reacts to specific classes of input data. However, this method has a limitation in that
it requires the presence of a global average or max pooling layer in the architecture. This means that certain neural
network architectures may not be compatible with CAM method.
Source: {'source': './reciprocam.pdf', 'page': 5}
Content: arXiv A PREPRINT
Table 1: Comparison of different CAM-based approaches using existing metrics on six different backbones. The
evaluation scores for other CAM methods were obtained from Poppi et al. [2021].
[Flattened rows of Table 1 follow in the raw chunk: Drop and related metric scores for Score-CAM, Recipro-CAM, and Grad-CAM on the VGG-16, ResNet-18, ResNet-50, and ResNet-101 backbones.]