
How to Build a Simple LLM Chatbot Server with Google Gemini 2.5 Pro and LangChain

Introduction

This post walks through how to implement a lightweight yet powerful chatbot backend using Google Gemini 2.5 Pro and LangChain. It also covers a chat-friendly frontend interface and explains the internal architecture that powers the conversation flow.

Whether you're prototyping or integrating LLMs into enterprise-scale apps, this pattern gives you a solid foundation to build on.

Step 1: Install Dependencies

Here's the minimal tech stack we’ll use:

Python Packages

pip install flask flask-cors langchain langchain-google-genai python-dotenv

Make sure you have a .env file with your Google API key:

GOOGLE_API_KEY=your_google_api_key_here
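
Optionally, you can confirm the key loads correctly before wiring up the server. Here is a minimal sketch (it assumes the .env file sits next to the script):

import os
from dotenv import load_dotenv

load_dotenv(".env")  # Read variables from the .env file into the environment

# Fail fast if the key is missing, rather than debugging it later inside the server
assert os.getenv("GOOGLE_API_KEY"), "GOOGLE_API_KEY not found; check your .env file"
print("Google API key loaded.")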

Step 2: Chatbot Architecture

Here’s a high-level diagram of how the system works:

User (Web UI)
     │
     ▼
HTTP POST /chat
     │
     ▼
Flask API
     │
     ▼
LangChain Prompt Template → Gemini 2.5 Pro (via Google Generative AI)
     │
     ▼
Response → JSON → UI
  • Frontend sends a POST request with the user input.
  • Flask handles the API and invokes a LangChain chain.
  • The chain applies a prompt template and passes it to Gemini 2.5 Pro.
  • The result is returned as a JSON response to the frontend.
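
To make the flow concrete, here is a small test client that mirrors what the frontend does. It is only a sketch: it assumes the Flask server from Step 3 is already running locally on port 8000 and that the requests package is installed (pip install requests):

import requests

# Same JSON shape the web UI sends to the /chat endpoint
payload = {"user_query": "What is LangChain in one sentence?"}

resp = requests.post("http://localhost:8000/chat", json=payload, timeout=60)
resp.raise_for_status()

# The backend replies with {"response": "<model answer>"}
print(resp.json()["response"])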

Step 3: Backend Code (Flask + LangChain + Gemini)

import os
from dotenv import load_dotenv  # To load environment variables from a .env file
from langchain_google_genai import ChatGoogleGenerativeAI  # Gemini 2.5 LLM wrapper from LangChain
from langchain_core.prompts import ChatPromptTemplate  # For structured prompt formatting
from flask import Flask, request, jsonify  # Flask framework for web APIs
from flask_cors import CORS  # To enable Cross-Origin requests (important for frontend interaction)

# Initialize Flask app
app = Flask(__name__)
CORS(app)  # Enable CORS so frontend (on different port) can call backend

# Load API key from .env file
load_dotenv(".env")
api_key = os.getenv("GOOGLE_API_KEY")  # Google Gemini API key read from the environment

# Initialize Gemini 2.5 Pro model using LangChain's wrapper
llm_pro = ChatGoogleGenerativeAI(
    model='gemini-2.5-pro-exp-03-25',  # Specific experimental model version
    google_api_key=api_key,  # Pass the key explicitly (the wrapper can also read GOOGLE_API_KEY from the env)
    temperature=0.5  # Controls randomness of output. Lower = more deterministic
)

# Define a LangChain-style prompt template with system and user roles
prompt = ChatPromptTemplate.from_messages([
    ('system', 'You are a helpful chatbot.'),  # Initial role instruction
    ('user', 'Generate an answer for the following question: {message}.')  # Input variable formatted as {message}
])

# Create a LangChain Runnable: Prompt Template → Gemini LLM
chain = prompt | llm_pro

# Define the Flask endpoint that handles incoming chat requests
@app.route("/chat", methods=["POST"])
def chat():
    data = request.get_json()  # Parse incoming JSON from frontend
    print(f"Received data: {data}")  # Debug logging

    user_query = data.get('user_query')  # Extract user input

    if not user_query:  # Basic validation
        return jsonify({'response': "Please enter your question."})

    # Invoke the chain; the dict key must match the {message} variable in the prompt
    response = chain.invoke({"message": user_query}).content  # Get the LLM response text

    return jsonify({'response': response})  # Send it back as JSON

# Run Flask server
if __name__ == '__main__':
    app.run(debug=True, port=8000)

Key Takeaways

  • The prompt is constructed declaratively, and LangChain pipes it directly into the Gemini LLM.
  • You’re using a modern LangChain Runnable style (prompt | llm) to define the processing flow.
  • Fast setup with Flask and CORS allows quick integration with any frontend.
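
Because the chain is a Runnable, you can extend it by piping in more steps. As one possible variation (not part of the code above), appending LangChain's StrOutputParser makes the chain return a plain string, so the endpoint no longer needs to read .content:

from langchain_core.output_parsers import StrOutputParser

# prompt | model | parser: the parser extracts the text from the AIMessage
chain = prompt | llm_pro | StrOutputParser()

# invoke() now returns a plain string instead of a message object
answer = chain.invoke({"message": "What is a vector database?"})
print(answer)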

Step 4: Frontend Code

Here’s a minimal HTML/JavaScript client for interacting with your chatbot:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>Gemini Chatbot</title>
  <style>
    body {
      font-family: 'Segoe UI', sans-serif;
      background: #f5f5f5;
      margin: 0;
      padding: 0;
      display: flex;
      flex-direction: column;
      height: 100vh;
    }

    #chat-container {
      display: flex;
      flex-direction: column;
      justify-content: space-between;
      height: 100vh;
      max-width: 700px;
      margin: 0 auto;
      padding: 1rem;
    }

    #messages {
      flex: 1;
      overflow-y: auto;
      background: white;
      padding: 1rem;
      border-radius: 10px;
      box-shadow: 0 0 8px rgba(0, 0, 0, 0.1);
    }

    .msg {
      margin: 0.5rem 0;
      padding: 0.75rem 1rem;
      border-radius: 18px;
      max-width: 80%;
      word-wrap: break-word;
      line-height: 1.4;
    }

    .user-msg {
      background-color: #d1e7dd;
      align-self: flex-end;
      text-align: right;
    }

    .bot-msg {
      background-color: #e2e3e5;
      align-self: flex-start;
      text-align: left;
    }

    #input-area {
      display: flex;
      gap: 0.5rem;
      padding-top: 1rem;
    }

    input[type="text"] {
      flex: 1;
      padding: 0.75rem;
      border-radius: 18px;
      border: 1px solid #ccc;
      font-size: 1rem;
    }

    button {
      padding: 0.75rem 1rem;
      border-radius: 18px;
      border: none;
      background-color: #007bff;
      color: white;
      font-size: 1rem;
      cursor: pointer;
    }

    button:hover {
      background-color: #0056b3;
    }

    .loading {
      font-size: 0.9rem;
      color: #999;
      margin-top: 0.5rem;
    }
  </style>
</head>
<body>
  <div id="chat-container">
    <div id="messages"></div>

    <div id="input-area">
      <input type="text" id="user_input" placeholder="Type your message..." onkeydown="handleEnter(event)" />
      <button onclick="sendMessage()">Send</button>
    </div>
    <div id="status" class="loading"></div>
  </div>

  <script>
    const messagesEl = document.getElementById("messages");
    const inputEl = document.getElementById("user_input");
    const statusEl = document.getElementById("status");

    function addMessage(content, isUser = false) {
      const msgDiv = document.createElement("div");
      msgDiv.classList.add("msg", isUser ? "user-msg" : "bot-msg");
      msgDiv.textContent = content;
      messagesEl.appendChild(msgDiv);
      messagesEl.scrollTop = messagesEl.scrollHeight;
    }

    async function sendMessage() {
      const input = inputEl.value.trim();
      if (!input) return;

      addMessage(input, true);
      inputEl.value = "";
      inputEl.disabled = true;
      statusEl.textContent = "Thinking...";

      try {
        const response = await fetch("http://localhost:8000/chat", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ user_query: input })
        });

        const data = await response.json();
        addMessage(data.response, false);
      } catch (error) {
        addMessage("Error: Could not reach chatbot server.", false);
        console.error(error);
      } finally {
        inputEl.disabled = false;
        inputEl.focus();
        statusEl.textContent = "";
      }
    }

    function handleEnter(event) {
      if (event.key === "Enter") {
        sendMessage();
      }
    }
  </script>
</body>
</html>

Key UX Features

  • Chat Bubbles: messages are styled differently for user and bot
  • Auto Scroll: scrolls to the latest message automatically
  • Responsive Input: Enter key support for quick messaging
  • Loading State: shows “Thinking…” while Gemini is responding
  • Styling: clean, mobile-responsive layout

Step 5: Run the App

Start the Flask backend:

python app.py

Open your index.html in a browser, and start chatting with Gemini!

Make sure CORS is enabled properly and ports are open if testing from a different domain.
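
If you know where the frontend is served from, one option is to restrict CORS to that origin instead of allowing every origin. Here is a sketch using the same flask-cors extension (the origin URL below is just a placeholder for wherever index.html is hosted):

from flask import Flask
from flask_cors import CORS

app = Flask(__name__)
# Allow only the listed frontend origin to call the /chat endpoint
CORS(app, resources={r"/chat": {"origins": ["http://localhost:5500"]}})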
