
Production RAG Architecture: LangChain.js & Vector Search

By LearnWebCraft Team · 10 min read
RAG Architecture · Vector Search · LLM Pipelines

If you’ve been hanging around the web development water cooler lately, you’ve probably heard the term RAG thrown around more often than "JavaScript framework fatigue." It stands for Retrieval Augmented Generation, and honestly, it’s one of the most exciting shifts in how we build software today.

We all know large language models (LLMs) like ChatGPT are incredibly smart. But they have a massive flaw: they are essentially frozen in time. They only know what they were trained on. If you ask a standard model about your company’s internal documentation from last week, or the specific details of a project you just started, it will either hallucinate a confident lie or simply shrug and say, "I don't know."

This is where building a RAG chatbot with LangChain.js comes into play. RAG bridges the gap between the LLM's general intelligence and your specific, private data. It’s like giving the AI a textbook before the exam—specifically, your textbook.

In this tutorial, I’m going to walk you through building a fully functional RAG chatbot using LangChain.js. We aren't just going to copy-paste code; we’re going to dig into the concepts of embeddings, vector stores, and chains so you actually understand the magic happening under the hood.

By the end of this guide, you’ll have a bot that can answer questions based on your own custom data. If you are looking to take this further and build autonomous agents, check out our guide on Building Agentic AI with LangGraph. Let's dive in.

1. Introduction to RAG and LangChain.js

Before we open our terminal, let's take a quick second to demystify what we are actually building.

What is RAG?

Imagine you are taking a difficult history test.

  1. Standard LLM: You have to answer purely from memory. If you haven't studied a specific obscure event, you might panic and make something up that sounds plausible just to fill the page.
  2. RAG: You are allowed to run to the library (Retrieval), grab a specific book that contains the answer, open it to the right page, and then write your answer based on that text (Generation).

In technical terms, RAG is a pipeline (sketched in code right after this list). When a user asks a question:

  1. Retrieve: The system searches your database for relevant chunks of text.
  2. Augment: It stuffs those text chunks into the prompt sent to the AI.
  3. Generate: The AI answers the user's question using the provided context.
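If it helps to see that flow as code, here is a rough, hypothetical sketch of the loop. None of these helper functions exist yet; we'll build the real versions with LangChain throughout this guide.

// Hypothetical outline only: the real implementation comes later in this tutorial.
async function answerWithRAG(question: string): Promise<string> {
  const relevantChunks = await searchVectorStore(question); // 1. Retrieve
  const prompt = buildPrompt(question, relevantChunks);     // 2. Augment
  return callLLM(prompt);                                   // 3. Generate
}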

Why LangChain.js?

You could write all the logic to call OpenAI APIs, format prompts, and search databases manually. But that gets messy, fast. LangChain.js is a framework that acts as the glue for these components. It provides pre-built interfaces for:

  • Loaders: To read PDFs, text files, or websites.
  • Splitters: To chop text into manageable pieces.
  • Models: To interface with OpenAI, Anthropic, or local models.
  • Vector Stores: To save and search your data efficiently.

It standardizes the chaotic world of AI development into clean, chainable JavaScript methods.
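To make "chainable" concrete, here is a tiny sketch of LangChain's pipe syntax, usable once the packages from Step 1 below are installed. The prompt text and model name are just placeholders.

import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

// Prompt -> model -> plain string output, wired together with .pipe()
const summarize = ChatPromptTemplate
  .fromTemplate("Summarize this in one sentence: {text}")
  .pipe(new ChatOpenAI({ model: "gpt-3.5-turbo" }))
  .pipe(new StringOutputParser());

// const summary = await summarize.invoke({ text: "LangChain glues AI components together." });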

2. Prerequisites: Setting Up Your Development Environment

To follow along without hitting roadblocks, you’ll need a few things ready. I'm assuming you are comfortable with JavaScript or TypeScript and have a basic understanding of how Node.js works.

What You Need

  • Node.js: Version 18 or higher is recommended.
  • A Package Manager: npm, yarn, or pnpm. I'll be using npm in the examples.
  • OpenAI API Key: We will use OpenAI's models for embeddings and chat. You'll need an account and a valid API key. (Yes, this costs money, but usually pennies for a tutorial like this).
  • TypeScript: I highly recommend using TypeScript for AI projects because the types help you navigate the complex objects LangChain returns.

Recommendation: Create a dedicated folder for this project. Keep your workspace clean.

3. Step 1: Project Setup and Dependencies

Let’s get our hands dirty. Open your terminal and create a new directory for your chatbot.

mkdir langchain-rag-bot
cd langchain-rag-bot
npm init -y

Now, we need to install the stars of the show. We need the core LangChain library, the OpenAI integration, and a few utility packages.

npm install langchain @langchain/openai @langchain/core dotenv

If you are using TypeScript (which you should!), let's initialize that as well.

npm install -D typescript @types/node ts-node
npx tsc --init

Update your tsconfig.json to ensure you're targeting a modern ECMAScript version, as LangChain uses modern features.

{
  "compilerOptions": {
    "target": "ES2020",
    "module": "commonjs",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true
  }
}

Environment Variables

Security first! Never hardcode your API keys. Create a .env file in your root directory:

OPENAI_API_KEY=sk-proj-your-actual-api-key-here

Now create a file named index.ts. This will be our playground. To verify everything is working, add this quick check:

import 'dotenv/config';
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  model: "gpt-3.5-turbo",
  temperature: 0
});

async function testConnection() {
  const response = await model.invoke("Hello, are you ready to build a RAG bot?");
  console.log(response.content);
}

testConnection();

Run it with npx ts-node index.ts. If you see a friendly response from the AI, we are green for launch.

4. Step 2: Loading and Processing Data for Retrieval

A RAG bot is only as good as the data you feed it. For this tutorial, we need some "knowledge."

Create a text file named knowledge.txt in your project root. Fill it with some specific, fictional information that GPT-3.5 definitely wouldn't know.

knowledge.txt:

The "Cosmic Muffin" is a fictional pastry invented by the bakery "Star Gazer's Delight" in 2045.
It is made with moon dust flour and a core of liquid nebulas.
The bakery is located on the dark side of the moon, specifically at Crater 42.
The head baker is an alien named 'Zorgon' who insists on kneading dough with telekinesis.
To buy a Cosmic Muffin, one must pay in star shards, not credits.

If you ask generic ChatGPT about "Cosmic Muffins," it will hallucinate. Our bot will know the truth.

Load and Split

LLMs have a context window (a limit on how much text they can read at once). Even with large windows, sending a 500-page PDF for every query is slow and expensive. We need to chunk our data.

In index.ts, let's import the loader and splitter.

import { TextLoader } from "langchain/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

async function loadAndSplitData() {
  // 1. Load the data
  const loader = new TextLoader("./knowledge.txt");
  const docs = await loader.load();

  // 2. Split the data into chunks
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 500, // Characters per chunk
    chunkOverlap: 50, // Overlap to maintain context between chunks
  });

  const splitDocs = await splitter.splitDocuments(docs);
  
  console.log(`Loaded ${docs.length} document, split into ${splitDocs.length} chunks.`);
  return splitDocs;
}

Why Recursive Splitting? You might wonder why we don't just split by newlines. RecursiveCharacterTextSplitter is smarter. It tries to split by paragraphs first, then sentences, then words. This keeps semantically related text together, ensuring we don't cut a sentence in half, which would confuse the AI.
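If your knowledge base has strong structure of its own (markdown headings, bullet lists, code), you can also hand the splitter your own boundary preferences. A small sketch, assuming the same chunk settings as above:

// Optional: tell the splitter which boundaries to prefer, from coarsest to finest.
const structuredSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,
  chunkOverlap: 50,
  separators: ["\n\n", "\n", ". ", " ", ""], // paragraphs, then lines, then sentences, then words
});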

5. Step 3: Creating a Vector Store with Embeddings

This is the part that usually confuses beginners, so let's break it down properly.

How does the computer know which chunk of text is relevant to a user's question? It doesn't do a keyword search (Ctrl+F). It uses Embeddings. You can read more about how these work in the OpenAI Embeddings documentation.

An embedding turns text into a long list of numbers (a vector). Text with similar meanings will have similar numbers.

  • "The dog chased the cat"
  • "The canine pursued the feline"

These two sentences look totally different in text, but their vector representations will be very close mathematically.
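You don't have to take that on faith. Here is a minimal sketch that embeds both sentences and compares them; the cosineSimilarity helper is hand-rolled for illustration, not a LangChain export.

import { OpenAIEmbeddings } from "@langchain/openai";

// Cosine similarity: close to 1 means "very similar meaning", close to 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dot / (magA * magB);
}

async function compareSentences() {
  const embeddings = new OpenAIEmbeddings();
  const [dog, canine] = await Promise.all([
    embeddings.embedQuery("The dog chased the cat"),
    embeddings.embedQuery("The canine pursued the feline"),
  ]);
  console.log(cosineSimilarity(dog, canine)); // Noticeably higher than for two unrelated sentences
}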

The Vector Store

We need a database designed to store these number lists and search them quickly. For production, you'd use Pinecone, Supabase (pgvector), or Chroma. We have a detailed comparison of these in our PostgreSQL vs Vector DBs guide. For this tutorial, we will use an in-memory vector store provided by LangChain.

Update your imports and create the store:

import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
import type { Document } from "@langchain/core/documents";

// ... inside your main function logic

async function createVectorStore(docs: Document[]) {
  // Create embeddings generator
  const embeddings = new OpenAIEmbeddings();

  // Ingest documents into the vector store
  const vectorStore = await MemoryVectorStore.fromDocuments(
    docs,
    embeddings
  );

  return vectorStore;
}

When MemoryVectorStore.fromDocuments runs, it:

  1. Takes your text chunks.
  2. Sends them to OpenAI's embedding API.
  3. Receives the vectors back.
  4. Stores them in memory, ready for searching.
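Before wiring the store into a chain, it's worth sanity-checking retrieval on its own. A quick sketch, run inside an async function using the store returned by createVectorStore:

// Ask the store for the 2 chunks whose meaning is closest to the query.
const results = await vectorStore.similaritySearch("Who is the head baker?", 2);
results.forEach((doc) => console.log(doc.pageContent));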

6. Step 4: Building the RAG Chain

Now we have the ingredients: the model (LLM) and the knowledge base (Vector Store). We need to chain them together.

In modern LangChain (0.2+), we use createRetrievalChain. This abstraction handles the complex logic of:

  1. Taking the user's question.
  2. (Optional) Rephrasing the question if it's part of a conversation history.
  3. Fetching relevant documents from the vector store.
  4. Combining the documents and the question into a prompt.
  5. Sending it to the LLM.

Let's build the chain.

import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { createRetrievalChain } from "langchain/chains/retrieval";
import { ChatPromptTemplate } from "@langchain/core/prompts";

async function initRAG() {
  // 1. Load and process data
  const docs = await loadAndSplitData();
  
  // 2. Create vector store
  const vectorStore = await createVectorStore(docs);
  
  // 3. Create the retriever
  // This turns the vector store into something the chain can "query"
  const retriever = vectorStore.asRetriever({
    k: 2, // Retrieve the top 2 most relevant chunks
  });

  // 4. Create the LLM
  const model = new ChatOpenAI({
    model: "gpt-3.5-turbo",
    temperature: 0.7,
  });

  // 5. Create the prompt template
  // This instructs the LLM how to use the context
  const prompt = ChatPromptTemplate.fromTemplate(`
    Answer the user's question based ONLY on the following context. 
    If you don't know the answer, just say that you don't know, don't try to make up an answer.
    
    Context:
    {context}
    
    Question:
    {input}
  `);

  // 6. Create the "Stuff" chain
  // This chain "stuffs" the retrieved documents into the prompt
  const combineDocsChain = await createStuffDocumentsChain({
    llm: model,
    prompt,
  });

  // 7. Create the final retrieval chain
  const chain = await createRetrievalChain({
    retriever,
    combineDocsChain,
  });

  return chain;
}

Understanding createStuffDocumentsChain: This sounds like a funny name, but it describes exactly what it does. It takes a list of documents and "stuffs" them all into the {context} variable in your prompt. It's simple and effective for small-to-medium amounts of data.
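With the chain assembled, one call (inside an async function) runs the whole retrieve-augment-generate loop. The object returned by the chain's invoke includes the generated answer alongside the retrieved context, so you can inspect exactly what the model was shown:

const chain = await initRAG();

const result = await chain.invoke({
  input: "What is a Cosmic Muffin made of?",
});

console.log(result.answer);  // Grounded in knowledge.txt
console.log(result.context); // The document chunks that were retrieved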

7. Step 5: Implementing the Chatbot Interface

A RAG bot isn't much fun if you can't talk to it. Let’s build a simple command-line interface (CLI) loop so we can chat with our creation in real-time.

We'll use Node's built-in readline module to handle input and output.

Add this logic to your index.ts:

import * as readline from 'readline';

async function startChat() {
  const chain = await initRAG();

  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
  });

  console.log("🤖 Bot is ready! Ask questions about the Cosmic Muffin (or type 'exit' to quit).");
