Introduction
In this post, I’ll document how I built Metrica, an AI-powered student advisor chatbot for the MSc Econometrics program. The chatbot uses Retrieval-Augmented Generation (RAG) to provide accurate information by combining the power of a large language model (LLM) with reliable source material such as FAQs and terms of study. Metrica is designed to assist students with common queries about the program and provide guidance based on official documentation.
Here’s an overview of the key features of the chatbot:
- RAG Pipeline: We set up a RAG pipeline to combine the capabilities of an LLM with a retriever for accurate responses. The LLM provides the language understanding and generation capabilities, while the retriever ensures that the responses are grounded in the provided documents. The LLM used is a variant of Meta's Llama 3.3 model with 70 billion parameters, which is known for its versatility and performance in generating human-like text. I access this model via the Groq API (`llama-3.3-70b-versatile`).[^1]
- Embeddings: I use OpenAI's `text-embedding-3-large` model to create high-quality embeddings for the documents.[^2]
- Persistent Vector Storage: A ChromaDB instance stores the document embeddings for efficient retrieval. The vector store allows the bot to quickly find relevant information from the documents and is persisted to local disk for future use.
- Context-Aware Conversations: Using LangChain components, I implement some prompt engineering and ensure that Metrica maintains context from previous messages to provide relevant responses.
- Interface: The bot features a clean, professional web interface for user interactions, built with Gradio.
Let’s dive into the code and see how we can build this student advisor chatbot using Python.
Setting Up the Environment
We begin by importing the required Python libraries. The `os` module provides functions for interacting with the operating system, including file operations and path manipulations; `time` is used later to pace the streamed chat responses.

```python
import os, time
```
Next, we import the following LangChain components:
- Chat and embedding models (ChatGroq, OpenAIEmbeddings)
- Document handling (Chroma for vector store, TextLoader for loading text files)
- Chain components for document processing and retrieval
- Message handling and prompt templates
- Text splitting utilities for processing the documents
```python
from langchain_groq import ChatGroq
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_text_splitters import RecursiveCharacterTextSplitter, CharacterTextSplitter
```
To avoid hard-coding sensitive information into the script, the required API keys are stored in a local `.env` file and loaded into environment variables.
```python
from dotenv import load_dotenv

# load .env file
load_dotenv()

# read API keys
groq_api_key = os.environ.get("groq_api_key")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
```
Remember to add your own API keys here!
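For reference, a minimal `.env` file matching the variable names read above could look like this (the values are placeholders, not real keys):

```bash
# .env — never commit real keys to version control
groq_api_key="your-groq-api-key"
OPENAI_API_KEY="your-openai-api-key"
```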
Document Processing
Our chatbot needs to understand two types of information with different structures: FAQs and the Terms of Study. I've parsed the relevant documents into .txt files, which you may download below.
We’ll use different text splitters for each file to maintain the documents’ logical structure.
- FAQ Splitter: `CharacterTextSplitter` simply breaks the FAQ text into chunks using paragraph breaks (`\n\n`) as separators. This is straightforward due to the convenient formatting of the .txt file: answers to FAQs are simple bullets in markdown format.
- Legal Document Splitter: The `RecursiveCharacterTextSplitter` handles the structured nature of legal documents. It creates chunks of up to 5000 characters with a 700-character overlap to preserve context between sections. Splits are guided by key structural markers like section symbols (`§`) and paragraph breaks (`\n\n`).[^3]
```python
# FAQ text splitter - simply splits on paragraph breaks
text_splitter_faq = CharacterTextSplitter(
    separator="\n\n",
)

# legal document text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=5000,
    chunk_overlap=700,
    separators=["§", "\n\n"],
)
```
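As a quick sanity check (my own addition, not part of the original pipeline), we can run the FAQ splitter on a toy string. Keep in mind that `CharacterTextSplitter` first splits on the separator and then merges the pieces back together up to its default `chunk_size` of 4000 characters, so a very short input may come back as a single chunk:

```python
# toy example: split a small FAQ-style string on blank lines
sample_faq = (
    "Q: How do I enrol?\n- Enrolment is handled online.\n\n"
    "Q: When does the semester start?\n- Lectures usually start in October."
)
chunks = text_splitter_faq.split_text(sample_faq)
print(len(chunks), "chunk(s)")
for chunk in chunks:
    print("---")
    print(chunk)
```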
Embeddings and the Vector Store
I use OpenAI's `text-embedding-3-large` model to create embeddings and store them in a Chroma vector database for efficient retrieval of relevant document sections based on user queries.

After creating the embeddings object, the logic checks whether a local `chroma` directory exists.[^4] If not, the code processes the FAQ and terms of study documents by splitting them into chunks, creates the vector store using Chroma, embeds the documents into a collection `MScEconometrics`, and persists the database in the `chroma` directory. If the directory exists, we simply load the existing vector store.
Ensure the documents are in the correct location before running the next code chunk. Your project directory should have the following structure:
```
project/
├── your_python_script.py
└── assets/
    ├── FAQ.txt
    └── terms of study.txt
```
```python
# create embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

# vector store
if not os.path.isdir("chroma"):
    # load and process documents
    chunked_documents = []

    # process FAQ file
    loader = TextLoader("assets/FAQ.txt")
    docs = loader.load()
    docs = text_splitter_faq.split_documents(docs)
    chunked_documents.extend(docs)

    # process terms of study file
    loader = TextLoader("assets/terms of study.txt")
    po = loader.load()
    po = text_splitter.split_documents(po)
    chunked_documents.extend(po)

    # create and persist the vector store
    vectorstore = Chroma.from_documents(
        documents=chunked_documents,
        embedding=embeddings,
        persist_directory="chroma",
        collection_name="MScEconometrics",
    )
else:
    # load existing vector store
    vectorstore = Chroma(
        collection_name="MScEconometrics",
        persist_directory="chroma",
        embedding_function=embeddings,
    )
```
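Before wiring up the full pipeline, a small optional check (my own addition) helps confirm that the embeddings and the vector store behave as expected; the query string is just an example:

```python
# optional: retrieve the two most similar chunks for a test query
hits = vectorstore.similarity_search("admission requirements for the Master's exam", k=2)
for doc in hits:
    print(doc.metadata.get("source"), "->", doc.page_content[:120], "...")
```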
Building the RAG Pipeline
My RAG pipeline consists of several key components working together:
- A history-aware retriever that understands context from previous conversations
- A question-answering chain that combines retrieved documents with the LLM’s capabilities
- Carefully crafted prompts that guide the model’s behavior
We first set up the API connection to the LLM.
```python
# set up the language model
llm = ChatGroq(
    temperature=0,
    model_name="llama-3.3-70b-versatile",
    api_key=groq_api_key,
)
```
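A one-line smoke test (my addition, not in the original post) verifies that the Groq connection and API key work before we build the chains:

```python
# quick connectivity check: the model should reply with a short greeting
print(llm.invoke("Say hello in one short sentence.").content)
```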
Next, we implement context-aware question reformulation. We use a system prompt to instruct the model to reformulate user questions based on the chat history, ensuring the questions are standalone and understandable without prior context. A `ChatPromptTemplate` structures the prompt with placeholders for the chat history and user input.
```python
# context-aware question reformulation
contextualize_q_system_prompt = """
Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is.
"""

contextualize_q_prompt = ChatPromptTemplate.from_messages([
    ("system", contextualize_q_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])
```
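To see what this template actually sends to the model, we can format it with a toy history (this check is my own addition; the example questions are made up):

```python
# inspect the fully formatted messages for a hypothetical follow-up question
example = contextualize_q_prompt.invoke({
    "chat_history": [HumanMessage(content="How many ECTS credits does the Master's thesis have?")],
    "input": "And how much time do I get to write it?",
})
for m in example.to_messages():
    print(f"{type(m).__name__}: {m.content[:80]}")
```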
Now we set up the document retriever and a history-aware retriever. The `vectorstore.as_retriever()` method creates a retriever from the vector store. The `create_history_aware_retriever` function combines the model (`llm`), the retriever, and the context-aware question reformulation prompt into a retriever that can handle context from the chat history.
```python
# create the retriever
retriever = vectorstore.as_retriever()

history_aware_retriever = create_history_aware_retriever(
    llm,
    retriever,
    contextualize_q_prompt
)
```
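By default, `as_retriever()` returns the four most similar chunks per query. If the bot misses relevant passages, the number of retrieved chunks can be tuned via `search_kwargs`; the variable name and value below are purely illustrative:

```python
# hypothetical alternative: retrieve six chunks per query instead of the default four
retriever_wide = vectorstore.as_retriever(search_kwargs={"k": 6})
```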
The system prompt instructs Metrica to provide information based on the FAQs and terms of study, ensuring clear and detailed answers. It emphasizes summarizing relevant passages without copying and citing specific references where possible. Again, a `ChatPromptTemplate` structures the prompt with placeholders for the chat history and user input.
```python
# QA system prompt
qa_system_prompt = """
You are Metrica, the student advisor for the MSc in Econometrics program. You provide information on the program based on FAQs, and terms of study. Both are provided in the context in markdown format.
Only answer questions related to the MSc Econometrics program. Give clear, detailed answers. If unsure, say you do not know.
Refer to FAQs and terms of study for context. Summarize relevant passages without copying.
The terms of study are structured:
Articles are indicated by a §.
Numbered paragraphs indicated by (1), (2), etc.
Sub-paragraphs are alphabetically sorted, indicated by (a), (b), etc.
Entries may contain numbered and unnumbered lists.
State requirements clearly. Cite specific references in the terms of study, e.g., § 3 (1) (a), if possible. Do not make up answers about components of the terms of study that do not exist.
For assistance, refer to TU Dortmund University facilities.
A note on Grades: '1.0' is the best grade. So the grade '2.0' is better than the grade '2.7'. For example '2.0 > 2.7'.
{context}
"""

qa_prompt = ChatPromptTemplate.from_messages([
    ("system", qa_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])
```
The code below creates the final processing chains for the question-answer system. `create_stuff_documents_chain()` combines the LLM and the QA prompt into a chain that answers questions based on the provided documents. `create_retrieval_chain()` then combines the history-aware retriever and the question-answer chain into the full retrieval-augmented generation (RAG) chain. This RAG chain retrieves relevant context from the documents and generates accurate, context-aware responses.
```python
# the final chain
question_answer_chain = create_stuff_documents_chain(
    llm,
    qa_prompt
)

rag_chain = create_retrieval_chain(
    history_aware_retriever,
    question_answer_chain
)
```
With our RAG pipeline set up, we can now interact with the model using `rag_chain.invoke()`. This method accepts a dictionary with two keys:
- `input`: the user's question
- `chat_history`: a list of previous interactions (empty when starting a new conversation)
Let’s test the chain with a sample question about admission requirements for the Masters Exam:
```python
# test the chain
the_answer = rag_chain.invoke({
    "input": "What are the requirements for admission to the Masters Exam?",
    "chat_history": []
})
print(the_answer["answer"])
```
```
According to § 17 (1), to be admitted to the Master's Exam, a student must be
enrolled at the time of the exam or at the time she applies for the exam,
specifically in the Master's degree program in Econometrics at TU Dortmund
University and admitted as a guest student at the University of Duisburg-Essen
and at the Ruhr-University Bochum.

Additionally, admission to the exam must not be denied due to any of the
conditions listed in § 17 (2), which include:
(a) having definitively failed to pass an exam required by the terms of study,
(b) having a pending official objection against the results of an exam, or
(c) having already been admitted to the examination process of the same degree
program at another university or in a significantly similar degree program.
```
We see that the model does a nice job summarizing the relevant passages of the Terms of Study, pointing the student to § 17.
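To illustrate the history-aware part, here is a hypothetical follow-up turn (not part of the original test): we pass the previous exchange as `chat_history`, so the vague follow-up question can be reformulated into a standalone one before retrieval.

```python
from langchain_core.messages import AIMessage

# build a history from the previous turn and ask a follow-up that only makes
# sense in context ("this" refers to the admission requirements above)
follow_up = rag_chain.invoke({
    "input": "Which paragraph of the terms of study regulates this?",
    "chat_history": [
        HumanMessage(content="What are the requirements for admission to the Masters Exam?"),
        AIMessage(content=the_answer["answer"]),
    ],
})
print(follow_up["answer"])
```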
Creating the User Interface
Of course, interacting with the chatbot through a Python console is not user friendly, so we'll create a clean, web-based chat interface using Gradio. The interface includes:
- A chat window with scrollable message history
- A text input field for user queries
- Send and Clear buttons for message control
- Custom styling for better visual appearance
- Message streaming for a more natural chat experience
- A custom avatar for the bot
The implementation uses Gradio’s Blocks API. Most of the code is simply adapted from the Gradio documentation.
```python
import gradio as gr

# gradio blocks interface
with gr.Blocks(theme=gr.themes.Soft(), css="""
    .avatar-container {
        height: 75px;
        width: 75px;
    }
    .gradio-chatbot .gradio-scroll-container {
        overflow-y: scroll;
        height: 550px;
    }
    """) as cw:
    chatbot = gr.Chatbot(
        [
            (
                None,
                "Hi, I'm Metrica. How can I help you?\n\n**Please note: None of the information given here is legally binding, and I may make mistakes. You should check the official documents in any case.**",
            )
        ],
        placeholder="Please note: none of the information given here is legally binding.",
        avatar_images=[None, "assets/metrica.png"],
        label="Metrica",
    )
    msg = gr.Textbox(label="Ask right away!")
    with gr.Row():
        with gr.Column():
            send_button = gr.Button("Send")
            clear = gr.ClearButton([msg, chatbot])

    # user interactions and responses
    chat_history = []

    def user(user_message, history):
        # append the user's message to the visible chat and clear the textbox
        return "", history + [[user_message, None]]

    def qa_response(history):
        # run the RAG chain on the latest user message
        the_answer = rag_chain.invoke({
            "input": history[-1][0],
            "chat_history": chat_history
        })
        # store the turn in the LLM-facing history (AIMessage for the bot's reply)
        chat_history.extend([
            HumanMessage(content=history[-1][0]),
            AIMessage(content=the_answer["answer"]),
        ])
        # response streaming: reveal the answer character by character
        bot_message = the_answer["answer"]
        history[-1][1] = ""
        for character in bot_message:
            history[-1][1] += character
            time.sleep(0.01)
            yield history

    # event handlers
    msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(
        qa_response, chatbot, chatbot
    )
    send_button.click(user, [msg, chatbot], [msg, chatbot], queue=False).then(
        qa_response, chatbot, chatbot
    )
    clear.click(lambda: None, None, chatbot, queue=False)

# launch the interface
cw.launch()
```
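For quick local testing, `launch()` can also create a temporary public link; this is an optional alternative to the plain launch above and is not part of the deployed setup described below:

```python
# optional: expose the interface via a temporary public gradio.live link
cw.launch(share=True)
```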
Deployment
I deployed the application on Heroku. The deployment process involves:
- Creating a project repository with all required files (app code in `app.py`)
- Adding a file `Procfile` (yes, a file named just `Procfile`) to specify the web process:

```
# Procfile in my example
web: source setup.sh && python app.py
```

- Including setup instructions for environment variables in a shell script:

```bash
# setup.sh
export GRADIO_SERVER_NAME=0.0.0.0
export GRADIO_SERVER_PORT="$PORT"
```

- Pushing everything via the Heroku CLI (see the example commands below)

Of course you'll need a Heroku account to deploy your own version.[^5]
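For reference, the push itself looks roughly like this (a sketch, not a full guide; the app name and key values are placeholders, the config variable names must match those read in the script, and `main` is assumed to be your default branch):

```bash
# one-time setup: create the app and set the API keys as config vars
heroku login
heroku create your-app-name
heroku config:set OPENAI_API_KEY="..." groq_api_key="..."

# deploy: Heroku builds and starts the web process defined in the Procfile
git push heroku main
```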
You can access the app here or try the live demo below.
Due to Groq’s free tier token usage limitations (currently 6000 tokens per minute for the Llama 3.3 model), the interface may become temporarily unresponsive when this token limit is reached. In such cases, you’ll need to wait one minute for the token quota to reset.
The app may take a few moments to load initially, as it runs on an eco dyno instance @ Heroku, which puts the application to sleep after periods of inactivity.
Another consideration is OpenAI API usage costs (billed per token), which occur whenever queries are made: each query requires running text through their embedding model before the RAG retrieval and LLM operations can proceed. These costs are fairly low ($0.02 per 1M tokens), but I may restrict access to the app if… things escalate 🙂.
Wrap-up
Metrica showcases how modern AI tools can create a helpful, practical advisor for students. The RAG approach ensures responses are grounded in official documentation while maintaining a conversational interface that students find approachable and easy to use. The app can be easily embedded into program websites, course management systems, or student portals.
Some useful enhancements include:
- Implementing feedback collection
- Expanding the knowledge base to include more program materials, automating material collection from online resources, and implementing automatic rebuilds of the vector store
- Adding better references/links to relevant documents
[^1]: Accessing the Groq API requires a free API key.
[^2]: Using OpenAI embeddings requires an API key as well and some credits for usage.
[^3]: This chunk size (and overlap) is quite large. I found that the LLM had a hard time identifying relevant information in smaller chunks, probably because the legal text is heavily structured and referential in nature and needs more context to be understood properly.
[^4]: This control flow primarily aims to save tokens at OpenAI, avoiding unnecessary costs for embedding unchanged documents via the API and needlessly recreating the vector store from scratch each time the script runs.
[^5]: Heroku offers both free and paid tiers for hosting applications. The free tier should be enough for testing.