UpTrain
UpTrain [github || website || docs] is an open-source platform to evaluate and improve LLM applications. It provides grades for 20+ preconfigured checks (covering language, code, and embedding use cases), performs root cause analysis on failure cases, and provides guidance for resolving them.
UpTrain Callback Handler​
This notebook showcases the UpTrain callback handler, which integrates seamlessly into your pipeline and facilitates diverse evaluations. We have chosen a few evaluations that we deemed apt for evaluating the chains. These evaluations run automatically, with results displayed in the output. More details on UpTrain's evaluations can be found here.
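Before diving in, here is a minimal sketch of how the handler is typically attached to a chain. The `key_type`/`api_key` arguments shown correspond to evaluating with your own OpenAI key; treat the exact parameters as illustrative and see the setup options described later on this page.

```python
from getpass import getpass

from langchain_community.callbacks.uptrain_callback import UpTrainCallbackHandler

# Illustrative: evaluate with your own OpenAI key ("uptrain" is the managed alternative).
uptrain_callback = UpTrainCallbackHandler(key_type="openai", api_key=getpass())

# Any chain invoked with this config is scored automatically and the
# evaluation results are printed below the cell.
config = {"callbacks": [uptrain_callback]}
# chain.invoke("your question", config=config)
```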
Selected retrievers from LangChain are highlighted for demonstration:
1. Vanilla RAG:​
RAG plays a crucial role in retrieving context and generating responses. To ensure its performance and response quality, we conduct the following evaluations:
- Context Relevance: Checks whether the context retrieved for the query is relevant to generating the response.
- Factual Accuracy: Assesses whether the LLM is hallucinating or providing incorrect information.
- Response Completeness: Checks if the response contains all the information requested by the query.
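For reference, a vanilla RAG chain of the kind these checks run against can be sketched as follows. This is a minimal sketch: it assumes the `retriever` and `llm` objects created in the cells further down, and the prompt wording is illustrative.

```python
from langchain_core.output_parsers.string import StrOutputParser
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain_core.runnables.passthrough import RunnablePassthrough

# Illustrative prompt that stuffs the retrieved context and the question together.
prompt = ChatPromptTemplate.from_template(
    "Answer the question based only on the following context:\n"
    "{context}\n\nQuestion: {question}"
)

# Retrieve context, fill the prompt, call the LLM, and parse the reply to text.
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
# rag_chain.invoke("your question", config=config)  # config carries the UpTrain callback
```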
2. Multi Query Generation:​
MultiQueryRetriever creates multiple variants of a question that preserve the meaning of the original question. Given the added complexity, we include the previous evaluations and add:
- Multi Query Accuracy: Ensures that the generated multi-queries mean the same as the original query.
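A minimal sketch of building such a retriever, again assuming the `retriever` and `llm` defined in the cells below:

```python
from langchain.retrievers.multi_query import MultiQueryRetriever

# The LLM rewrites the user question into several paraphrases; each paraphrase
# is retrieved against, and the union of the resulting documents is returned.
multi_query_retriever = MultiQueryRetriever.from_llm(retriever=retriever, llm=llm)
# docs = multi_query_retriever.invoke("your question")
```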
3. Context Compression and Reranking:​
Re-ranking reorders the retrieved nodes by relevance to the query and keeps the top n. Since the number of nodes can shrink once re-ranking is complete, we perform the following evaluations:
- Context Reranking: Checks if the order of re-ranked nodes is more relevant to the query than the original order.
- Context Conciseness: Examines whether the reduced number of nodes still provides all the required information.
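A minimal sketch of the compression-plus-reranking setup, using FlashRank as the reranker on top of the base `retriever` defined in the cells below:

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import FlashrankRerank

# FlashrankRerank scores each retrieved document against the query and keeps
# only the top-ranked ones; ContextualCompressionRetriever applies it on top
# of the base retriever.
compression_retriever = ContextualCompressionRetriever(
    base_compressor=FlashrankRerank(), base_retriever=retriever
)
# docs = compression_retriever.invoke("your question")
```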
These evaluations collectively ensure the robustness and effectiveness of the RAG, MultiQueryRetriever, and re-ranking stages of the chain.
Install Dependencies​
%pip install -qU langchain langchain_openai langchain-community uptrain faiss-cpu flashrank
Note: you can also install faiss-gpu instead of faiss-cpu if you want to use the GPU-enabled version of the library.
Import Libraries​
from getpass import getpass
from langchain.chains import RetrievalQA
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import FlashrankRerank
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_community.callbacks.uptrain_callback import UpTrainCallbackHandler
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers.string import StrOutputParser
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain_core.runnables.passthrough import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
Load the documents​
loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()
Split the document into chunks​
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
chunks = text_splitter.split_documents(documents)
Create the retriever​
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(chunks, embeddings)
retriever = db.as_retriever()
Define the LLM​
llm = ChatOpenAI(temperature=0, model="gpt-4")
Setup​
UpTrain provides you with:
- Dashboards with advanced drill-down and filtering options
- Insights and common topics among failing cases
- Observability and real-time monitoring of production data
- Regression testing via seamless integration with your CI/CD pipelines
You can choose between the following options for evaluating with UpTrain: