Label Studio
Label Studio is an open-source data labeling platform that provides LangChain with flexibility when it comes to labeling data for fine-tuning large language models (LLMs). It also enables the preparation of custom training data and the collection and evaluation of responses through human feedback.
In this guide, you will learn how to connect a LangChain pipeline to Label Studio to:

- Aggregate all input prompts, conversations, and responses in a single Label Studio project. This consolidates all the data in one place for easier labeling and analysis.
- Refine prompts and responses to create a dataset for supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) scenarios. The labeled data can be used to further train the LLM to improve its performance.
- Evaluate model responses through human feedback. Label Studio provides an interface for humans to review and provide feedback on model responses, allowing evaluation and iteration.
Installation and setup
First, install the latest versions of Label Studio and the Label Studio API client:
%pip install --upgrade --quiet langchain label-studio label-studio-sdk langchain-openai langchain-community
Next, run label-studio on the command line to start the local Label Studio instance at http://localhost:8080. See the Label Studio installation guide for more options.
You'll need a token to make API calls. Open your Label Studio instance in your browser, go to Account & Settings > Access Token, and copy the key.
Set environment variables with your Label Studio URL, API key, and OpenAI API key:
import os
os.environ["LABEL_STUDIO_URL"] = "<YOUR-LABEL-STUDIO-URL>" # e.g. http://localhost:8080
os.environ["LABEL_STUDIO_API_KEY"] = "<YOUR-LABEL-STUDIO-API-KEY>"
os.environ["OPENAI_API_KEY"] = "<YOUR-OPENAI-API-KEY>"
Collecting LLM prompts and responses
The data used for labeling is stored in projects within Label Studio. Every project is defined by an XML configuration that specifies the input and output data.
Create a project that takes human input in text format and outputs an editable LLM response in a text area:
<View>
  <Style>
    .prompt-box {
      background-color: white;
      border-radius: 10px;
      box-shadow: 0px 4px 6px rgba(0, 0, 0, 0.1);
      padding: 20px;
    }
  </Style>
  <View className="root">
    <View className="prompt-box">
      <Text name="prompt" value="$prompt"/>
    </View>
    <TextArea name="response" toName="prompt"
              maxSubmissions="1" editable="true"
              required="true"/>
  </View>
  <Header value="Rate the response:"/>
  <Rating name="rating" toName="prompt"/>
</View>
- To create a project in Label Studio, click on the "Create" button.
- Enter a name for your project in the "Project Name" field, such as My Project.
- Navigate to Labeling Setup > Custom Template and paste the XML configuration provided above.
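Alternatively, if you prefer to create the project from code, the sketch below uses the Label Studio SDK with the same labeling configuration. It assumes the legacy Client API and its start_project method:

import os

from label_studio_sdk import Client  # assumed: legacy client API

ls = Client(
    url=os.environ["LABEL_STUDIO_URL"],
    api_key=os.environ["LABEL_STUDIO_API_KEY"],
)

# The same XML labeling configuration shown above
label_config = """
<View>
  <Style>
    .prompt-box {
      background-color: white;
      border-radius: 10px;
      box-shadow: 0px 4px 6px rgba(0, 0, 0, 0.1);
      padding: 20px;
    }
  </Style>
  <View className="root">
    <View className="prompt-box">
      <Text name="prompt" value="$prompt"/>
    </View>
    <TextArea name="response" toName="prompt"
              maxSubmissions="1" editable="true"
              required="true"/>
  </View>
  <Header value="Rate the response:"/>
  <Rating name="rating" toName="prompt"/>
</View>
"""

# Create the project and print its ID for later reference
project = ls.start_project(title="My Project", label_config=label_config)
print(project.id)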
You can collect input LLM prompts and output responses in a Label Studio project by connecting it via the LabelStudioCallbackHandler:
from langchain_community.callbacks.labelstudio_callback import (
LabelStudioCallbackHandler,
)
from langchain_openai import OpenAI
llm = OpenAI(
temperature=0, callbacks=[LabelStudioCallbackHandler(project_name="My Project")]
)
print(llm.invoke("Tell me a joke"))
In Label Studio, open My Project. You will see the prompts, responses, and metadata such as the model name.
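To pull the collected data back out (for example, to assemble an SFT or RLHF dataset from the edited responses and ratings), you can read the project's tasks with the SDK. This is a minimal sketch assuming the legacy Client API and that you know the project ID from the project's URL:

import os

from label_studio_sdk import Client  # assumed: legacy client API

ls = Client(
    url=os.environ["LABEL_STUDIO_URL"],
    api_key=os.environ["LABEL_STUDIO_API_KEY"],
)

# Replace 1 with the ID shown in your project's URL, e.g. .../projects/1
project = ls.get_project(1)

for task in project.get_tasks():
    # Each task holds the original prompt (under the key from the labeling
    # config, $prompt above) plus any submitted annotations
    print(task["data"]["prompt"], task.get("annotations", []))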