Label Studio is an open-source data labeling platform that provides LangChain with flexibility when it comes to labeling data for fine-tuning large language models (LLMs). It also enables the preparation of custom training data and the collection and evaluation of responses through human feedback.
In this guide, you will learn how to connect a LangChain pipeline to Label Studio to:
- Aggregate all input prompts, conversations, and responses in a single Label Studio project. This consolidates all the data in one place for easier labeling and analysis.
- Refine prompts and responses to create a dataset for supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) scenarios. The labeled data can be used to further train the LLM to improve its performance.
- Evaluate model responses through human feedback. Label Studio provides an interface for humans to review and provide feedback on model responses, allowing evaluation and iteration.
Installation and setup
First, install the latest versions of Label Studio and the Label Studio API client. Then run label-studio on the command line to start the local Label Studio instance at http://localhost:8080. See the Label Studio installation guide for more options.
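The install step can also be scripted from Python. The sketch below assumes the PyPI package names label-studio and label-studio-sdk, which the text above does not spell out; the helper names pip_install_command and install are hypothetical.

```python
import subprocess
import sys

# Assumed PyPI names for Label Studio and its API client.
PACKAGES = ["label-studio", "label-studio-sdk"]


def pip_install_command(packages):
    """Build a pip command that upgrades the packages for the current interpreter."""
    return [sys.executable, "-m", "pip", "install", "-U", *packages]


def install(packages=PACKAGES):
    """Run pip; raises subprocess.CalledProcessError if the install fails."""
    subprocess.check_call(pip_install_command(packages))
```

Calling install() once and then running label-studio in a terminal should bring up the local instance described above.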
You’ll need a token to make API calls. Open your Label Studio instance in your browser, go to Account & Settings > Access Token, and copy the key.
Then set environment variables with your Label Studio URL, Label Studio API key, and OpenAI API key.
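For example, the variables can be set from Python before the pipeline is built. This is a minimal sketch: LABEL_STUDIO_URL and LABEL_STUDIO_API_KEY are the names listed under the handler parameters in this guide, OPENAI_API_KEY is the variable the OpenAI client reads, and the values are placeholders to replace with your own.

```python
import os

# Placeholder values: replace them with your own URL and keys.
os.environ["LABEL_STUDIO_URL"] = "http://localhost:8080"
os.environ["LABEL_STUDIO_API_KEY"] = "<YOUR-LABEL-STUDIO-API-KEY>"
os.environ["OPENAI_API_KEY"] = "<YOUR-OPENAI-API-KEY>"
```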
Collecting LLM prompts and responses
The data used for labeling is stored in projects within Label Studio. Every project is defined by an XML configuration that specifies its input and output data. Create a project that takes human input in text format and outputs an editable LLM response in a text area:
- To create a project in Label Studio, click on the “Create” button.
- Enter a name for your project in the “Project Name” field, such as My Project.
- Navigate to Labeling Setup > Custom Template and paste the XML configuration for this project.
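The project configuration described above (text input, editable response in a text area) might look like the sketch below. It is an assumption rather than the exact template from the original page: the tag names Text and TextArea are standard Label Studio tags, while the field names prompt and response and the $prompt variable are illustrative choices. The XML is kept in a Python string so it can be checked for well-formedness before it is pasted into the Custom Template box.

```python
import xml.etree.ElementTree as ET

# Assumed minimal configuration: one text input and one editable text area.
PROMPT_CONFIG = """
<View>
  <Text name="prompt" value="$prompt"/>
  <TextArea name="response" toName="prompt"
            maxSubmissions="1" editable="true" required="true"/>
</View>
"""

# ET.fromstring raises xml.etree.ElementTree.ParseError if the XML is malformed.
root = ET.fromstring(PROMPT_CONFIG)
tags = [element.tag for element in root]
print(tags)
```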
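You can then collect input LLM prompts and output responses in this project by connecting to it via LabelStudioCallbackHandler. In Label Studio, open My Project. You will see the prompts, responses, and metadata like the model name.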
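A minimal sketch of that connection follows. It assumes the langchain-community and langchain-openai packages are installed, that the environment variables from the setup section are set, and that the import paths match recent LangChain releases; run_prompt_example is a hypothetical helper name.

```python
PROJECT_NAME = "My Project"


def run_prompt_example(prompt="Tell me a joke"):
    """Send one prompt through an OpenAI LLM and log the exchange to Label Studio."""
    # Imported here so that defining the function needs no third-party packages.
    from langchain_community.callbacks.labelstudio_callback import (
        LabelStudioCallbackHandler,
    )
    from langchain_openai import OpenAI

    handler = LabelStudioCallbackHandler(project_name=PROJECT_NAME)
    llm = OpenAI(temperature=0, callbacks=[handler])
    return llm.invoke(prompt)
```

Calling run_prompt_example() makes a real OpenAI request and writes a task to the Label Studio project, so it needs valid keys and a running instance.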
Collecting chat model dialogues
You can also track and display full chat dialogues in Label Studio, with the ability to rate and modify the last response:
- Open Label Studio and click on the “Create” button.
- Enter a name for your project in the “Project Name” field, such as New Project with Chat.
- Navigate to Labeling Setup > Custom Template and paste an XML configuration for chat dialogues.
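A chat configuration and the matching handler call could be sketched as follows. The XML is an assumption, not necessarily the original template: it uses the standard Label Studio tags Paragraphs, Header, TextArea, and Rating, and the content and role keys are chosen to match the usual chat message layout. The import paths assume recent LangChain releases, and run_chat_example is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed chat template: dialogue view, editable final response, rating.
CHAT_CONFIG = """
<View>
  <Paragraphs name="dialogue" value="$prompt" layout="dialogue"
              textKey="content" nameKey="role"/>
  <Header value="Final response:"/>
  <TextArea name="response" toName="dialogue"
            maxSubmissions="1" editable="true" required="true"/>
  <Header value="Rate the response:"/>
  <Rating name="rating" toName="dialogue"/>
</View>
"""

# Fails with a ParseError if the template is not well-formed XML.
chat_root = ET.fromstring(CHAT_CONFIG)


def run_chat_example():
    """Send a short dialogue through a chat model and log it to Label Studio."""
    # Imported here so that defining the function needs no third-party packages.
    from langchain_community.callbacks.labelstudio_callback import (
        LabelStudioCallbackHandler,
    )
    from langchain_core.messages import HumanMessage, SystemMessage
    from langchain_openai import ChatOpenAI

    handler = LabelStudioCallbackHandler(
        mode="chat", project_name="New Project with Chat"
    )
    chat_llm = ChatOpenAI(callbacks=[handler])
    return chat_llm.invoke(
        [
            SystemMessage(content="Always use a lot of emojis"),
            HumanMessage(content="Tell me a joke"),
        ]
    )
```

After run_chat_example() completes, the dialogue should appear as a task in New Project with Chat, where the last response can be edited and rated.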
Custom labeling configuration
You can modify the default labeling configuration in Label Studio to add more target labels, such as response sentiment, relevance, and many other types of annotator feedback.
New labeling configurations can be added from the UI: go to Settings > Labeling Interface and set up a custom configuration with additional tags like Choices for sentiment or Rating for relevance. Keep in mind that a TextArea tag must be present in any configuration to display the LLM responses.
Alternatively, you can specify the labeling configuration on the initial call before project creation:
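As a sketch of that initial call, the configuration below adds a Rating for relevance and Choices for sentiment next to the required TextArea, and passes it through the project_config parameter. The field names and choice values are illustrative assumptions, the import path assumes a recent LangChain release, and make_handler is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative configuration with extra feedback tags.
CUSTOM_CONFIG = """
<View>
  <Text name="prompt" value="$prompt"/>
  <TextArea name="response" toName="prompt"/>
  <Rating name="relevance" toName="prompt"/>
  <Choices name="sentiment" toName="prompt">
    <Choice value="Positive"/>
    <Choice value="Negative"/>
  </Choices>
</View>
"""

# Fails with a ParseError if the configuration is not well-formed XML.
custom_root = ET.fromstring(CUSTOM_CONFIG)


def make_handler():
    """Create a handler that carries the custom labeling configuration."""
    # Imported here so that defining the function needs no third-party packages.
    from langchain_community.callbacks.labelstudio_callback import (
        LabelStudioCallbackHandler,
    )

    return LabelStudioCallbackHandler(project_config=CUSTOM_CONFIG)
```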
Other parameters
The LabelStudioCallbackHandler accepts several optional parameters:
- api_key - Label Studio API key. Overrides the environment variable LABEL_STUDIO_API_KEY.
- url - Label Studio URL. Overrides LABEL_STUDIO_URL, default http://localhost:8080.
- project_id - Existing Label Studio project ID. Overrides LABEL_STUDIO_PROJECT_ID. Stores data in this project.
- project_name - Project name if project ID not specified. Creates a new project. Default is "LangChain-%Y-%m-%d" formatted with the current date.
- project_config - Custom labeling configuration.
- mode - Use this shortcut to create the target configuration from scratch:
  - "prompt" - Single prompt, single response. Default.
  - "chat" - Multi-turn chat mode.
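The precedence rules in the list above can be illustrated with a short sketch. The helpers resolve_url and default_project_name are hypothetical and only mirror the documented behavior (explicit argument, then environment variable, then default); they are not the handler's own implementation. The import path in make_configured_handler assumes a recent LangChain release.

```python
import os
from datetime import datetime

DEFAULT_URL = "http://localhost:8080"


def resolve_url(url=None):
    """Explicit argument first, then LABEL_STUDIO_URL, then the default."""
    return url or os.getenv("LABEL_STUDIO_URL") or DEFAULT_URL


def default_project_name(now=None):
    """Format the documented default name with the given or current date."""
    return (now or datetime.today()).strftime("LangChain-%Y-%m-%d")


def make_configured_handler(api_key, mode="prompt"):
    """Pass the optional parameters explicitly instead of relying on defaults."""
    # Imported here so that defining the function needs no third-party packages.
    from langchain_community.callbacks.labelstudio_callback import (
        LabelStudioCallbackHandler,
    )

    return LabelStudioCallbackHandler(
        api_key=api_key,
        url=resolve_url(),
        project_name=default_project_name(),
        mode=mode,  # "prompt" or "chat"
    )
```

To store data in an existing project instead, pass project_id rather than project_name.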