## Overview

This guide covers how to use the LangChain `ChatRunPod` class to interact with chat models hosted on RunPod Serverless.
## Setup

- Install the package: `pip install -qU langchain-runpod`.
- Deploy a Chat Model Endpoint: Follow the setup steps in the RunPod Provider Guide to deploy a compatible chat model endpoint on RunPod Serverless and get its Endpoint ID.
- Set Environment Variables: Make sure `RUNPOD_API_KEY` and `RUNPOD_ENDPOINT_ID` (or a specific `RUNPOD_CHAT_ENDPOINT_ID`) are set, as shown in the snippet after this list.
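A minimal setup sketch, assuming an interactive session (the prompt strings are illustrative):

```python
# %pip install -qU langchain-runpod

import getpass
import os

# Prompt for the RunPod API key if it is not already set.
if "RUNPOD_API_KEY" not in os.environ:
    os.environ["RUNPOD_API_KEY"] = getpass.getpass("RunPod API key: ")

# The endpoint ID of the deployed chat model endpoint.
if "RUNPOD_ENDPOINT_ID" not in os.environ:
    os.environ["RUNPOD_ENDPOINT_ID"] = input("RunPod endpoint ID: ")
```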
## Instantiation

Initialize the `ChatRunPod` class. You can pass model-specific parameters via `model_kwargs` and configure polling behavior.
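A minimal instantiation sketch. `model_kwargs` comes from the text above; the exact names of the polling options are not shown here, so check the source linked under the API reference before relying on them:

```python
from langchain_runpod import ChatRunPod

# Reads RUNPOD_API_KEY and RUNPOD_ENDPOINT_ID from the environment
# when they are not passed explicitly.
chat = ChatRunPod(
    # Forwarded as-is to the endpoint handler; valid keys depend on
    # the model the handler wraps (these two are illustrative).
    model_kwargs={"temperature": 0.7, "max_tokens": 256},
)
```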
## Invocation

Use the standard LangChain `.invoke()` and `.ainvoke()` methods to call the model. Streaming is also supported via `.stream()` and `.astream()` (simulated by polling the RunPod `/stream` endpoint).
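A short usage sketch, continuing from the `chat` instance created above (the prompt text is illustrative):

```python
from langchain_core.messages import HumanMessage

# Synchronous call: submits the job and polls RunPod until it completes.
response = chat.invoke([HumanMessage(content="What is RunPod Serverless?")])
print(response.content)

# Simulated streaming: chunks arrive as the /stream endpoint is polled.
for chunk in chat.stream("Summarize RunPod Serverless in one sentence."):
    print(chunk.content, end="", flush=True)

# The async variants mirror the calls above, e.g.:
#   response = await chat.ainvoke("Hello!")
```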
## Chaining

The chat model integrates seamlessly with LangChain Expression Language (LCEL) chains, as shown in the sketch below.
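A minimal LCEL sketch reusing the `chat` instance from above (the prompt wording is illustrative):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a concise technical assistant."),
        ("human", "{question}"),
    ]
)

# prompt -> chat model -> plain string output.
chain = prompt | chat | StrOutputParser()
print(chain.invoke({"question": "What is LCEL?"}))
```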
## Model Features (Endpoint Dependent)

The availability of advanced features depends heavily on the specific implementation of your RunPod endpoint handler. The `ChatRunPod` integration provides the basic framework, but the handler must support the underlying functionality.
| Feature | Integration Support | Endpoint Dependent? | Notes |
|---|---|---|---|
| Tool calling | ❌ | ✅ | Requires handler to process tool definitions and return tool calls (e.g., OpenAI format). Integration needs parsing logic. |
| Structured output | ❌ | ✅ | Requires handler support for forcing structured output (JSON mode, function calling). Integration needs parsing logic. |
| JSON mode | ❌ | ✅ | Requires handler to accept a `json_mode` parameter (or similar) and guarantee JSON output. |
| Image input | ❌ | ✅ | Requires multimodal handler accepting image data (e.g., base64). Integration does not support multimodal messages. |
| Audio input | ❌ | ✅ | Requires handler accepting audio data. Integration does not support audio messages. |
| Video input | ❌ | ✅ | Requires handler accepting video data. Integration does not support video messages. |
| Token-level streaming | ✅ (simulated) | ✅ | Polls `/stream`. Requires handler to populate a `stream` list in the status response with token chunks (e.g., `[{"output": "token"}]`). True low-latency streaming not built in. |
| Native async | ✅ | ✅ | Core `ainvoke`/`astream` implemented. Relies on endpoint handler performance. |
| Token usage | ❌ | ✅ | Requires handler to return `prompt_tokens` and `completion_tokens` in the final response. Integration currently does not parse this. |
| Logprobs | ❌ | ✅ | Requires handler to return log probabilities. Integration currently does not parse this. |
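To make the table concrete, here is a hedged sketch of an endpoint handler that supports the simulated-streaming row: a generator handler whose yielded chunks populate the `stream` list polled via `/stream`. The `runpod.serverless.start` call and the generator-handler pattern come from the RunPod Python SDK; the input shape (`prompt`) and the word-by-word echo "model" are illustrative assumptions.

```python
import runpod

def handler(job):
    # Generator handler: each yielded dict becomes one chunk in the
    # job's stream, matching the [{"output": "token"}] shape above.
    # The input shape is an assumption; it depends on what the
    # integration actually sends to the endpoint.
    prompt = job["input"].get("prompt", "")
    for word in prompt.split():  # stand-in for real token generation
        yield {"output": word + " "}

runpod.serverless.start(
    {
        "handler": handler,
        # Also aggregate the streamed chunks into the final output.
        "return_aggregate_stream": True,
    }
)
```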
## API reference

For detailed documentation of the `ChatRunPod` class, its parameters, and methods, refer to the source code or the generated API reference (if available).

Source code: https://github.com/runpod/langchain-runpod/blob/main/langchain_runpod/chat_models.py