llama-cpp-python provides simple Python bindings for @ggerganov's llama.cpp. This package provides:
- Low-level access to the C API via the ctypes interface
- High-level Python API for text completion
- OpenAI-like API
- LangChain compatibility
- LlamaIndex compatibility
- OpenAI compatible web server
- Local Copilot replacement
- Function Calling support
- Vision API support
- Multiple Models
Overview
Integration details
Class | Package | Local | Serializable | JS support |
---|---|---|---|---|
ChatLlamaCpp | langchain-community | ✅ | ❌ | ❌ |
Model features
Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |
---|---|---|---|---|---|---|---|---|---|
✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ |
Setup
To get started and use all the features shown below, we recommend using a model that has been fine-tuned for tool calling. We will use Hermes-2-Pro-Llama-3-8B-GGUF from NousResearch.

Hermes 2 Pro is an upgraded version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house. This new version of Hermes maintains its excellent general task and conversation capabilities, but also excels at Function Calling.

See our guides on local models to go deeper:
Installation
The LangChain LlamaCpp integration lives in the langchain-community and llama-cpp-python packages:
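Assuming a standard pip-based environment, both packages can be installed in one step:

```shell
pip install -U langchain-community llama-cpp-python
```

Note that llama-cpp-python compiles llama.cpp from source on install; see its documentation for hardware-specific build flags (e.g. GPU offloading).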
Instantiation
Now we can instantiate our model object and generate chat completions:
Invocation
Chaining
We can chain our model with a prompt template like so:
Tool calling
Firstly, it works mostly the same as OpenAI Function Calling.

OpenAI has a tool calling API (we use "tool calling" and "function calling" interchangeably here) that lets you describe tools and their arguments, and have the model return a JSON object with a tool to invoke and the inputs to that tool. Tool calling is extremely useful for building tool-using chains and agents, and for getting structured outputs from models more generally.

With ChatLlamaCpp.bind_tools, we can easily pass in Pydantic classes, dict schemas, LangChain tools, or even functions as tools to the model. Under the hood, these are converted to an OpenAI tool schema, which looks like:
{"type": "function", "function": {"name": <<tool_name>>}}.