This notebook shows how to augment Llama-2 `LLM`s with the `Llama2Chat` wrapper to support the Llama-2 chat prompt format. Several LLM implementations in LangChain can be used as an interface to Llama-2 chat models. These include `ChatHuggingFace`, `LlamaCpp`, `GPT4All`, …, to mention a few examples.
`Llama2Chat` is a generic wrapper that implements `BaseChatModel` and can therefore be used in applications as a chat model. `Llama2Chat` converts a list of Messages into the required chat prompt format and forwards the formatted prompt as `str` to the wrapped `LLM`.
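`Llama2Chat` is provided by the `langchain_experimental` package:

```python
from langchain_experimental.chat_models import Llama2Chat
```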
For the chat application examples below, we use the following chat `prompt_template`.
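A minimal sketch of such a template, assuming a generic system prompt; the `chat_history` and `{text}` variable names are illustrative and must match the memory and chain inputs used later:

```python
from langchain_core.messages import SystemMessage
from langchain_core.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
)

# System instruction, then the conversation history, then the current user message.
template_messages = [
    SystemMessage(content="You are a helpful assistant."),
    MessagesPlaceholder(variable_name="chat_history"),
    HumanMessagePromptTemplate.from_template("{text}"),
]
prompt_template = ChatPromptTemplate.from_messages(template_messages)
```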
## Chat with Llama-2 via HuggingFaceTextGenInference LLM
A `HuggingFaceTextGenInference` LLM encapsulates access to a `text-generation-inference` server. In the following example, the inference server serves a `meta-llama/Llama-2-13b-chat-hf` model. It can be started locally with Docker, as sketched below.
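A sketch of such a command; the image tag, port mapping, and cache mount are illustrative and should be adapted to your setup:

```bash
docker run --rm --gpus all --ipc=host -p 8080:80 \
    -v ~/.cache/huggingface/hub:/data \
    -e HF_API_TOKEN=${HF_API_TOKEN} \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id meta-llama/Llama-2-13b-chat-hf \
    --num-shard 4
```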
Adjust the `--num-shard` value to the number of GPUs available. The `HF_API_TOKEN` environment variable holds the Hugging Face API token.
Create a `HuggingFaceTextGenInference` instance that connects to the local inference server and wrap it into `Llama2Chat`.
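A sketch, assuming the server from above listens on `127.0.0.1:8080`; the sampling parameters are illustrative, and the `text-generation` client library must be installed (e.g. `pip install text-generation`):

```python
from langchain_community.llms import HuggingFaceTextGenInference
from langchain_experimental.chat_models import Llama2Chat

llm = HuggingFaceTextGenInference(
    inference_server_url="http://127.0.0.1:8080/",
    max_new_tokens=512,
    top_k=50,
    temperature=0.1,
    repetition_penalty=1.03,
)

# Llama2Chat formats the chat messages into the Llama-2 prompt format
# before forwarding them to the wrapped LLM.
model = Llama2Chat(llm=llm)
```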
Then you are ready to use the chat `model` together with `prompt_template` and conversation memory in an `LLMChain`.
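For example, with a `ConversationBufferMemory` whose `memory_key` matches the `chat_history` placeholder assumed in the prompt template above:

```python
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
chain = LLMChain(llm=model, prompt=prompt_template, memory=memory)

# The memory carries the conversation history across calls.
print(chain.run(text="What can I see in Vienna? Propose a few locations."))
```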
## Chat with Llama-2 via LlamaCpp LLM
To use a Llama-2 chat model with a `LlamaCpp` LLM, install the `llama-cpp-python` library following these installation instructions. The following example uses a quantized `llama-2-7b-chat.Q4_0.gguf` model stored locally at `~/Models/llama-2-7b-chat.Q4_0.gguf`.
After creating a `LlamaCpp` instance, the `llm` is again wrapped into `Llama2Chat`.
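A sketch, assuming the model file exists at the path above; the `streaming` setting is an illustrative choice:

```python
from os.path import expanduser

from langchain_community.llms import LlamaCpp
from langchain_experimental.chat_models import Llama2Chat

model_path = expanduser("~/Models/llama-2-7b-chat.Q4_0.gguf")

llm = LlamaCpp(
    model_path=model_path,
    streaming=False,
)
model = Llama2Chat(llm=llm)
```

The wrapped `model` can then be used with the same `prompt_template` and memory setup as in the previous section.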