This will help you get started with CloudflareWorkersAI chat models. For detailed documentation of all ChatCloudflareWorkersAI features and configurations, head to the API reference.

Overview

Integration details

Class: ChatCloudflareWorkersAI
Package: langchain-cloudflare (downloads and latest version available on PyPI)

Model features

Tool calling, structured output, JSON mode, image input, audio input, video input, token-level streaming, native async, token usage, logprobs

Setup

To access CloudflareWorkersAI models you’ll need to create a Cloudflare account, get an API key, and install the langchain-cloudflare integration package.

Credentials

Head to www.cloudflare.com/developer-platform/products/workers-ai/ to sign up for CloudflareWorkersAI and generate an API key. Once you’ve done this, set the CF_AI_API_KEY and CF_ACCOUNT_ID environment variables:
import getpass
import os

if not os.getenv("CF_AI_API_KEY"):
    os.environ["CF_AI_API_KEY"] = getpass.getpass(
        "Enter your CloudflareWorkersAI API key: "
    )

if not os.getenv("CF_ACCOUNT_ID"):
    os.environ["CF_ACCOUNT_ID"] = getpass.getpass(
        "Enter your CloudflareWorkersAI account ID: "
    )
If you want to get automated tracing of your model calls you can also set your LangSmith API key by uncommenting below:
# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")

Installation

The LangChain CloudflareWorkersAI integration lives in the langchain-cloudflare package:
%pip install -qU langchain-cloudflare

Instantiation

Now we can instantiate our model object and generate chat completions:
from langchain_cloudflare.chat_models import ChatCloudflareWorkersAI

llm = ChatCloudflareWorkersAI(
    model="@cf/meta/llama-3.3-70b-instruct-fp8-fast",
    temperature=0,
    max_tokens=1024,
    # other params...
)

Invocation

messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
ai_msg
AIMessage(content="J'adore la programmation.", additional_kwargs={}, response_metadata={'token_usage': {'prompt_tokens': 37, 'completion_tokens': 9, 'total_tokens': 46}, 'model_name': '@cf/meta/llama-3.3-70b-instruct-fp8-fast'}, id='run-995d1970-b6be-49f3-99ae-af4cdba02304-0', usage_metadata={'input_tokens': 37, 'output_tokens': 9, 'total_tokens': 46})
print(ai_msg.content)
J'adore la programmation.
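
Token-level streaming and native async both appear in the model features list above, and both go through the standard LangChain Runnable interface rather than anything integration-specific. A minimal sketch that streams the same translation request token by token (each chunk is an AIMessageChunk):
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)
The same interface also provides async counterparts (ainvoke, astream, abatch).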

Chaining

We can chain our model with a prompt template like so:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate(
    [
        (
            "system",
            "You are a helpful assistant that translates {input_language} to {output_language}.",
        ),
        ("human", "{input}"),
    ]
)

chain = prompt | llm
chain.invoke(
    {
        "input_language": "English",
        "output_language": "German",
        "input": "I love programming.",
    }
)
AIMessage(content='Ich liebe das Programmieren.', additional_kwargs={}, response_metadata={'token_usage': {'prompt_tokens': 32, 'completion_tokens': 7, 'total_tokens': 39}, 'model_name': '@cf/meta/llama-3.3-70b-instruct-fp8-fast'}, id='run-d1b677bc-194e-4473-90f1-aa65e8e46d50-0', usage_metadata={'input_tokens': 32, 'output_tokens': 7, 'total_tokens': 39})
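
To get back a plain string instead of an AIMessage, the chain can be extended with StrOutputParser from langchain_core; this is standard LangChain composition, not specific to this integration. A minimal sketch reusing the prompt above:
from langchain_core.output_parsers import StrOutputParser

string_chain = prompt | llm | StrOutputParser()
string_chain.invoke(
    {
        "input_language": "English",
        "output_language": "Spanish",
        "input": "I love programming.",
    }
)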

Structured outputs

json_schema = {
    "title": "joke",
    "description": "Joke to tell user.",
    "type": "object",
    "properties": {
        "setup": {
            "type": "string",
            "description": "The setup of the joke",
        },
        "punchline": {
            "type": "string",
            "description": "The punchline to the joke",
        },
        "rating": {
            "type": "integer",
            "description": "How funny the joke is, from 1 to 10",
            "default": None,
        },
    },
    "required": ["setup", "punchline"],
}
structured_llm = llm.with_structured_output(json_schema)

structured_llm.invoke("Tell me a joke about cats")
{'setup': 'Why did the cat join a band?',
 'punchline': 'Because it wanted to be the purr-cussionist',
 'rating': '8'}
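
In the standard LangChain chat model interface, with_structured_output also accepts a Pydantic class; assuming this integration follows that convention, the same schema can be written as a model, and the result comes back as a validated Joke instance instead of a dict. A minimal sketch:
from typing import Optional

from pydantic import BaseModel, Field


class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(
        default=None, description="How funny the joke is, from 1 to 10"
    )


structured_llm = llm.with_structured_output(Joke)
structured_llm.invoke("Tell me a joke about cats")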

Bind tools

from typing import List

from langchain_core.tools import tool


@tool
def validate_user(user_id: int, addresses: List[str]) -> bool:
    """Validate user using historical addresses.

    Args:
        user_id (int): the user ID.
        addresses (List[str]): Previous addresses as a list of strings.
    """
    return True


llm_with_tools = llm.bind_tools([validate_user])

result = llm_with_tools.invoke(
    "Could you validate user 123? They previously lived at "
    "123 Fake St in Boston MA and 234 Pretend Boulevard in "
    "Houston TX."
)
result.tool_calls
[{'name': 'validate_user',
  'args': {'user_id': '123',
   'addresses': '["123 Fake St in Boston MA", "234 Pretend Boulevard in Houston TX"]'},
  'id': '31ec7d6a-9ce5-471b-be64-8ea0492d1387',
  'type': 'tool_call'}]
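
To close the loop, each tool call can be executed and its result handed back to the model as a ToolMessage. In recent langchain-core versions, invoking a @tool with a tool-call dict returns a ToolMessage directly; a minimal sketch assuming that behavior (and that the model emits well-typed arguments):
from langchain_core.messages import HumanMessage

query = (
    "Could you validate user 123? They previously lived at "
    "123 Fake St in Boston MA and 234 Pretend Boulevard in "
    "Houston TX."
)
messages = [HumanMessage(query)]
ai_msg = llm_with_tools.invoke(messages)
messages.append(ai_msg)

for tool_call in ai_msg.tool_calls:
    # Executing the tool with the tool-call dict yields a ToolMessage
    # carrying the matching tool_call_id.
    tool_msg = validate_user.invoke(tool_call)
    messages.append(tool_msg)

# The model can now produce a final answer grounded in the tool result.
final_response = llm_with_tools.invoke(messages)
print(final_response.content)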

API reference

developers.cloudflare.com/workers-ai/
developers.cloudflare.com/agents/