Setup
- Download a llamafile for the model you’d like to use. You can find many models in llamafile format on HuggingFace. In this guide, we will download a small one, `TinyLlama-1.1B-Chat-v1.0.Q5_K_M`. Note: if you don’t have `wget`, you can just download the model via this link.
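If you do have `wget`, you can fetch the file from the command line. The repository path below is an assumption (llamafile collections on HuggingFace move around), so substitute the actual download link for the model you chose:

```shell
# Assumed HuggingFace location -- replace with the real download link
# for the llamafile you picked if this repository has moved.
wget https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile
```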
- Make the llamafile executable. First, if you haven’t done so already, open a terminal. If you’re using macOS, Linux, or BSD, you’ll need to grant permission for your computer to execute this new file using `chmod` (see below). If you’re on Windows, rename the file by adding “.exe” to the end (the model file should be named `TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile.exe`).
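On macOS, Linux, or BSD, the permission grant looks like this (assuming the llamafile was downloaded into the current directory):

```shell
# Allow the downloaded llamafile to be executed (macOS/Linux/BSD only)
chmod +x TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile
```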
- Run the llamafile in “server mode”:
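A sketch of the launch command, assuming the file sits in the current directory; `--server` starts the llamafile’s built-in HTTP server (on port 8080 by default) and `--nobrowser` keeps it from opening the web UI in a browser tab:

```shell
# Start the llamafile in server mode, listening on localhost:8080,
# without popping open the browser-based chat UI.
./TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile --server --nobrowser
```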
Usage
To stream tokens as they are generated, use the `.stream(...)` method: