Hugging Face models can be run locally with OpenVINO through the HuggingFacePipeline class. To deploy a model with OpenVINO, specify the backend="openvino" parameter to trigger OpenVINO as the backend inference framework.
To use it, you should have the optimum-intel Python package with OpenVINO accelerator support installed.
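For example, an install command along these lines (the exact extras may differ between optimum releases, and HuggingFacePipeline is assumed here to come from the langchain-huggingface package):

```bash
pip install --upgrade-strategy eager "optimum[openvino,nncf]" langchain-huggingface
```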
Model Loading
Models can be loaded by specifying the model parameters using the from_model_id method.
If you have an Intel GPU, you can specify model_kwargs={"device": "GPU"}
to run inference on it.
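A minimal sketch, assuming HuggingFacePipeline from the langchain-huggingface package and gpt2 as a placeholder model:

```python
from langchain_huggingface import HuggingFacePipeline

# OpenVINO runtime options forwarded to the compiled model (illustrative values)
ov_config = {"PERFORMANCE_HINT": "LATENCY", "NUM_STREAMS": "1", "CACHE_DIR": ""}

ov_llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    backend="openvino",
    model_kwargs={"device": "CPU", "ov_config": ov_config},  # use "GPU" for an Intel GPU
    pipeline_kwargs={"max_new_tokens": 10},
)
```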
Models can also be loaded by passing an existing optimum-intel pipeline directly.
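A sketch of that path, assuming optimum-intel's OVModelForCausalLM and a standard transformers pipeline:

```python
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer, pipeline

from langchain_huggingface import HuggingFacePipeline

model_id = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
# export=True converts the original checkpoint to OpenVINO IR on the fly
ov_model = OVModelForCausalLM.from_pretrained(model_id, export=True)
ov_pipe = pipeline(
    "text-generation", model=ov_model, tokenizer=tokenizer, max_new_tokens=10
)
ov_llm = HuggingFacePipeline(pipeline=ov_pipe)
```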
Create Chain
With the model loaded into memory, you can compose it with a prompt to form a chain.
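For instance, with a simple question-answering prompt (the template text is illustrative):

```python
from langchain_core.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

chain = prompt | ov_llm

question = "What is electroencephalography?"
print(chain.invoke({"question": question}))
```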
To get the response without the prompt included in it, you can bind skip_prompt=True to the LLM.
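For example:

```python
chain = prompt | ov_llm.bind(skip_prompt=True)
print(chain.invoke({"question": question}))
```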
Inference with local OpenVINO model
It is possible to export your model to the OpenVINO IR format with the CLI and load the model from a local folder. You can also apply 8-bit or 4-bit weight quantization with the --weight-format option to reduce inference latency and model footprint:
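A sketch of the export step, assuming the optimum-cli tool that ships with optimum-intel and a placeholder output directory ov_model_dir:

```bash
# plain export to OpenVINO IR
optimum-cli export openvino --model gpt2 ov_model_dir

# export with 8-bit weight quantization (use int4 for 4-bit)
optimum-cli export openvino --model gpt2 --weight-format int8 ov_model_dir
```

The exported folder can then be passed as the model_id when loading the model locally:

```python
ov_llm = HuggingFacePipeline.from_model_id(
    model_id="ov_model_dir",
    task="text-generation",
    backend="openvino",
    model_kwargs={"device": "CPU", "ov_config": ov_config},
    pipeline_kwargs={"max_new_tokens": 10},
)
```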
You can further tune inference performance by passing OpenVINO runtime options through ov_config as follows:
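A sketch of such a configuration; the property names below (for example the KV-cache precision and dynamic quantization group size) are OpenVINO runtime options whose availability depends on your OpenVINO version, so treat them as assumptions to check against the OpenVINO documentation:

```python
ov_config = {
    "KV_CACHE_PRECISION": "u8",               # quantize the KV cache (assumed property name)
    "DYNAMIC_QUANTIZATION_GROUP_SIZE": "32",  # dynamic quantization of activations (assumed)
    "PERFORMANCE_HINT": "LATENCY",
    "NUM_STREAMS": "1",
    "CACHE_DIR": "",
}
```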
Streaming
You can use the stream method to get streaming output from the LLM.
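A sketch reusing the prompt and model from above:

```python
chain = prompt | ov_llm.bind(skip_prompt=True)

for chunk in chain.stream({"question": "What is electroencephalography?"}):
    print(chunk, end="", flush=True)
```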