The Hugging Face Hub is a platform with over 120k models, 20k datasets, and 50k demo apps (Spaces), all open source and publicly available, in an online platform where people can easily collaborate and build ML together.

The Hugging Face Hub also offers various endpoints to build ML applications. This example showcases how to connect to the different endpoint types.
In particular, text generation inference is powered by Text Generation Inference (TGI): a custom-built Rust, Python and gRPC server for blazing-fast text generation inference.
Installation and Setup
To use, you should have the huggingface_hub Python package installed.
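A minimal setup sketch follows: install the package, then expose an API token through the HUGGINGFACEHUB_API_TOKEN environment variable, which the LangChain integration reads. The examples below also assume the langchain-huggingface package is installed.

```python
# Setup sketch: install the packages first, e.g.
#   pip install --upgrade huggingface_hub langchain-huggingface

import os
from getpass import getpass

# Get a token at https://huggingface.co/settings/tokens and expose it via
# the environment variable the LangChain integration reads.
os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass("Hugging Face API token: ")
```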
Prepare Examples
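The examples below reuse a shared prompt and question. A sketch of that setup, where the question and template strings are illustrative placeholders:

```python
# Shared setup for the examples below; the question and template strings
# are illustrative placeholders.
from langchain_core.prompts import PromptTemplate

question = "Who won the FIFA World Cup in the year 1994?"

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate.from_template(template)
```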
Examples
Here is an example of how you can access the HuggingFaceEndpoint integration with the serverless Inference Providers API.
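A minimal sketch, reusing the prompt and question defined above; the repo_id is an illustrative choice of model on the Hub:

```python
# A minimal sketch using the prompt and question defined above; the
# repo_id is an illustrative choice of text-generation model on the Hub.
from langchain_huggingface import HuggingFaceEndpoint

repo_id = "mistralai/Mistral-7B-Instruct-v0.2"

llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    max_new_tokens=128,
    temperature=0.5,
)

llm_chain = prompt | llm
print(llm_chain.invoke({"question": question}))
```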
Dedicated Endpoint
The free serverless API lets you implement solutions and iterate quickly, but it may be rate limited for heavy use cases, since the load is shared with other requests. For enterprise workloads, it is best to use Inference Endpoints - Dedicated. This gives access to a fully managed infrastructure that offers more flexibility and speed. These resources come with continuous support and uptime guarantees, as well as options like AutoScaling.
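A sketch of targeting a dedicated endpoint via the endpoint_url parameter; the URL below is a hypothetical placeholder for your own Inference Endpoints deployment:

```python
# A sketch of targeting a dedicated endpoint; the URL is a hypothetical
# placeholder for your own Inference Endpoints deployment.
from langchain_huggingface import HuggingFaceEndpoint

your_endpoint_url = "https://<your-endpoint>.endpoints.huggingface.cloud"

llm = HuggingFaceEndpoint(
    endpoint_url=your_endpoint_url,
    max_new_tokens=512,
    temperature=0.01,
    repetition_penalty=1.03,
)

print(llm.invoke("What is Deep Learning?"))
```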
Streaming

The HuggingFaceEndpoint class supports streaming, so tokens can be consumed as they are generated instead of waiting for the full response. The same class can also be used with a local Hugging Face TGI instance serving the LLM. Check out the TGI repository for details on various hardware (GPU, TPU, Gaudi…) support. A streaming sketch follows.
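This sketch assumes an HuggingFaceEndpoint instance llm configured as in the examples above; it works the same whether llm points at the serverless API, a dedicated endpoint, or a local TGI instance:

```python
# A streaming sketch, assuming an HuggingFaceEndpoint instance `llm`
# configured as in the examples above: chunks are printed as they arrive.
for chunk in llm.stream("What is Deep Learning?"):
    print(chunk, end="", flush=True)
```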