Airbyte is a data integration platform for ELT pipelines from APIs, databases, and files to warehouses and lakes. It has the largest catalog of ELT connectors to data warehouses and databases. This page covers how to load any source from Airbyte into LangChain documents.
Installation
To use AirbyteLoader, you need to install the langchain-airbyte integration package.
Note: The airbyte library does not support Pydantic v2. Please downgrade to Pydantic v1 to use this package.
Note: This package also currently requires Python 3.10+.
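The two requirements above can be checked before installing anything else. The following is a minimal sketch using only the standard library; the helper names python_ok and pydantic_major are hypothetical and not part of langchain-airbyte.

```python
import sys
from importlib import metadata
from typing import Optional, Tuple


def python_ok(version_info: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Return True if the interpreter is Python 3.10 or newer."""
    return tuple(version_info[:2]) >= (3, 10)


def pydantic_major(version: Optional[str] = None) -> Optional[int]:
    """Return the major version of Pydantic, or None if it is not installed.

    A version string may be passed explicitly, which is handy for testing.
    """
    if version is None:
        try:
            version = metadata.version("pydantic")
        except metadata.PackageNotFoundError:
            return None
    return int(version.split(".")[0])


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    print("Pydantic major version:", pydantic_major())
```

If pydantic_major() reports 2, the note above suggests downgrading to Pydantic v1 before using the loader.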
Loading Documents
By default, AirbyteLoader loads any structured data from a stream and outputs YAML-formatted documents.
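A minimal loading sketch might look like the following. The source name "source-faker", the stream "users", and the config value {"count": 10} are assumed example values; substitute the source, stream, and config you actually use. The wrapper function load_documents is hypothetical and exists only so the import happens when it is called.

```python
from typing import Any, Dict, List

# Assumed example values: "source-faker" generates synthetic records and
# "users" is one of its streams. Replace these with your own settings.
LOADER_KWARGS: Dict[str, Any] = {
    "source": "source-faker",
    "stream": "users",
    "config": {"count": 10},
}


def load_documents(kwargs: Dict[str, Any]) -> List[Any]:
    """Build an AirbyteLoader from keyword arguments and load all documents."""
    # Imported here so the sketch can be read without the package installed.
    from langchain_airbyte import AirbyteLoader

    loader = AirbyteLoader(**kwargs)
    # .load() reads the whole stream into a list of documents.
    return loader.load()
```

Calling load_documents(LOADER_KWARGS) should return a list of documents whose text is the YAML rendering of each record in the stream.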
Lazy Loading Documents
One of the powerful features of AirbyteLoader is its ability to load large documents from upstream sources. When working with large datasets, the default .load() behavior can be slow and memory-intensive. To avoid this, you can use the .lazy_load() method to load documents in a more memory-efficient manner.
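To illustrate why lazy loading saves memory, here is a small sketch that consumes only the first few items of an iterator. The generator fake_lazy_load is a hypothetical stand-in for loader.lazy_load(), so the example runs without an Airbyte source; take is likewise an illustrative helper.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def take(items: Iterable[T], n: int) -> List[T]:
    """Return at most the first n items, pulling nothing beyond them."""
    return list(islice(items, n))


def fake_lazy_load(total: int, produced: List[int]) -> Iterator[str]:
    """Stand-in for loader.lazy_load(): yields one document at a time.

    Each index is recorded in `produced` so we can see how much work was done.
    """
    for i in range(total):
        produced.append(i)
        yield f"document {i}"


produced: List[int] = []
first_three = take(fake_lazy_load(1_000_000, produced), 3)

# Only three of the million documents were ever materialized.
print(first_three)
print(len(produced))
```

With a real loader, the same helper could be used as take(loader.lazy_load(), 3), whereas loader.load() would read the whole stream first.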
You can also lazily load documents asynchronously with .alazy_load().
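Asynchronous lazy loading is consumed with async for. The sketch below uses a hypothetical async generator, fake_alazy_load, in place of loader.alazy_load() so that it runs without an Airbyte source; the collect helper is also illustrative.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_alazy_load(total: int) -> AsyncIterator[str]:
    """Stand-in for loader.alazy_load(): yields documents asynchronously."""
    for i in range(total):
        # Yield control to the event loop, as a real network read would.
        await asyncio.sleep(0)
        yield f"document {i}"


async def collect(source: AsyncIterator[str], limit: int) -> List[str]:
    """Gather at most `limit` documents from an async iterator."""
    results: List[str] = []
    if limit <= 0:
        return results
    async for doc in source:
        results.append(doc)
        if len(results) >= limit:
            break
    return results


docs = asyncio.run(collect(fake_alazy_load(100), 2))
print(docs)
```

With a real loader, the source argument would be loader.alazy_load() instead of the stand-in generator.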
Configuration
AirbyteLoader can be configured with the following options:
source (str, required): The name of the Airbyte source to load from.
stream (str, required): The name of the stream to load from (Airbyte sources can return multiple streams).
config (dict, required): The configuration for the Airbyte source.
template (PromptTemplate, optional): A custom prompt template for formatting documents.
include_metadata (bool, optional, default True): Whether to include all fields as metadata in the output documents.
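The options above can be combined as in the following sketch. The field names name and height are assumed to exist in the chosen stream, and "source-faker", "users", and {"count": 10} are example values only. The function make_templated_loader is a hypothetical wrapper that defers the imports until it is called.

```python
# Template text with placeholders for record fields.
# Assumption: the stream's records have "name" and "height" fields.
TEMPLATE_TEXT = "My name is {name} and I am {height} meters tall."


def make_templated_loader():
    """Build a loader that formats each record with a custom template."""
    # Imported here so the sketch can be read without the packages installed.
    from langchain_airbyte import AirbyteLoader
    from langchain_core.prompts import PromptTemplate

    return AirbyteLoader(
        source="source-faker",      # example source name
        stream="users",             # example stream name
        config={"count": 10},       # example source configuration
        template=PromptTemplate.from_template(TEMPLATE_TEXT),
        include_metadata=False,     # keep record fields out of the metadata
    )


# Preview of the text one record would produce, using plain str.format
# as a rough approximation of how the template is filled in.
preview = TEMPLATE_TEXT.format(name="Ada", height=1.7)
print(preview)
```

Leaving template unset keeps the default YAML output, and leaving include_metadata at its default of True copies every record field into each document's metadata.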
Most of the source-specific configuration goes in config, and you can find the specific configuration options in the “Config field reference” for each source in the Airbyte documentation.