As of June 2024, Rockset has been acquired by OpenAI and shut down its public services. Rockset was a real-time analytics database known for world-class indexing and retrieval. Now, its core team and technology are being integrated into OpenAI’s infrastructure to power future AI products. This LangChain integration is no longer functional and is preserved for archival purposes only.
Rockset is a real-time analytics database that enables queries on massive, semi-structured data without operational burden. With Rockset, ingested data is queryable within one second, and analytical queries against that data typically execute in milliseconds. Rockset is compute-optimized, making it suitable for serving high-concurrency applications in the sub-100TB range (or larger than 100s of TBs with rollups).
This notebook demonstrates how to use Rockset as a document loader in LangChain. To get started, make sure you have a Rockset account and an API key available.
Setting up the environment
- Go to the Rockset console and get an API key. Find your API region from the API reference. For the purpose of this notebook, we will assume you're using Rockset from `Oregon(us-west-2)`.
- Set the environment variable `ROCKSET_API_KEY`.
- Install the Rockset Python client, which will be used by LangChain to interact with the Rockset database.
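The environment step can be checked from Python before any loader is built. The sketch below is only an illustration: the helper name `get_rockset_api_key` is hypothetical, and it assumes the client was installed beforehand (the package is believed to be published as `rockset` on PyPI).

```python
import os


def get_rockset_api_key(env=os.environ):
    """Return the key stored in ROCKSET_API_KEY, or raise a clear error.

    `env` defaults to the process environment; passing a plain dict makes
    the helper easy to exercise without touching real settings.
    """
    api_key = env.get("ROCKSET_API_KEY")
    if not api_key:
        raise RuntimeError(
            "ROCKSET_API_KEY is not set; export it before running the notebook."
        )
    return api_key
```

Failing early with an explicit message is a design choice here: a missing key otherwise tends to surface later as a less obvious authentication error.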
Loading Documents
The Rockset integration with LangChain allows you to load documents from Rockset collections with SQL queries. To do this, you must construct a `RocksetLoader` object from a Rockset client, a SQL query, and the names of the content columns.
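A minimal sketch of initializing a `RocksetLoader` follows. The import paths, `Regions.usw2a1`, and `models.QueryRequestSql` are recalled from the archived integration and may differ between versions, and the collection name `langchain_demo` is a hypothetical placeholder. The imports are deferred into the function so the snippet can be loaded without either package installed; since the service is shut down, the call itself can no longer reach a live database.

```python
# Hypothetical query and column names used for illustration only.
DEMO_QUERY = "SELECT * FROM langchain_demo LIMIT 3"
CONTENT_KEYS = ["text"]          # columns used as page content
METADATA_KEYS = ["id", "date"]   # columns copied into metadata


def build_rockset_loader(api_key: str):
    """Construct a RocksetLoader for the demo query (requires both packages)."""
    # Deferred imports: assumed module paths for the archived integration.
    from langchain_community.document_loaders import RocksetLoader
    from rockset import Regions, RocksetClient, models

    return RocksetLoader(
        RocksetClient(Regions.usw2a1, api_key),    # Oregon (us-west-2)
        models.QueryRequestSql(query=DEMO_QUERY),  # SQL query to run
        CONTENT_KEYS,
        metadata_keys=METADATA_KEYS,
    )
```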
With `text` given as the content column and `id` and `date` passed as `metadata_keys`, the `text` column in the collection is used as the page content, and the record's `id` and `date` columns are used as metadata. If you do not pass anything into `metadata_keys`, the whole Rockset document will be used as metadata.
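The column mapping described above can be illustrated without a database. The function below is not the library's implementation; `row_to_document` and the sample row are hypothetical, and the sketch only mirrors the stated behaviour for content columns and `metadata_keys`.

```python
from typing import Any, Dict, List, Optional


def row_to_document(
    row: Dict[str, Any],
    content_keys: List[str],
    metadata_keys: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Map one query result row to page content and metadata (illustrative)."""
    # Content columns are joined with a newline.
    page_content = "\n".join(str(row[key]) for key in content_keys)
    if metadata_keys is None:
        # No metadata_keys given: the whole row is kept as metadata.
        metadata = dict(row)
    else:
        # Only the requested columns are copied into metadata.
        metadata = {key: row[key] for key in metadata_keys}
    return {"page_content": page_content, "metadata": metadata}


# Hypothetical row standing in for a record returned by the SQL query.
row = {"id": 1, "date": "2024-01-01", "text": "Hello", "extra": "unused"}
doc = row_to_document(row, ["text"], metadata_keys=["id", "date"])
doc_all_metadata = row_to_document(row, ["text"])
```

Here `doc["metadata"]` holds only `id` and `date`, while `doc_all_metadata["metadata"]` holds every column of the row.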
To execute the query and access an iterator over the resulting `Document`s, run `loader.lazy_load()`. To execute the query and access all resulting `Document`s at once, run `loader.load()`, which returns them as a list.
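The difference between iterating lazily and loading everything at once can be sketched with a stand-in object. `InMemoryLoader` is hypothetical and only mimics the `lazy_load()` and `load()` method names; it performs no query and returns plain dictionaries rather than LangChain `Document` objects.

```python
from typing import Dict, Iterator, List


class InMemoryLoader:
    """Hypothetical stand-in that mimics the two loader methods."""

    def __init__(self, rows: List[Dict[str, str]]):
        self.rows = rows

    def lazy_load(self) -> Iterator[Dict[str, str]]:
        # Yield one document at a time, so callers can stop early.
        for row in self.rows:
            yield {"page_content": row["text"]}

    def load(self) -> List[Dict[str, str]]:
        # Materialize every document into a list.
        return list(self.lazy_load())


loader = InMemoryLoader([{"text": "first"}, {"text": "second"}])
first_doc = next(iter(loader.lazy_load()))  # consumes only the first row
all_docs = loader.load()                    # builds the full list
```

Lazy iteration is usually the better fit when the result set is large or when processing can stop after the first few documents.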
Using multiple columns as content
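You can choose to use multiple columns as content by passing more than one column name as the content columns. If the "sentence1" field is "This is the first sentence." and the "sentence2" field is "This is the second sentence.", the `page_content` of the resulting `Document` would be the two sentences separated by a new line.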
To control how the content columns are combined, set the `content_columns_joiner` argument in the `RocksetLoader` constructor. `content_columns_joiner` is a method that takes in a `List[Tuple[str, Any]]` as an argument, representing a list of tuples of (column name, column value). By default, this is a method that joins each column value with a new line.
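The default behaviour described above can be written out as a small function. This is a sketch of the described behaviour, not the library's own code; `newline_joiner` is a hypothetical name.

```python
from typing import Any, List, Tuple


def newline_joiner(columns: List[Tuple[str, Any]]) -> str:
    """Join column values with a new line, ignoring the column names."""
    return "\n".join(str(value) for _name, value in columns)


# (column name, column value) tuples for the two example fields.
columns = [
    ("sentence1", "This is the first sentence."),
    ("sentence2", "This is the second sentence."),
]
page_content = newline_joiner(columns)
# page_content is the two sentences separated by "\n".
```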
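For example, if you wanted to join sentence1 and sentence2 with a space instead of a new line, you could set `content_columns_joiner` to a function that joins the column values with `" "`. The `page_content` of the resulting `Document` would then be "This is the first sentence. This is the second sentence."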
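A space-separated joiner could look like the following sketch. The name `space_joiner` is hypothetical; in the loader it would be supplied through the `content_columns_joiner` argument.

```python
from typing import Any, List, Tuple


def space_joiner(columns: List[Tuple[str, Any]]) -> str:
    """Join column values with a single space instead of a new line."""
    return " ".join(str(value) for _name, value in columns)


columns = [
    ("sentence1", "This is the first sentence."),
    ("sentence2", "This is the second sentence."),
]
page_content = space_joiner(columns)
# Would be passed to the constructor as: content_columns_joiner=space_joiner
```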
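You may also want to include the column name in the `page_content`. Because the joiner receives (column name, column value) tuples, it can format each pair, for example as `name: value`, before joining them.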
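Since each tuple carries the column name, a joiner can also label every value. The sketch below assumes a `name: value` layout with one column per line; `labeled_joiner` is a hypothetical name and the exact format is an illustrative choice.

```python
from typing import Any, List, Tuple


def labeled_joiner(columns: List[Tuple[str, Any]]) -> str:
    """Prefix each value with its column name and put one column per line."""
    return "\n".join(f"{name}: {value}" for name, value in columns)


columns = [
    ("sentence1", "This is the first sentence."),
    ("sentence2", "This is the second sentence."),
]
page_content = labeled_joiner(columns)
# Would be passed to the constructor as: content_columns_joiner=labeled_joiner
```

With the example fields, `page_content` consists of two lines, `sentence1: This is the first sentence.` and `sentence2: This is the second sentence.`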