Spanner is a highly scalable database that combines unlimited scalability with relational semantics, such as secondary indexes, strong consistency, schemas, and SQL, providing 99.999% availability in one easy solution.
This notebook goes over how to use Spanner to save, load and delete langchain documents with SpannerLoader and SpannerDocumentSaver.
Learn more about the package on GitHub.
Before You Begin
To run this notebook, you will need to do the following:
- Create a Google Cloud Project
- Enable the Cloud Spanner API
- Create a Spanner instance
- Create a Spanner database
- Create a Spanner table
🦜🔗 Library Installation
The integration lives in its own langchain-google-spanner package, so we need to install it.
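In a notebook the install is usually a single pip cell. The sketch below does the same from plain Python, assuming pip is available for the interpreter that runs the notebook; the helper names are illustrative and not part of the package.

```python
import subprocess
import sys
from typing import List

PACKAGE = "langchain-google-spanner"  # package name given in the text above


def pip_install_command(package: str = PACKAGE) -> List[str]:
    """Build a pip command bound to the interpreter running this notebook."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", "--quiet", package]


def install(package: str = PACKAGE) -> None:
    """Run pip; raises subprocess.CalledProcessError if the install fails."""
    subprocess.check_call(pip_install_command(package))


# install()  # uncomment to run; a kernel restart may be needed afterwards
```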
☁ Set Your Google Cloud Project
Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook. If you don’t know your project ID, try the following:
- Run gcloud config list.
- Run gcloud projects list.
- See the support page: Locate the project ID.
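Once the project ID is known, it can be set with the gcloud CLI. A minimal sketch, assuming gcloud is installed and on the PATH; PROJECT_ID is a placeholder and the helper names are made up for illustration.

```python
import subprocess
from typing import List

PROJECT_ID = "my-project-id"  # placeholder: replace with your own project ID


def set_project_command(project_id: str) -> List[str]:
    """Return the gcloud invocation that selects the active project."""
    return ["gcloud", "config", "set", "project", project_id]


def set_project(project_id: str = PROJECT_ID) -> None:
    """Run gcloud; raises CalledProcessError if the command fails."""
    subprocess.check_call(set_project_command(project_id))


# set_project()  # uncomment once PROJECT_ID holds a real project ID
```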
🔐 Authentication
Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.
- If you are using Colab to run this notebook, use the cell below and continue.
- If you are using Vertex AI Workbench, check out the setup instructions here.
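For the Colab case, authentication typically goes through the google.colab helper module. The sketch below wraps that call so it degrades gracefully outside Colab; the wrapper function is our own illustration.

```python
def authenticate_in_colab() -> bool:
    """Authenticate the logged-in Colab user; return False outside Colab."""
    try:
        from google.colab import auth  # only importable inside Colab
    except ImportError:
        return False
    auth.authenticate_user()
    return True
```

Outside Colab the function simply returns False, so the same notebook can run unchanged in other environments that are already authenticated.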
Basic Usage
Save documents
Save langchain documents with SpannerDocumentSaver.add_documents(<documents>). To initialize the SpannerDocumentSaver class you need to provide 3 things:
- instance_id - The Spanner instance that holds the database.
- database_id - The Spanner database to save data to.
- table_name - The name of the table within the Spanner database to store langchain documents.
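The three arguments above can be put together as follows. This is a sketch under stated assumptions: the IDs, table name and sample rows are placeholders, the target table is assumed to exist already, and the keyword names mirror the parameter names listed above.

```python
INSTANCE_ID = "my-instance"   # placeholder
DATABASE_ID = "my-database"   # placeholder
TABLE_NAME = "my_documents"   # placeholder; the table must already exist

# Hypothetical sample data: (page content, metadata) pairs.
SAMPLE_ROWS = [
    ("Granny Smith", {"fruit_id": 1}),
    ("Navel orange", {"fruit_id": 2}),
]


def build_documents(rows):
    """Turn (text, metadata) pairs into langchain Documents."""
    from langchain_core.documents import Document  # imported lazily

    return [Document(page_content=text, metadata=meta) for text, meta in rows]


def save_documents(rows=SAMPLE_ROWS):
    """Write the rows to Spanner and return the saver for later reuse."""
    from langchain_google_spanner import SpannerDocumentSaver

    saver = SpannerDocumentSaver(
        instance_id=INSTANCE_ID,
        database_id=DATABASE_ID,
        table_name=TABLE_NAME,
    )
    saver.add_documents(build_documents(rows))
    return saver
```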
Querying for Documents from Spanner
For more details on connecting to a Spanner table, please check the Python SDK documentation.
Load documents from table
Load langchain documents with SpannerLoader.load() or SpannerLoader.lazy_load(). lazy_load returns a generator that only queries the database during iteration. To initialize the SpannerLoader class you need to provide:
- instance_id - The Spanner instance to load data from.
- database_id - The Spanner database to load data from.
- query - A query written in the database’s SQL dialect.
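A minimal sketch of both loading styles described above. The IDs, table name and query are placeholders; the keyword names follow the parameter list above.

```python
INSTANCE_ID = "my-instance"   # placeholder
DATABASE_ID = "my-database"   # placeholder
QUERY = "SELECT * FROM my_documents"  # placeholder query


def load_documents(query: str = QUERY, lazy: bool = True):
    """Return Documents for the query, lazily or as a full list."""
    from langchain_google_spanner import SpannerLoader  # imported lazily

    loader = SpannerLoader(
        instance_id=INSTANCE_ID,
        database_id=DATABASE_ID,
        query=query,
    )
    if lazy:
        # Generator: the database is only queried while iterating.
        return loader.lazy_load()
    # Eager: runs the query now and returns a list of Documents.
    return loader.load()


# for doc in load_documents():
#     print(doc.page_content, doc.metadata)
```

The lazy form is usually the better fit for large result sets, since documents are produced one at a time instead of being held in memory together.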
Delete documents
Delete a list of langchain documents from the table with SpannerDocumentSaver.delete(<documents>).
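As a sketch of the delete call, the example below first picks the documents to remove by a metadata value and then hands them to the saver. The pick_documents helper and the placeholder IDs are illustrative assumptions, not part of the package.

```python
INSTANCE_ID = "my-instance"   # placeholder
DATABASE_ID = "my-database"   # placeholder
TABLE_NAME = "my_documents"   # placeholder


def pick_documents(documents, key, value):
    """Keep only documents whose metadata[key] equals value."""
    return [doc for doc in documents if doc.metadata.get(key) == value]


def delete_documents(documents) -> None:
    """Delete the given langchain Documents from the table."""
    from langchain_google_spanner import SpannerDocumentSaver  # imported lazily

    saver = SpannerDocumentSaver(
        instance_id=INSTANCE_ID,
        database_id=DATABASE_ID,
        table_name=TABLE_NAME,
    )
    saver.delete(documents)


# delete_documents(pick_documents(loaded_docs, "fruit_id", 2))
```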
Advanced Usage
Custom client
If no client is supplied, a default client is created. To pass in credentials and project explicitly, a custom client can be passed to the constructor.
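One way to build such a client is from a service account key file. This is a sketch that assumes the constructor accepts the client through a client keyword; the project name and key path are placeholders.

```python
PROJECT = "my-project"          # placeholder
KEY_PATH = "/path/to/key.json"  # placeholder path to a service account key


def build_custom_client(project: str = PROJECT, key_path: str = KEY_PATH):
    """Create a Spanner client with explicit project and credentials."""
    from google.cloud import spanner
    from google.oauth2 import service_account

    creds = service_account.Credentials.from_service_account_file(key_path)
    return spanner.Client(project=project, credentials=creds)


def build_loader_with_client(client, instance_id: str, database_id: str, query: str):
    """Pass the custom client to the loader (keyword name is an assumption)."""
    from langchain_google_spanner import SpannerLoader

    return SpannerLoader(instance_id, database_id, query, client=client)
```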
Customize Document Page Content & Metadata
The loader returns a list of Documents with page content taken from specific data columns. All other data columns are added to metadata. Each row becomes a document.
Customize page content format
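The SpannerLoader assumes there is a column called page_content. This default can be changed by telling the loader which columns hold the page content.
When several content columns are specified, the page content defaults to the text format (space-separated string concatenation). The formats that the user can specify are text, JSON, YAML, and CSV.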
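The column and format choices can be sketched as loader keyword arguments. The keyword names content_columns and format are recalled from the package's documented examples and should be checked against the installed version; the helper functions are illustrative.

```python
from typing import Any, Dict, List

SUPPORTED_FORMATS = ("text", "JSON", "YAML", "CSV")  # formats listed above


def content_options(columns: List[str], fmt: str = "text") -> Dict[str, Any]:
    """Validate the format and return keyword arguments for the loader."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return {"content_columns": list(columns), "format": fmt}


def build_content_loader(instance_id, database_id, query, columns, fmt="text"):
    """Create a loader whose page content comes from the given columns."""
    from langchain_google_spanner import SpannerLoader  # imported lazily

    return SpannerLoader(
        instance_id, database_id, query, **content_options(columns, fmt)
    )


# build_content_loader("my-instance", "my-database",
#                      "SELECT * FROM my_documents", ["custom_content"], "JSON")
```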
Customize metadata format
The SpannerLoader assumes there is a metadata column called langchain_metadata that stores JSON data. The metadata column is used as the base dictionary. By default, all other column data is added and may overwrite the original value. These defaults can be changed by naming the columns to add as metadata.
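The overwrite behaviour described above can be illustrated in plain Python, followed by a loader that restricts which columns become metadata. merge_metadata is our own illustration of the described semantics, not the library's code, and the metadata_columns keyword is recalled from the package's examples.

```python
from typing import Any, Dict, List


def merge_metadata(json_base: Dict[str, Any], other_columns: Dict[str, Any]) -> Dict[str, Any]:
    """Start from the JSON column, then let other columns overwrite it."""
    merged = dict(json_base)
    merged.update(other_columns)
    return merged


def build_metadata_loader(instance_id, database_id, query, columns: List[str]):
    """Create a loader that only adds the named columns to metadata."""
    from langchain_google_spanner import SpannerLoader  # imported lazily

    return SpannerLoader(instance_id, database_id, query, metadata_columns=list(columns))


# A row with JSON metadata {"source": "a", "price": 1} and a price column of 2:
# merge_metadata({"source": "a", "price": 1}, {"price": 2})
```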
Customize JSON metadata column name
By default, the loader uses langchain_metadata as the base dictionary. This can be customized to select a different JSON column to use as the base dictionary for the Document’s metadata.
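A short sketch of selecting a different JSON column. The column name is a placeholder, and the metadata_json_column keyword is assumed from the package's examples.

```python
DEFAULT_JSON_COLUMN = "langchain_metadata"  # default named in the text above
CUSTOM_JSON_COLUMN = "another_json_column"  # placeholder column name


def build_json_metadata_loader(instance_id, database_id, query,
                               json_column: str = CUSTOM_JSON_COLUMN):
    """Create a loader that uses json_column as the base metadata dictionary."""
    from langchain_google_spanner import SpannerLoader  # imported lazily

    return SpannerLoader(
        instance_id, database_id, query, metadata_json_column=json_column
    )
```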
Custom staleness
The default staleness is 15s. This can be customized by specifying a weaker bound, either by performing all reads as of a given timestamp or as of a given duration in the past.
Turn on data boost
By default, the loader does not use data boost, since it incurs additional costs and requires additional IAM permissions. However, users can choose to turn it on.
Custom client
If no client is supplied, a default client is created. To pass in credentials and project explicitly, a custom client can be passed to the constructor.
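The staleness, data boost and custom client settings covered above can be combined on a single loader. This sketch assumes the keyword names staleness, databoost and client, and that staleness accepts either a timestamp or a number of seconds; verify both against the installed package.

```python
import datetime
from typing import Any, Dict, Optional, Union

Staleness = Union[float, datetime.datetime]


def read_options(staleness: Optional[Staleness] = None,
                 databoost: bool = False,
                 client: Any = None) -> Dict[str, Any]:
    """Collect only the options that were set explicitly."""
    options: Dict[str, Any] = {"databoost": databoost}
    if staleness is not None:
        options["staleness"] = staleness  # timestamp, or seconds in the past
    if client is not None:
        options["client"] = client
    return options


def build_tuned_loader(instance_id, database_id, query, **kwargs):
    """Create a loader with the selected read options."""
    from langchain_google_spanner import SpannerLoader  # imported lazily

    return SpannerLoader(instance_id, database_id, query, **read_options(**kwargs))


# Reads as of 20 seconds ago, with data boost enabled:
# build_tuned_loader("my-instance", "my-database",
#                    "SELECT * FROM my_documents", staleness=20.0, databoost=True)
```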
Custom initialization for SpannerDocumentSaver
The SpannerDocumentSaver allows custom initialization, which lets the user specify how the Document is saved into the table.
content_column: The column name for the Document’s page content. Defaults to page_content.
metadata_columns: Metadata keys that are saved into their own columns if the key exists in the Document’s metadata.
metadata_json_column: The column name for the special JSON column. Defaults to langchain_metadata.
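The three options above map onto saver keyword arguments. A minimal sketch: the option names and defaults come from the text above, while the helper functions and example values are illustrative.

```python
from typing import Any, Dict, List, Optional


def saver_options(content_column: str = "page_content",
                  metadata_columns: Optional[List[str]] = None,
                  metadata_json_column: str = "langchain_metadata") -> Dict[str, Any]:
    """Return keyword arguments describing how Documents are stored."""
    return {
        "content_column": content_column,
        "metadata_columns": list(metadata_columns or []),
        "metadata_json_column": metadata_json_column,
    }


def build_custom_saver(instance_id, database_id, table_name, **kwargs):
    """Create a saver that writes to the customized columns."""
    from langchain_google_spanner import SpannerDocumentSaver  # imported lazily

    return SpannerDocumentSaver(
        instance_id, database_id, table_name, **saver_options(**kwargs)
    )


# build_custom_saver("my-instance", "my-database", "my_documents",
#                    content_column="my_content", metadata_columns=["foo"])
```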
Initialize custom schema for Spanner
The SpannerDocumentSaver has an init_document_table method to create a new table to store docs with a custom schema.
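A sketch of creating such a table. It assumes init_document_table takes the instance, database and table name followed by content_column and metadata_columns, and that the package exposes a Column(name, data_type, nullable) helper; both are recalled from the package's examples and should be checked against the installed version. The schema below is a made-up example.

```python
# Hypothetical schema: (column name, Spanner type, nullable).
METADATA_SCHEMA = [
    ("category", "STRING(36)", True),
    ("price", "FLOAT64", False),
]


def create_document_table(instance_id: str, database_id: str, table_name: str,
                          content_column: str = "my_page_content") -> None:
    """Create a new table for Documents with the custom schema above."""
    from langchain_google_spanner import Column, SpannerDocumentSaver

    SpannerDocumentSaver.init_document_table(
        instance_id,
        database_id,
        table_name,
        content_column=content_column,
        metadata_columns=[
            Column(name, data_type, nullable)
            for name, data_type, nullable in METADATA_SCHEMA
        ],
    )


# create_document_table("my-instance", "my-database", "my_new_table")
```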