PGVectorStore
is an implementation of a LangChain vectorstore using postgres
as the backend.
This notebook goes over how to use the PGVectorStore
API.
The code lives in an integration package called: langchain-postgres.
Setup
This package requires a PostgreSQL database with thepgvector
extension.
You can run the following command to spin up a container for a pgvector
enabled Postgres instance:
Install
Install the integration library,langchain-postgres
.
Set your Postgres values
Set your Postgres values to test the functionality in this notebook against a Postgres instance.Initialization
PGEngine Connection Pool
One of the requirements and arguments to establish PostgreSQL as a vector store is aPGEngine
object. The PGEngine
configures a shared connection pool to your Postgres database. This is an industry best practice to manage number of connections and to reduce latency through cached database connections.
PGVectorStore
can be used with the asyncpg
and psycopg3
drivers.
To create a PGEngine
using PGEngine.from_connection_string()
you need to provide:
url
: Connection string using thepostgresql+asyncpg
driver.
PGEngine
using PGEngine.from_engine()
you need to provide:
engine
: An object ofAsyncEngine
Initialize a table
ThePGVectorStore
class requires a database table. The PGEngine
engine has a helper method ainit_vectorstore_table()
that can be used to create a table with the proper schema for you.
See Create a custom Vector Store or Create a Vector Store using existing table for customizing the schema.
Optional Tip: 💡
You can also specify a schema name by passingschema_name
wherever you pass table_name
. Eg:
Create an embedding class instance
You can use any LangChain embeddings model.Initialize a default PGVectorStore
Use the default table schema to connect to the vectorstore. See Create a custom Vector Store or Create a Vector Store using existing table for customizing the schema.Manage vector store
Add documents
Add documents to the vector store. Metadata is stored in a JSON column, see “Create a custom Vector Store” to store metadata to be used for filters.Add texts
Add text directly to the vectorstore, if not structured as a Document.Delete documents
Documents can be deleted using ids.Query vector store
Search for documents
Use a natural language query to search for similar documents.Search for documents by vector
Search for similar documents using a vector embedding.Add a Index
Speed up vector search queries by applying a vector index. Learn more about vector indexes. Indexes will use a default index name if a name is not provided. To add multiple indexes, different index names are required.Re-index
Rebuild an index using the data stored in the index’s table, replacing the old copy of the index. Some index types may require re-indexing after a considerable amount of new data is added.Drop an index
Remove a vector index.Create a custom Vector Store
Customize the vectorstore with special column names or with custom metadata columns.ainit_vectorstore_table
- Use fields
content_column
,embedding_column
,metadata_columns
,metadata_json_column
,id_column
to rename the columns. - Use the
Column
class to create custom id or metadata columns. A Column is defined by a name and data type. Any Postgres data type can be used. - Use
store_metadata
to create a JSON column to store extra metadata.
Optional Tip: 💡
To use non-uuid ids, you must customize the id column:PGVectorStore
- Use fields
content_column
,embedding_column
,metadata_columns
,metadata_json_column
,id_column
to rename the columns. ignore_metadata_columns
to ignore columns that should not be used for Document metadata. This is helpful when using a preexisting table, where all data columns are not necessary.- Use a different
distance_strategy
for the similarity calculation during vector search. - Use
index_query_options
to tune local index parameters during vector search.
Search for documents with metadata filter
A Vector Store can take advantage of relational data to filter similarity searches. The vectorstore supports a set of filters that can be applied against the metadata fields of the documents. See the migration guide for details on how to migrate to use metadata columns.PGVectorStore
currently supports the following operators and all Postgres data types.
Operator | Meaning/Category |
---|---|
$eq | Equality (==) |
$ne | Inequality (!=) |
$lt | Less than (<) |
$lte | Less than or equal (<=) |
$gt | Greater than (>) |
$gte | Greater than or equal (>=) |
$in | Special Cased (in) |
$nin | Special Cased (not in) |
$between | Special Cased (between) |
$exists | Special Cased (is null) |
$like | Text (like) |
$ilike | Text (case-insensitive like) |
$and | Logical (and) |
$or | Logical (or) |
Create a Vector Store using existing table
A Vector Store can be built up on an existing table. Assuming there’s a pre-existing table in PG DB:products
, which stores product details for an eComm venture.
Here is how this table mapped to PGVectorStore
:
-
id_column="product_id"
: ID column uniquely identifies each row in the products table. -
content_column="description"
: Thedescription
column contains text descriptions of each product. This text is used by theembedding_service
to create vectors that go in embedding_column and represent the semantic meaning of each description. -
embedding_column="embed"
: Theembed
column stores the vectors created from the product descriptions. These vectors are used to find products with similar descriptions. -
metadata_columns=["name", "category", "price_usd", "quantity", "sku", "image_url"]
: These columns are treated as metadata for each product. Metadata provides additional information about a product, such as its name, category, price, quantity available, SKU (Stock Keeping Unit), and an image URL. This information is useful for displaying product details in search results or for filtering and categorization. -
metadata_json_column="metadata"
: Themetadata
column can store any additional information about the products in a flexible JSON format. This allows for storing varied and complex data that doesn’t fit into the standard columns.
-
Optional: If the
embed
column is newly created or has different dimensions than supported by embedding model, it is required to one-time add the embeddings for the old records, like this:ALTER TABLE products ADD COLUMN embed vector(768) DEFAULT NULL
-
For new records, added via
VectorStore
embeddings are automatically generated.