Databricks Vector Search is a serverless similarity search engine that allows you to store a vector representation of your data, including metadata, in a vector database. With Vector Search, you can create auto-updating vector search indexes from Delta tables managed by Unity Catalog and query them with a simple API to return the most similar vectors.
In this walkthrough, we’ll demo the SelfQueryRetriever with a Databricks Vector Search index.
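To make "return the most similar vectors" concrete, here is a toy, in-memory sketch in plain Python. It is not the Databricks Vector Search API. The records, the three-dimensional vectors and the helper names are made up, and cosine similarity is used only as one common choice of metric. A real index may be configured differently.

```python
import math

# Toy in-memory "vector database": each record keeps an id, a vector, and metadata.
# Illustrative only; this is not the Databricks Vector Search client API.
RECORDS = [
    {"id": "a", "vector": [1.0, 0.0, 0.0], "metadata": {"genre": "science fiction"}},
    {"id": "b", "vector": [0.8, 0.6, 0.0], "metadata": {"genre": "thriller"}},
    {"id": "c", "vector": [0.0, 0.0, 1.0], "metadata": {"genre": "animated"}},
]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def most_similar(query_vector, records, k=2):
    """Return the k records whose vectors are most similar to query_vector."""
    scored = [(cosine_similarity(query_vector, r["vector"]), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:k]]


print([r["id"] for r in most_similar([0.9, 0.1, 0.0], RECORDS)])  # ['a', 'b']
```

A real index does the same ranking at scale and returns the stored metadata alongside each hit. That metadata is what the self-query retriever later filters on.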
Create Databricks vector store index
First we’ll want to create a Databricks vector store index and seed it with some data. We’ve created a small demo set of documents that contain summaries of movies.
Note: The self-query retriever requires you to have lark installed (pip install lark) along with integration-specific requirements.
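The index-creation calls themselves need a Databricks workspace and are not reproduced here. This is a minimal, offline sketch of the data side only. It assumes each movie summary is stored as text plus year, rating and genre metadata. Those column names and the sample records are made up for illustration and are not necessarily the walkthrough’s demo set.

```python
# Hypothetical column schema for the demo index; names and types are assumptions.
SCHEMA = {"id": str, "page_content": str, "year": int, "rating": float, "genre": str}

# Made-up movie-summary records used only to illustrate the shape of the seed data.
MOVIES = [
    {"id": "1", "page_content": "Scientists bring back dinosaurs and mayhem breaks loose",
     "year": 1993, "rating": 7.7, "genre": "science fiction"},
    {"id": "2", "page_content": "A thief enters people's dreams to plant an idea",
     "year": 2010, "rating": 8.2, "genre": "thriller"},
    {"id": "3", "page_content": "Toys come alive and have a blast doing so",
     "year": 1995, "rating": 8.3, "genre": "animated"},
]


def validate(record, schema=SCHEMA):
    """Raise if a record is missing a column or a column has the wrong type."""
    missing = schema.keys() - record.keys()
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for column, expected_type in schema.items():
        if not isinstance(record[column], expected_type):
            raise TypeError(f"{column!r} should be {expected_type.__name__}")


def to_document(record):
    """Split a record into (page_content, metadata); everything except id and text is metadata."""
    metadata = {k: v for k, v in record.items() if k not in ("id", "page_content")}
    return record["page_content"], metadata


# Check every record before it would be written to the index.
for movie in MOVIES:
    validate(movie)
documents = [to_document(movie) for movie in MOVIES]
```

Keeping the metadata as separate typed columns is what later makes filters such as "rating above 8.5" possible.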
We’re using OpenAIEmbeddings, so we have to get the OpenAI API key.
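A common notebook pattern is to read the key from the OPENAI_API_KEY environment variable and prompt for it only when it is missing. The helper below is a hypothetical convenience and is not part of LangChain or the OpenAI SDK. The prompt function is injectable, so the helper can be exercised without a terminal.

```python
import getpass
import os


def get_openai_api_key(prompt=getpass.getpass):
    """Return the OpenAI API key, prompting once if OPENAI_API_KEY is unset.

    The key is stored back into os.environ["OPENAI_API_KEY"], the variable the
    OpenAI client libraries read by default, so later calls find it.
    """
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        key = prompt("OpenAI API Key: ")
        os.environ["OPENAI_API_KEY"] = key
    return key
```

getpass keeps the key out of the notebook’s visible output, which matters when notebooks are shared.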
Creating our self-querying retriever
Now we can instantiate our retriever. To do this, we’ll need to provide some information upfront about the metadata fields that our documents support and a short description of the document contents.
Test it out
And now we can try actually using our retriever!
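A self-query retriever asks an LLM to rewrite the user’s question into a structured query. That query is a search string plus a metadata filter built only from the fields declared upfront. The sketch below skips the LLM and hand-writes the structured query an LLM might produce for "What’s a science fiction movie rated above 8.5?". It then applies that query to made-up documents. The field descriptions, comparator names, dataclasses and word-overlap score are all illustrative assumptions. The word-overlap score stands in for embedding similarity. None of this is LangChain’s or Databricks’ implementation.

```python
import operator
from dataclasses import dataclass, field

# Declared metadata fields (hypothetical), standing in for the field info the
# retriever is given upfront. Filters may only reference these names.
METADATA_FIELDS = {
    "genre": "The genre of the movie",
    "year": "The year the movie was released",
    "rating": "A 1-10 rating for the movie",
}

# Comparator names mapped to Python operators (names are illustrative).
COMPARATORS = {
    "eq": operator.eq, "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
}


@dataclass
class Comparison:
    attribute: str
    comparator: str
    value: object


@dataclass
class StructuredQuery:
    query: str                                   # text used for relevance ranking
    filters: list = field(default_factory=list)  # Comparisons combined with AND


# Made-up documents: (page_content, metadata).
DOCS = [
    ("Scientists bring back dinosaurs and mayhem breaks loose",
     {"genre": "science fiction", "year": 1993, "rating": 7.7}),
    ("Explorers cross a forbidden zone in search of a room that grants wishes",
     {"genre": "science fiction", "year": 1979, "rating": 9.0}),
    ("Toys come alive and have a blast doing so",
     {"genre": "animated", "year": 1995, "rating": 8.3}),
]


def matches(metadata, comparison):
    """Check one comparison, rejecting attributes that were never declared."""
    if comparison.attribute not in METADATA_FIELDS:
        raise ValueError(f"undeclared attribute: {comparison.attribute}")
    compare = COMPARATORS[comparison.comparator]
    return compare(metadata[comparison.attribute], comparison.value)


def overlap(a, b):
    """Crude relevance score standing in for embedding similarity: shared words."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def run(structured_query, docs, k=4):
    """Filter on metadata first, then rank the survivors by relevance to the query text."""
    survivors = [d for d in docs if all(matches(d[1], c) for c in structured_query.filters)]
    survivors.sort(key=lambda d: overlap(structured_query.query, d[0]), reverse=True)
    return survivors[:k]


# What an LLM might produce for "What's a science fiction movie rated above 8.5?"
sq = StructuredQuery(
    query="science fiction movie",
    filters=[Comparison("genre", "eq", "science fiction"), Comparison("rating", "gt", 8.5)],
)
print([text for text, _ in run(sq, DOCS)])
```

The metadata filter removes the lower-rated science fiction movie and the animated movie before any ranking happens. This is the main benefit over plain similarity search, which cannot express "rated above 8.5".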
Filter k
We can also use the self-query retriever to specify k, the number of documents to fetch. We can do this by passing enable_limit=True to the constructor.
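With limits enabled, the LLM can also pull a count out of the question, such as "two movies about dinosaurs", and the retriever fetches that many documents. The sketch below models only that last step under stated assumptions. The dataclass and helper names are hypothetical. DEFAULT_K = 4 is an assumed fallback. The enable_limit=False branch models the assumption that no extracted limit is applied in that case.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_K = 4  # assumed fallback number of documents when no limit is given


@dataclass
class StructuredQuery:
    query: str
    limit: Optional[int] = None  # e.g. 2 for "two movies about dinosaurs"


def effective_k(structured_query, enable_limit, default_k=DEFAULT_K):
    """Documents to fetch: the extracted limit when limits are enabled, else the default."""
    if enable_limit and structured_query.limit is not None:
        return structured_query.limit
    return default_k


def apply_limit(ranked_docs, structured_query, enable_limit):
    """Truncate an already-ranked result list to the effective k."""
    return ranked_docs[: effective_k(structured_query, enable_limit)]


ranked = ["doc1", "doc2", "doc3", "doc4", "doc5"]
two_dinosaur_movies = StructuredQuery(query="dinosaurs", limit=2)
print(apply_limit(ranked, two_dinosaur_movies, enable_limit=True))   # ['doc1', 'doc2']
print(apply_limit(ranked, two_dinosaur_movies, enable_limit=False))  # ['doc1', 'doc2', 'doc3', 'doc4']
```

Making the limit opt-in keeps the default behavior predictable. Users who phrase counts in their questions get them honored only when the retriever is built to expect that.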