MyScale is an integrated vector database. You can access your database in SQL and also from here, LangChain. MyScale can make use of various data types and functions for filters. It can boost your LLM app whether you are scaling up your data or expanding your system to broader applications.
In this notebook, we’ll demo the SelfQueryRetriever wrapped around a MyScale vector store with some extra pieces we contributed to LangChain.
In short, it can be condensed into 4 points:
- Add contain comparator to match a list-valued field when any of its elements matches
- Add timestamp data type for datetime match (ISO-format, or YYYY-MM-DD)
- Add like comparator for string pattern search
- Add arbitrary function capability
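To make these filter behaviours concrete, the following pure-Python sketch mimics them on a single record. It only illustrates the intended semantics and is not MyScale's or LangChain's implementation. The helper names contain, like and parse_timestamp, the sample record, and the simplification of like to a case-insensitive substring test are all assumptions made for this example.

```python
from datetime import datetime


def contain(values, item):
    """True when the list-valued field holds the given item."""
    return item in values


def like(text, pattern):
    """Simplified pattern search: case-insensitive substring match."""
    return pattern.lower() in text.lower()


def parse_timestamp(value):
    """Parse 'YYYY-MM-DD' or a full ISO-format datetime string."""
    return datetime.fromisoformat(value)


# Hypothetical record, made up for illustration only.
record = {
    "date": "1993-07-02",
    "genre": ["science fiction", "adventure"],
    "director": "Jane Doe",
}

# contain: match a list-valued field by one of its elements.
print(contain(record["genre"], "adventure"))

# timestamp: compare dates rather than bare years.
print(parse_timestamp(record["date"]) < parse_timestamp("2000-01-01T00:00:00"))

# like: string pattern search on a text field.
print(like(record["director"], "doe"))

# Arbitrary function: apply a function to a field before comparing,
# here Python's len() as a stand-in for a database-side function.
print(len(record["genre"]) > 1)
```

In the real retriever these conditions are produced by the LLM and evaluated by the database, so the sketch is only a mental model of what each filter is meant to select.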
Creating a MyScale vector store
MyScale has been integrated into LangChain for a while, so you can follow this notebook to create your own vector store for a self-query retriever. Note: All self-query retrievers require you to have lark installed (pip install lark). We use lark for grammar definition. Before you proceed to the next step, note that clickhouse-connect is also needed to interact with your MyScale backend.
This example uses OpenAIEmbeddings. Remember to get an OpenAI API key for valid access to LLMs.
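Before going further, it can help to confirm that the prerequisites mentioned above are in place. The snippet below is a minimal sketch using only the standard library. The helper names missing_modules and missing_env_vars are hypothetical, and it assumes the OpenAI key is supplied through the OPENAI_API_KEY environment variable.

```python
import importlib.util
import os

# Import names: the pip package clickhouse-connect is imported as clickhouse_connect.
REQUIRED_MODULES = ["lark", "clickhouse_connect"]
REQUIRED_ENV_VARS = ["OPENAI_API_KEY"]


def missing_modules(modules):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def missing_env_vars(names, environ=None):
    """Return the environment variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


print("Missing modules:", missing_modules(REQUIRED_MODULES))
print("Missing environment variables:", missing_env_vars(REQUIRED_ENV_VARS))
```

Both lists being empty means the packages can be imported and the key is set. It does not verify that the key is valid or that the MyScale backend is reachable.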
Create some sample data
The sample data differs in a few ways from the data used with other self-query retrievers. We replaced the keyword year with date, which gives you finer control over timestamps. We also changed the type of the keyword genre to a list of strings, where an LLM can use the new contain comparator to construct filters. We also provide the like comparator and arbitrary function support for filters, which will be introduced in the next few cells.
Now let’s look at the data first.
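As a rough picture of what such records might look like, here is a sketch using plain dictionaries. The movie summaries and metadata values are made up for illustration and are not the notebook's actual data. In LangChain each entry would normally be a Document with page_content and metadata, and check_record is a hypothetical helper.

```python
from datetime import datetime

# Hypothetical sample records: a date string instead of a bare year,
# and genre stored as a list of strings.
docs = [
    {
        "page_content": "Scientists bring back dinosaurs and things go wrong",
        "metadata": {"date": "1993-07-02", "rating": 7.7, "genre": ["science fiction"]},
    },
    {
        "page_content": "Toys come alive and have a blast doing so",
        "metadata": {"date": "1995-02-11", "rating": 8.1, "genre": ["animated", "family"]},
    },
    {
        "page_content": "A group of friends gets lost on a long road trip",
        "metadata": {"date": "2010-12-30", "rating": 6.4, "genre": ["comedy", "adventure"]},
    },
]


def check_record(doc):
    """Check that date parses as ISO format and genre is a list of strings."""
    meta = doc["metadata"]
    datetime.fromisoformat(meta["date"])  # raises ValueError if malformed
    genres = meta["genre"]
    return isinstance(genres, list) and all(isinstance(g, str) for g in genres)


for doc in docs:
    print(doc["metadata"], "|", doc["page_content"])
```

Keeping genre as a list is what lets a contain-style filter match a movie by any one of its genres.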
Creating our self-querying retriever
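The following is a minimal sketch of how such a retriever might be wired up, not the notebook's exact code. The field descriptions, the type strings "list[string]" and "timestamp", and the build_retriever helper are assumptions for illustration. The import paths shown are also assumptions and have moved between LangChain releases, so they are imported lazily inside the function.

```python
# Plain-dict description of the metadata fields the LLM may filter on.
# The names mirror the sample data; descriptions and types are illustrative.
METADATA_FIELDS = [
    {"name": "genre", "description": "The genres of the movie", "type": "list[string]"},
    {"name": "date", "description": "The date the movie was released", "type": "timestamp"},
    {"name": "rating", "description": "A 1-10 rating for the movie", "type": "float"},
]

DOCUMENT_CONTENTS = "Brief summary of a movie"


def build_retriever(llm, vectorstore):
    """Build a self-query retriever over an existing vector store.

    llm and vectorstore must be created by the caller, for example a chat
    model and a MyScale vector store populated with documents.
    """
    # Imported lazily so this file loads without LangChain installed.
    # Assumed paths; check the ones matching your LangChain version.
    from langchain.chains.query_constructor.base import AttributeInfo
    from langchain.retrievers.self_query.base import SelfQueryRetriever

    field_info = [AttributeInfo(**field) for field in METADATA_FIELDS]
    return SelfQueryRetriever.from_llm(
        llm,
        vectorstore,
        DOCUMENT_CONTENTS,
        field_info,
        verbose=True,
    )
```

Describing each field with a name, description and type is what allows the LLM to translate a natural-language question into a structured filter.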
Just like other retrievers… simple and nice.
Testing it out with self-query retriever’s existing functionalities
And now we can try actually using our retriever!
Wait a second… what else?
Self-query retriever with MyScale can do more! Let’s find out.
Filter k
We can also use the self-query retriever to specify k: the number of documents to fetch. We can do this by passing enable_limit=True to the constructor.
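As a sketch of the step just described, the helper below passes enable_limit=True when constructing the retriever. The function name build_limited_retriever is hypothetical, and the import path is an assumption that depends on the installed LangChain version, so it is imported lazily.

```python
def build_limited_retriever(llm, vectorstore, document_contents, metadata_field_info):
    """Build a self-query retriever that lets the query set the result count."""
    # Assumed import path; it has moved between LangChain releases.
    from langchain.retrievers.self_query.base import SelfQueryRetriever

    return SelfQueryRetriever.from_llm(
        llm,
        vectorstore,
        document_contents,
        metadata_field_info,
        enable_limit=True,  # allow a limit such as "two movies" to be extracted
    )


# Usage (needs a configured LLM and a populated MyScale vector store):
# retriever = build_limited_retriever(llm, vectorstore, "Brief summary of a movie", field_info)
# retriever.invoke("what are two movies about dinosaurs")
```

With the limit enabled, a question that names a count can constrain how many documents come back instead of relying on the store's default.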