Apache Doris is a modern data warehouse for real-time analytics. It delivers lightning-fast analytics on real-time data at scale.
Apache Doris is usually categorized as an OLAP system, and it has shown excellent performance in ClickBench, a benchmark for analytical DBMSs. Because it has a very fast vectorized execution engine, it can also be used as a fast vectordb.
You’ll need to install langchain-community with pip install -qU langchain-community to use this integration.
Here we’ll show how to use the Apache Doris Vector Store.
Setup
Set update_vectordb = False at the beginning. If no docs have been updated, there is no need to rebuild their embeddings.
Load docs and split them into tokens
Load all markdown files under the docs directory. For the Apache Doris documentation, you can clone the repo from github.com/apache/doris; it contains a docs directory.
Then set update_vectordb = True, because there are new docs/tokens to embed.
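The splitting step and the flag update might look roughly like the sketch below. The chunk size and overlap values are illustrative assumptions, not values given in this text, and `needs_rebuild` is a hypothetical helper that simply treats "there are chunks" as "the embeddings must be rebuilt". `TokenTextSplitter` is assumed to require `tiktoken`.

```python
def split_into_chunks(documents, chunk_size=400, chunk_overlap=50):
    """Split LangChain documents into token-based chunks."""
    # Lazy import: needs langchain-text-splitters (and tiktoken) installed.
    from langchain_text_splitters import TokenTextSplitter

    splitter = TokenTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    return splitter.split_documents(documents)


def needs_rebuild(split_docs) -> bool:
    """Hypothetical helper: rebuild embeddings only if there are chunks to embed."""
    return len(split_docs) > 0


# Example usage, given documents loaded in the previous step:
# split_docs = split_into_chunks(documents)
# update_vectordb = needs_rebuild(split_docs)
```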
Create vectordb instance
Use Apache Doris as vectordb
Convert tokens into embeddings and put them into vectordb
Here we use Apache Doris as the vectordb. You can configure the Apache Doris instance via ApacheDorisSettings.
Configuring an Apache Doris instance is much like configuring a MySQL instance. You need to specify:
- host/port
- username (default: 'root')
- password (default: '')
- database (default: 'default')
- table (default: 'langchain')
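The settings listed above can be gathered and handed to the vector store roughly as follows. This is a sketch under stated assumptions: `make_doris_config` and `build_vector_store` are hypothetical helpers, the host and port defaults (`localhost`, `9030`) are assumptions not given in this text, and `embeddings` stands for whatever LangChain embeddings object you use. Actually building the store requires a reachable Doris instance and, as far as we know, the `pymysql` driver.

```python
def make_doris_config(
    host="localhost",   # assumption: adjust to your Doris frontend host
    port=9030,          # assumption: the usual MySQL-protocol query port
    username="root",
    password="",
    database="default",
    table="langchain",
) -> dict:
    """Collect the connection settings in one place."""
    return {
        "host": host,
        "port": port,
        "username": username,
        "password": password,
        "database": database,
        "table": table,
    }


def build_vector_store(split_docs, embeddings, update_vectordb, config):
    """Create the store, embedding split_docs only when update_vectordb is True."""
    # Lazy import so make_doris_config can be used without LangChain installed.
    from langchain_community.vectorstores.apache_doris import (
        ApacheDoris,
        ApacheDorisSettings,
    )

    settings = ApacheDorisSettings(**config)
    if update_vectordb:
        # New docs/tokens: compute embeddings and write them to Doris.
        return ApacheDoris.from_documents(split_docs, embeddings, config=settings)
    # Nothing changed: attach to the existing table without re-embedding.
    return ApacheDoris(embeddings, config=settings)


# Example usage (the database name is illustrative):
# config = make_doris_config(database="langchain")
# docsearch = build_vector_store(split_docs, embeddings, update_vectordb, config)
```

Keeping the settings in a plain dictionary makes it easy to check them before any connection is attempted.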