Activeloop Deep Lake is a multi-modal vector store that stores embeddings and their metadata, including text, JSON, images, audio, video, and more. It saves the data locally, in your cloud, or on Activeloop storage. It performs hybrid search over both embeddings and their attributes.
This notebook showcases basic functionality related to Activeloop Deep Lake. While Deep Lake can store embeddings, it is capable of storing any type of data. It is a serverless data lake with version control, a query engine, and streaming dataloaders for deep learning frameworks.
For more information, please see the Deep Lake documentation.
Setting up
Example provided by Activeloop
Integration with LangChain.
Deep Lake locally
Create a local dataset
Create a dataset locally at ./my_deeplake/, then run similarity search. The Deep Lake + LangChain integration uses Deep Lake datasets under the hood, so "dataset" and "vector store" are used interchangeably. To create a dataset in your own cloud, or in the Deep Lake storage, adjust the path accordingly.
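As a minimal sketch of the steps above, the helpers below create a local dataset and run a similarity search. They assume the `langchain-community` and `deeplake` packages are installed and that `embedding` is any LangChain-compatible embedding model; the helper names `build_local_store` and `top_matches` are hypothetical, chosen for illustration only.

```python
def build_local_store(texts, embedding, dataset_path="./my_deeplake/"):
    """Create (or overwrite) a local Deep Lake dataset and index `texts`."""
    # Imported lazily so this module still loads without the optional packages.
    from langchain_community.vectorstores import DeepLake

    # overwrite=True replaces any dataset that already exists at the path.
    db = DeepLake(dataset_path=dataset_path, embedding=embedding, overwrite=True)
    db.add_texts(texts)
    return db


def top_matches(db, query, k=4):
    """Return the text of the k documents most similar to `query`."""
    return [doc.page_content for doc in db.similarity_search(query, k=k)]


# Example use (my_embeddings is a placeholder for your embedding model):
#   db = build_local_store(["first text", "second text"], my_embeddings)
#   print(top_matches(db, "first", k=1))
```

Pointing `dataset_path` at a cloud location instead of a local directory is the only change needed to move the same dataset elsewhere, subject to the credentials that location requires.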
Query dataset
read_only=True prevents accidental modifications to the vector store when updates are not needed, so the data stays unchanged unless a change is explicitly intended. It is generally good practice to specify this argument whenever you only need to query.
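A short sketch of reopening an existing dataset for querying only is shown below. It assumes the dataset at `dataset_path` was created earlier and that `embedding` matches the model used to build it; `open_read_only` is a hypothetical helper name.

```python
def open_read_only(embedding, dataset_path="./my_deeplake/"):
    """Reopen an existing Deep Lake dataset without allowing writes."""
    # Imported lazily so this module still loads without the optional packages.
    from langchain_community.vectorstores import DeepLake

    # read_only=True guards against accidental updates to the stored data.
    return DeepLake(dataset_path=dataset_path, embedding=embedding, read_only=True)


# Example use (my_embeddings is a placeholder for your embedding model):
#   db = open_read_only(my_embeddings)
#   docs = db.similarity_search("your question here")
```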
Retrieval Question/Answering
Attribute based filtering in metadata
Let’s create another vector store containing metadata with the year the documents were created.
Choosing distance function
Distance function: L2 for Euclidean, cos for cosine similarity.
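To illustrate how the two options differ, here is a plain-Python sketch of both measures. It does not use Deep Lake itself, and the vectors are made-up toy values: Euclidean distance is sensitive to vector length, while cosine similarity compares direction only, so the two can rank the same candidates differently.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine similarity: larger means more similar; ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = [1.0, 0.0]
same_direction = [3.0, 0.0]  # points the same way as the query, but longer
nearby = [1.0, 1.0]          # close in space, but rotated by 45 degrees

# L2 prefers `nearby` (distance 1.0 versus 2.0).
print(l2_distance(query, same_direction), l2_distance(query, nearby))
# Cosine prefers `same_direction` (similarity 1.0 versus about 0.707).
print(cosine_similarity(query, same_direction), cosine_similarity(query, nearby))
```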
Maximal Marginal relevance
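Maximal marginal relevance (MMR) picks results that are relevant to the query while penalizing near-duplicates of results already chosen. The sketch below is a generic greedy MMR in plain Python over toy vectors; it is not Deep Lake's internal implementation, and the name `mmr` and the parameter `lambda_mult` are chosen here for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedy MMR: return indices of k documents balancing relevance and diversity.

    lambda_mult=1.0 ranks purely by relevance; lower values favor diversity.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet).
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.6, -0.8]]  # docs 0 and 1 are near-duplicates

print(mmr(query, docs, k=2, lambda_mult=1.0))  # relevance only: [0, 1]
print(mmr(query, docs, k=2, lambda_mult=0.5))  # with diversity: [0, 2]
```

With `lambda_mult=0.5`, the second pick skips the near-duplicate document in favor of a less similar but more distinct one.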
Using maximal marginal relevance
Delete dataset
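Deleting a dataset can be sketched in two ways, assuming the LangChain `DeepLake` wrapper from `langchain-community`: through an open vector store object, or by path when no store object is available. The helper names `delete_store` and `force_delete` are hypothetical.

```python
def delete_store(db):
    """Delete the dataset behind an already opened DeepLake vector store."""
    db.delete_dataset()


def force_delete(dataset_path):
    """Delete a dataset by path, without opening it as a vector store first."""
    # Imported lazily so this module still loads without the optional packages.
    from langchain_community.vectorstores import DeepLake

    DeepLake.force_delete_by_path(dataset_path)


# Example use:
#   delete_store(db)
#   force_delete("./my_deeplake/")
```

Both calls are destructive and cannot be undone, so it is worth double-checking the path before running them.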
Deep Lake datasets on cloud (Activeloop, AWS, GCS, etc.) or in memory
By default, Deep Lake datasets are stored locally. To store them in memory, in the Deep Lake Managed DB, or in any object storage, you can provide the corresponding path and credentials when creating the vector store. Some paths require registration with Activeloop and creation of an API token that can be retrieved here.
TQL Search
Queries can also be executed within the similarity_search method by specifying them in Deep Lake’s Tensor Query Language (TQL).
Creating vector stores on AWS S3
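The following is a hedged sketch of creating a vector store on AWS S3. It assumes AWS credentials are available as the standard environment variables and that `dataset_path` is an `s3://` URL for a bucket you control; the helper names `aws_creds_from_env` and `build_s3_store` are hypothetical, and the exact credential keys your setup needs may differ.

```python
import os


def aws_creds_from_env(env=os.environ):
    """Collect AWS credentials from environment variables into a creds dict."""
    creds = {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }
    # The session token is only present for temporary credentials.
    token = env.get("AWS_SESSION_TOKEN")
    if token:
        creds["aws_session_token"] = token
    return creds


def build_s3_store(texts, embedding, dataset_path, creds):
    """Create (or overwrite) a Deep Lake dataset at an s3:// path and index `texts`."""
    # Imported lazily so this module still loads without the optional packages.
    from langchain_community.vectorstores import DeepLake

    db = DeepLake(
        dataset_path=dataset_path, embedding=embedding, overwrite=True, creds=creds
    )
    db.add_texts(texts)
    return db


# Example use (bucket name and my_embeddings are placeholders):
#   db = build_s3_store(["some text"], my_embeddings,
#                       "s3://YOUR_BUCKET/langchain_test", aws_creds_from_env())
```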
Deep Lake API
You can access the Deep Lake dataset at db.vectorstore.