SAP HANA Cloud Vector Engine is a vector store fully integrated into the SAP HANA Cloud
database.
Setup
Install the langchain-hana external integration package, as well as the other packages used throughout this notebook.
Credentials
Ensure your SAP HANA instance is running. Load your credentials from environment variables and create a connection.
Initialization
To initialize a HanaDB vector store, you need a database connection and an embedding instance. SAP HANA Cloud Vector Engine supports both external and internal embeddings.
- Using External Embeddings
- Using Internal Embeddings
  Embeddings can also be computed directly in SAP HANA with its native VECTOR_EMBEDDING() function. To enable this, create an instance of HanaInternalEmbeddings with your internal model ID and pass it to HanaDB. Note that the HanaInternalEmbeddings instance is specifically designed for use with HanaDB and is not intended for use with other vector store implementations. For more information about internal embeddings, see the SAP HANA VECTOR_EMBEDDING Function.
  Caution: Ensure NLP is enabled in your SAP HANA Cloud instance.
Pass the connection and the embedding instance to HanaDB along with a table name for storing vectors.
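The connection and initialization steps could look roughly like the sketch below. The environment variable names (HANA_DB_ADDRESS, HANA_DB_PORT, HANA_DB_USER, HANA_DB_PASSWORD), the helper names, the fallback port, and the table name are assumptions made for illustration. hdbcli and langchain-hana are imported lazily so the settings helper can be checked without a database.

```python
import os


def hana_connection_kwargs(env=os.environ):
    """Collect connection settings from environment variables.

    The variable names are assumptions; use whatever your deployment defines.
    """
    return {
        "address": env["HANA_DB_ADDRESS"],
        "port": int(env.get("HANA_DB_PORT", "443")),  # assumed fallback port
        "user": env["HANA_DB_USER"],
        "password": env["HANA_DB_PASSWORD"],
    }


def open_vector_store(embeddings, table_name="STATE_OF_THE_UNION", env=os.environ):
    """Connect to SAP HANA Cloud and wrap a table in a HanaDB vector store."""
    # Imported here so this module loads even when the packages are missing.
    from hdbcli import dbapi
    from langchain_hana import HanaDB

    connection = dbapi.connect(autocommit=True, **hana_connection_kwargs(env))
    return HanaDB(embedding=embeddings, connection=connection, table_name=table_name)
```

The embeddings argument can be any external embedding instance or, for internal embeddings, a HanaInternalEmbeddings instance created with your internal model ID.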
Example
Load the sample document “state_of_the_union.txt” and create chunks from it.
Maximal Marginal Relevance Search (MMR)
Maximal marginal relevance optimizes for similarity to the query and diversity among the selected documents. The first 20 (fetch_k) items are retrieved from the DB. The MMR algorithm then selects the best 2 (k) matches from them.
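To make the selection step concrete, here is a pure-Python sketch of greedy MMR over a handful of candidate vectors. It illustrates the idea only and is not the library's implementation; the function names, the lambda_mult weighting parameter, and the toy vectors are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, candidates, k=2, lambda_mult=0.5):
    """Return the indices of k candidates chosen greedily by MMR.

    lambda_mult=1.0 ranks purely by relevance; lower values reward diversity.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i):
            relevance = cosine(query_vec, candidates[i])
            # Similarity to the closest already-selected candidate.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Candidates 0 and 1 are near-duplicates; candidate 2 points elsewhere.
fetched = [[0.9, 0.1], [0.9, 0.12], [0.7, -0.7]]
picked = mmr_select([1.0, 0.0], fetched, k=2)
```

Against the vector store itself, the equivalent call would be along the lines of db.max_marginal_relevance_search(query, k=2, fetch_k=20), where db is the HanaDB instance.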
Creating an HNSW Vector Index
A vector index can significantly speed up top-k nearest neighbor queries for vectors. Users can create a Hierarchical Navigable Small World (HNSW) vector index using the create_hnsw_index function.
For more information about creating an index at the database level, please refer to the official documentation.
- Similarity Function: The similarity function for the index is cosine similarity by default. To use a different similarity function (e.g., L2 distance), specify it when initializing the HanaDB instance.
- Default Parameters: If you do not pass custom values for parameters such as m, ef_construction, or ef_search to create_hnsw_index, the default values (e.g., m=64, ef_construction=128, ef_search=200) are used automatically. These defaults let the index be created with reasonable performance without manual tuning.
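The two points above might translate into code as follows. This is a sketch: the helper names, the index name, and the tuned values are illustrative assumptions, and the db argument stands for an already initialized HanaDB instance. The distance strategy value is left to the caller because its import path depends on the installed package version.

```python
def create_default_index(db):
    """Create an HNSW index and let the library apply its default parameters."""
    db.create_hnsw_index()


def create_tuned_index(db, index_name, m=100, ef_construction=200, ef_search=500):
    """Create an HNSW index with explicit values (the numbers are illustrative)."""
    db.create_hnsw_index(
        index_name=index_name,
        m=m,
        ef_construction=ef_construction,
        ef_search=ef_search,
    )


def open_store_with_distance(connection, embeddings, table_name, distance_strategy):
    """Open a HanaDB store whose index will use the given similarity function."""
    from langchain_hana import HanaDB  # lazy import; requires langchain-hana

    return HanaDB(
        embedding=embeddings,
        connection=connection,
        table_name=table_name,
        distance_strategy=distance_strategy,  # e.g. the Euclidean (L2) strategy
    )
```

An index created on a store opened with a non-default distance strategy uses that strategy; an index on a default store uses cosine similarity.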
Basic Vectorstore Operations
Advanced filtering
In addition to the basic value-based filtering capabilities, more advanced filtering is possible. The table below shows the available filter operators.
Operator | Semantic |
---|---|
$eq | Equality (==) |
$ne | Inequality (!=) |
$lt | Less than (<) |
$lte | Less than or equal (<=) |
$gt | Greater than (>) |
$gte | Greater than or equal (>=) |
$in | Contained in a set of given values (in) |
$nin | Not contained in a set of given values (not in) |
$between | Between the range of two boundary values |
$like | Text equality based on the “LIKE” semantics in SQL (using “%” as wildcard) |
$contains | Filters documents containing a specific keyword |
$and | Logical “and”, supporting 2 or more operands |
$or | Logical “or”, supporting 2 or more operands |
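The operators can be grouped as follows:
- Comparison: $ne, $gt, $gte, $lt, $lte
- Range and set membership: $between, $in, $nin
- Text matching: $like
- Keyword search: $contains
- Logical combination: $and, $or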
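The semantics of these operators can be illustrated with a small in-memory evaluator. It only mimics how a filter dictionary is interpreted and is not how the library evaluates filters (those are translated to SQL). The helper names and sample metadata are assumptions, and $contains is left out because its keyword-search behavior depends on the database.

```python
import re


def _like(value, pattern):
    """SQL LIKE match: % matches any run of characters, _ a single one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, str(value), flags=re.DOTALL) is not None


_OPS = {
    "$eq": lambda v, arg: v == arg,
    "$ne": lambda v, arg: v != arg,
    "$lt": lambda v, arg: v < arg,
    "$lte": lambda v, arg: v <= arg,
    "$gt": lambda v, arg: v > arg,
    "$gte": lambda v, arg: v >= arg,
    "$in": lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
    "$between": lambda v, arg: arg[0] <= v <= arg[1],  # inclusive bounds
    "$like": _like,
}


def matches(metadata, flt):
    """Return True if a metadata dict satisfies the filter dict."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in cond)
        elif isinstance(cond, dict):
            value = metadata.get(key)
            # A missing key never matches, similar to NULL in SQL.
            ok = value is not None and all(
                _OPS[op](value, arg) for op, arg in cond.items()
            )
        else:
            ok = metadata.get(key) == cond  # a plain value means equality
        if not ok:
            return False
    return True


docs = [{"name": "adam", "num": 1}, {"name": "bob", "num": 5}, {"name": "jane", "num": 9}]


def names(flt):
    return [d["name"] for d in docs if matches(d, flt)]
```

With a real store, the same dictionaries are passed through the filter argument, for example db.similarity_search(query, k=5, filter={"num": {"$gte": 5}}).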
Using a VectorStore as a retriever in chains for retrieval augmented generation (RAG)
Standard tables vs. “custom” tables with vector data
By default, the table for the embeddings is created with 3 columns:
- A column VEC_TEXT, which contains the text of the Document
- A column VEC_META, which contains the metadata of the Document
- A column VEC_VECTOR, which contains the embedding vector of the Document’s text
A custom table needs at least the following three columns:
- A column with type NCLOB or NVARCHAR for the text/context of the embeddings
- A column with type NCLOB or NVARCHAR for the metadata
- A column with type REAL_VECTOR for the embedding vector
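A custom table that meets these requirements could be set up along the following lines. The table name, the column names, the NVARCHAR lengths, and the helper names are assumptions for illustration; the column-name arguments passed to HanaDB tell the store which columns to use instead of the defaults.

```python
def custom_table_ddl(table, text_col="MY_TEXT", meta_col="MY_META", vector_col="MY_VECTOR"):
    """Build a CREATE TABLE statement with the three required column types."""
    return (
        f'CREATE TABLE "{table}" ('
        f'"{text_col}" NVARCHAR(2048), '
        f'"{meta_col}" NVARCHAR(1024), '
        f'"{vector_col}" REAL_VECTOR)'
    )


def open_custom_store(connection, embeddings, table,
                      text_col="MY_TEXT", meta_col="MY_META", vector_col="MY_VECTOR"):
    """Create the custom table and point a HanaDB store at its columns."""
    from langchain_hana import HanaDB  # lazy import; requires langchain-hana

    cursor = connection.cursor()
    try:
        cursor.execute(custom_table_ddl(table, text_col, meta_col, vector_col))
    finally:
        cursor.close()

    return HanaDB(
        connection=connection,
        embedding=embeddings,
        table_name=table,
        content_column=text_col,
        metadata_column=meta_col,
        vector_column=vector_col,
    )
```

A custom table may contain further columns of its own; the store only reads and writes the three it has been told about.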