Kùzu is an embeddable, scalable, extremely fast graph database. It is permissively licensed under the MIT license, and its source code is publicly available on GitHub.
Key characteristics of Kùzu:
- Performance and scalability: Implements modern, state-of-the-art join algorithms for graphs.
- Usability: Very easy to set up and get started with, as there are no servers (embedded architecture).
- Interoperability: Can conveniently scan and copy data from external columnar formats, CSV, JSON and relational databases.
- Structured property graph model: Implements the property graph model, with added structure.
- Cypher support: Allows convenient querying of the graph in Cypher, a declarative query language.
Get started with Kùzu by visiting its documentation.
Setting up
Kùzu is an embedded database (it runs in-process), so there are no servers to manage. Install the required dependencies to get started.
Create KuzuGraph
Kùzu’s integration with LangChain makes it convenient to create and update graphs from unstructured text, and also to query graphs via a Text2Cypher pipeline built on LangChain’s LLM chains. To begin, we create a KuzuGraph object by passing a Kùzu database object to the KuzuGraph constructor.
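A minimal sketch of this step, assuming the kuzu and langchain-kuzu packages are installed. The helper name open_graph is hypothetical, the import path langchain_kuzu.graphs.kuzu_graph is an assumption that may differ between package versions, and the allow_dangerous_requests opt-in is assumed to be required by the wrapper.

```python
def open_graph(db_path: str = "test_db"):
    """Open (or create) an on-disk Kùzu database and wrap it in a KuzuGraph."""
    # Imports live inside the function so this module still loads
    # where the optional packages are not installed.
    import kuzu
    from langchain_kuzu.graphs.kuzu_graph import KuzuGraph  # assumed import path

    # Kùzu runs in-process: opening the database needs no server.
    db = kuzu.Database(db_path)

    # Assumption: the wrapper asks for an explicit opt-in because it can
    # execute LLM-generated Cypher against the database.
    return KuzuGraph(db, allow_dangerous_requests=True)
```

Calling open_graph() would create or reuse a test_db directory in the working directory and return the graph wrapper used in the following steps.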
We use LLMGraphTransformer, which relies on an LLM to extract nodes and relationships from the text.
To make the graph more useful, we will define the following schema, such that the LLM will only
extract nodes and relationships that match the schema.
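As an illustration, a schema of this kind can be written as plain Python lists. The labels and relationship types below are illustrative assumptions, not taken from the original page, and undeclared_labels is a hypothetical helper that checks the schema for consistency before it is handed to the LLM.

```python
# Node labels the LLM is allowed to extract (illustrative values).
allowed_nodes = ["Person", "Company", "Location"]

# Each relationship is a (source label, relationship type, target label) triple.
allowed_relationships = [
    ("Person", "IS_CEO_OF", "Company"),
    ("Company", "HAS_HEADQUARTERS_IN", "Location"),
]


def undeclared_labels(nodes, relationships):
    """Return endpoint labels used in relationships but missing from nodes."""
    known = set(nodes)
    used = {label for src, _, dst in relationships for label in (src, dst)}
    return sorted(used - known)


# An empty list means every relationship endpoint is a declared node label.
print(undeclared_labels(allowed_nodes, allowed_relationships))
```

Checking the schema up front is a small safeguard: a relationship that points at an undeclared label would otherwise only surface as missing or rejected extractions later.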
The LLMGraphTransformer class provides a convenient way to convert the text into a list of graph documents.
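The conversion might look like the sketch below, assuming the langchain-core and langchain-experimental packages are installed. The function name text_to_graph_documents is hypothetical, and llm stands for any LangChain chat model the caller has already configured.

```python
def text_to_graph_documents(text, llm, allowed_nodes, allowed_relationships):
    """Extract schema-constrained graph documents from raw text with an LLM."""
    # Imported lazily so the module loads without the optional packages.
    from langchain_core.documents import Document
    from langchain_experimental.graph_transformers import LLMGraphTransformer

    transformer = LLMGraphTransformer(
        llm=llm,
        allowed_nodes=allowed_nodes,                  # permitted node labels
        allowed_relationships=allowed_relationships,  # permitted relationships
    )
    # The transformer works on Document objects, so wrap the raw text first.
    return transformer.convert_to_graph_documents([Document(page_content=text)])
```

The result is a list with one graph document per input document, each holding the nodes and relationships the LLM extracted.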
We can then call the KuzuGraph object’s add_graph_documents method to ingest the graph documents into the Kùzu database.
The include_source argument is set to True so that we also create relationships between each entity node and the source document that it came from.
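The ingestion call can be sketched as follows. The wrapper function ingest is hypothetical; graph is assumed to be a KuzuGraph and graph_documents the list produced by the transformer.

```python
def ingest(graph, graph_documents):
    """Write extracted graph documents into the graph store."""
    # include_source=True also links each entity node to the source
    # document it was extracted from.
    graph.add_graph_documents(graph_documents, include_source=True)
```

Keeping the source links makes it possible to trace an entity back to the text it was extracted from.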
Creating KuzuQAChain
To query the graph via a Text2Cypher pipeline, we can define a KuzuQAChain object. The chain connects to the existing database stored in the test_db directory, and we can then invoke it with a query.
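A minimal sketch of building the chain, assuming the langchain-kuzu package is installed. The helper name build_qa_chain is hypothetical, and the import path and the from_llm keyword arguments are assumptions that may vary between versions.

```python
def build_qa_chain(graph, llm):
    """Build a Text2Cypher question-answering chain over a KuzuGraph."""
    # Imported lazily so the module loads without the optional package.
    from langchain_kuzu.chains.graph_qa.kuzu import KuzuQAChain  # assumed import path

    return KuzuQAChain.from_llm(
        llm=llm,      # used for both Cypher generation and answering
        graph=graph,
        verbose=True,  # print the generated Cypher and intermediate steps
        # Assumption: explicit opt-in, since generated Cypher runs on the database.
        allow_dangerous_requests=True,
    )
```

The returned chain would then be invoked with a natural-language question, for example chain.invoke("Where is Apple headquartered?"), where the question itself is only an illustration.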
Refresh graph schema
If you mutate or update the graph, you can inspect the refreshed schema information that the Text2Cypher chain uses to generate Cypher statements. You don’t need to call refresh_schema() manually each time, as it is called automatically when you invoke the chain.
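Inspecting the schema by hand could look like this. The helper current_schema is hypothetical, and it assumes the graph object exposes get_schema as a property, as LangChain graph stores commonly do.

```python
def current_schema(graph):
    """Refresh the cached schema and return its text representation."""
    graph.refresh_schema()   # re-read node and relationship tables
    return graph.get_schema  # assumed to be a property, not a method
```

Printing the returned value after a mutation is a quick way to see what the chain will be given when it generates Cypher.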
Use separate LLMs for Cypher and answer generation
You can specify cypher_llm and qa_llm separately to use different LLMs for Cypher generation and answer generation.
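A sketch of the two-model setup, under the same assumptions about the langchain-kuzu import path and from_llm keywords. The helper name build_split_qa_chain is hypothetical, and both model arguments are LangChain chat models supplied by the caller.

```python
def build_split_qa_chain(graph, cypher_llm, qa_llm):
    """Build a QA chain that uses separate models for the two stages."""
    # Imported lazily so the module loads without the optional package.
    from langchain_kuzu.chains.graph_qa.kuzu import KuzuQAChain  # assumed import path

    return KuzuQAChain.from_llm(
        cypher_llm=cypher_llm,  # translates the question into Cypher
        qa_llm=qa_llm,          # phrases the final answer from query results
        graph=graph,
        verbose=True,
        allow_dangerous_requests=True,  # assumed explicit opt-in
    )
```

One plausible use of this split is to pair a low-temperature model for Cypher generation with a different model for wording the final answer.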