Ontotext GraphDB is a graph database and knowledge discovery tool compliant with RDF and SPARQL.
This notebook shows how to use LLMs to provide natural language querying (NLQ to SPARQL, also called `text2sparql`) for Ontotext GraphDB.
GraphDB LLM Functionalities
GraphDB supports the following LLM integration functionalities, as described here:

gpt-queries
- magic predicates to ask an LLM for text, list or table using data from your knowledge graph (KG)
- query explanation
- result explanation, summarization, rephrasing, translation

Indexing of KG entities in a vector database
- supports any text embedding algorithm and vector database
- uses the same powerful connector (indexing) language that GraphDB uses for Elasticsearch, Solr and Lucene
- automatic synchronization of changes in the RDF data to the KG entity index
- supports nested objects (no UI support in GraphDB version 10.5)
- serializes KG entities to text (e.g. for a Wines dataset)

A simple chatbot using a defined KG entity index
This tutorial focuses instead on SPARQL generation from NLQ. We'll use the Star Wars API (SWAPI) ontology and dataset, which you can examine here.
Setting up
You need a running GraphDB instance. This tutorial shows how to run the database locally using the GraphDB Docker image. It provides a Docker Compose set-up, which populates GraphDB with the Star Wars dataset. All necessary files, including this notebook, can be downloaded from the GitHub repository langchain-graphdb-qa-chain-demo.
- Install Docker. This tutorial was created using Docker version 24.0.7, which bundles Docker Compose. For earlier Docker versions you may need to install Docker Compose separately.
- Clone the GitHub repository langchain-graphdb-qa-chain-demo into a local folder on your machine.
- Start GraphDB with the provided script, executed from the same folder, and wait a few seconds for the database to start on http://localhost:7200/. The Star Wars dataset starwars-data.trig is automatically loaded into the `langchain` repository. The local SPARQL endpoint http://localhost:7200/repositories/langchain can be used to run queries against. You can also open the GraphDB Workbench in your favourite web browser at http://localhost:7200/sparql, where you can make queries interactively.
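As a quick smoke test, you can query the local SPARQL endpoint over HTTP with nothing but the Python standard library. This is a sketch that assumes the `langchain` repository from the Docker set-up above is running on port 7200; the helper names are ours, not part of the demo.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:7200/repositories/langchain"  # assumed local set-up


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a POST request for a SPARQL SELECT query (SPARQL 1.1 Protocol)."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )


def run_select(endpoint: str, query: str) -> list:
    """Execute the query and return the JSON result bindings."""
    with urllib.request.urlopen(build_sparql_request(endpoint, query)) as resp:
        return json.load(resp)["results"]["bindings"]


if __name__ == "__main__":
    # Counts the triples in the repository; requires the Docker set-up to be up.
    print(run_select(ENDPOINT, "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"))
```

If the set-up is healthy, the count should be non-zero, since the Star Wars dataset is pre-loaded.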
- Set up the working environment. If you use conda, create and activate a new conda environment.
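For example, a minimal environment set-up might look like this (the environment name, Python version and package list are our assumptions, not prescribed by the tutorial):

```shell
# Create and activate a dedicated environment (name and version are illustrative)
conda create -y -n graphdb-qa python=3.9
conda activate graphdb-qa

# Install the packages used later in the notebook
pip install langchain langchain-community langchain-openai rdflib
```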
Specifying the ontology
In order for the LLM to be able to generate SPARQL, it needs to know the knowledge graph schema (the ontology). It can be provided using one of two parameters on the `OntotextGraphDBGraph` class:
- `query_ontology`: a `CONSTRUCT` query that is executed on the SPARQL endpoint and returns the KG schema statements. We recommend that you store the ontology in its own named graph, which makes it easier to retrieve only the relevant statements (as in the example below). `DESCRIBE` queries are not supported, because `DESCRIBE` returns the Symmetric Concise Bounded Description (SCBD), i.e. also the incoming class links. In the case of large graphs with millions of instances, this is not efficient. Check github.com/eclipse-rdf4j/rdf4j/issues/4857.
- `local_file`: a local RDF ontology file. Supported RDF formats are Turtle, RDF/XML, JSON-LD, N-Triples, Notation-3, Trig, Trix and N-Quads.
In either case, the ontology should:
- include enough information about classes, properties, property attachment to classes (using rdfs:domain, schema:domainIncludes or OWL restrictions) and taxonomies (important individuals);
- not include overly verbose or irrelevant definitions and examples that do not help SPARQL construction.
Consider using Turtle, since Turtle with appropriate prefixes is most compact and easiest for the LLM to remember.
The Star Wars ontology is a bit unusual in that it includes a lot of specific triples about classes, e.g. that the species `:Aleena` live on `<planet/38>`, are a subclass of `:Reptile`, have certain typical characteristics (average height, average lifespan, skinColor), and that specific individuals (characters) are representatives of that class.
To keep this tutorial simple, we use an unsecured GraphDB. If GraphDB is secured, you should set the environment variables `GRAPHDB_USERNAME` and `GRAPHDB_PASSWORD` before initializing `OntotextGraphDBGraph`.
Question Answering against the StarWars dataset
We can now use the `OntotextGraphDBQAChain` to ask some questions.
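A minimal usage sketch, assuming a `graph` object created as in the previous section and an OpenAI chat model; the model name and example question are our assumptions, and the imports are deferred because they require the `langchain` and `langchain-openai` packages.

```python
QUESTION = "What is the climate on Tatooine?"  # illustrative question


def ask(graph, question: str) -> str:
    # Deferred imports: require `langchain`, `langchain-community`, `langchain-openai`
    from langchain.chains import OntotextGraphDBQAChain
    from langchain_openai import ChatOpenAI

    chain = OntotextGraphDBQAChain.from_llm(
        ChatOpenAI(temperature=0, model="gpt-4o"),  # model choice is an assumption
        graph=graph,
        verbose=True,  # print the generated SPARQL for inspection
        # Acknowledge that the chain executes LLM-generated SPARQL against your DB
        # (required by recent langchain versions):
        allow_dangerous_requests=True,
    )
    return chain.invoke({chain.input_key: question})[chain.output_key]
```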
Chain modifiers
The Ontotext GraphDB QA chain allows prompt refinement for further improvement of your QA chain and enhancing the overall user experience of your app.

"SPARQL Generation" prompt
This prompt is used for the SPARQL query generation based on the user question and the KG schema.
- `sparql_generation_prompt`
  Default value:

"SPARQL Fix" prompt
Sometimes, the LLM may generate a SPARQL query with syntactic errors, missing prefixes, etc. The chain will try to amend this by prompting the LLM to correct it a certain number of times.
- `sparql_fix_prompt`
  Default value:
- `max_fix_retries`
  Default value: `5`

"Answering" prompt
This prompt is used for answering the question based on the results returned from the database and the initial user question. By default, the LLM is instructed to only use the information from the returned result(s). If the result set is empty, the LLM should inform the user that it can't answer the question.
- `qa_prompt`
  Default value:
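The modifiers above are passed to `from_llm` when building the chain. The prompt wording below is our own illustration, not the shipped default; the input variable names (`context`, `prompt`) follow the chain's default answering prompt, and the imports are deferred because they require the `langchain` packages.

```python
# Illustrative replacement for the "Answering" prompt (not the shipped default)
GRAPHDB_QA_TEMPLATE = """Answer the question using only the query results below.
If the results are empty, say that you don't know the answer.
Results: {context}
Question: {prompt}
Answer:"""


def build_chain(llm, graph):
    # Deferred imports: require `langchain` and `langchain-core`
    from langchain.chains import OntotextGraphDBQAChain
    from langchain_core.prompts import PromptTemplate

    qa_prompt = PromptTemplate(
        input_variables=["context", "prompt"], template=GRAPHDB_QA_TEMPLATE
    )
    return OntotextGraphDBQAChain.from_llm(
        llm,
        graph=graph,
        qa_prompt=qa_prompt,   # custom "Answering" prompt
        max_fix_retries=3,     # retry budget for the "SPARQL Fix" step
        allow_dangerous_requests=True,
    )
```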
When you're done, you can shut down the Docker environment by running docker compose down -v --remove-orphans from the directory with the Docker Compose file.