Overview
With the Kinetica LLM workflow you create an LLM context in the database that provides information needed for inferencing, including tables, annotations, rules, and samples. Invoking `ChatKinetica.load_messages_from_context()` will retrieve the context information from the database so that it can be used to create a chat prompt.
The chat prompt consists of a `SystemMessage` and pairs of `HumanMessage`/`AIMessage` that contain the samples, which are question/SQL pairs. You can append sample pairs to this list, but it is not intended to facilitate a typical natural language conversation.
When you create a chain from the chat prompt and execute it, the Kinetica LLM will generate SQL from the input. Optionally, you can use `KineticaSqlOutputParser` to execute the SQL and return the result as a dataframe.
Currently, two LLMs are supported for SQL generation:
- Kinetica SQL-GPT: This LLM is based on the OpenAI ChatGPT API.
- Kinetica SqlAssist: This LLM is purpose-built to integrate with the Kinetica database and can run on secure customer premises.
Prerequisites
To get started you will need a Kinetica DB instance. If you don't have one, you can obtain a free development instance. You will also need to install a few packages.
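A typical install might look like the following; the exact package list is an assumption based on what the examples below use (`gpudb` is the Kinetica Python client, `faker` generates the test data, and `python-dotenv` loads the `.env` file):

```bash
pip install --upgrade --quiet langchain-community gpudb typeguard faker python-dotenv
```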
Database Connection

You must set the database connection in the following environment variables. If you are using a virtual environment, you can set them in the `.env` file of the project:

- `KINETICA_URL`: Database connection URL
- `KINETICA_USER`: Database user
- `KINETICA_PASSWD`: Secure password
If you can create an instance of `KineticaChatLLM`, then you are successfully connected.
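As a quick connectivity check, here is a minimal sketch: it loads the environment and instantiates the chat model, which is exported as `ChatKinetica` in `langchain_community`. The table and context names are illustrative values used throughout the rest of this page.

```python
import os

from dotenv import load_dotenv
from langchain_community.chat_models.kinetica import ChatKinetica

# Pick up KINETICA_URL, KINETICA_USER, and KINETICA_PASSWD from the .env file.
load_dotenv()

# Creating the chat model opens the database connection.
kinetica_llm = ChatKinetica()

# Illustrative names (assumptions) used in the examples below.
table_name = "demo.user_profiles"
kinetica_ctx = "demo.test_llm_ctx"
```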
Create test data
Before we can generate SQL, we will need to create a Kinetica table and an LLM context that enables inferencing against the table.

Create some fake user profiles
We will use the `faker` package to create a dataframe with 100 fake profiles.
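A sketch of the profile generation (the seed and locale are arbitrary; `faker.simple_profile()` returns a dict of basic profile fields):

```python
from typing import Generator

import pandas as pd
from faker import Faker

Faker.seed(5467)
faker = Faker(locale="en-US")

def profile_gen(count: int) -> Generator:
    """Yield fake user profile records with an integer id."""
    for uid in range(count):
        rec = dict(id=uid, **faker.simple_profile())
        # Convert the date of birth to a pandas Timestamp for loading.
        rec["birthdate"] = pd.Timestamp(rec["birthdate"])
        yield rec

load_df = pd.DataFrame.from_records(data=profile_gen(100), index="id")
load_df.head()
```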
Create a Kinetica table from the Dataframe
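One way to do this is a sketch using the `gpudb` Python client, which can infer the Kinetica column types from the DataFrame and reuse the connection held by the chat model:

```python
from gpudb import GPUdbTable

# Create (or replace) the table and load the DataFrame into it.
gpudb_table = GPUdbTable.from_df(
    load_df,
    db=kinetica_llm.kdbc,
    table_name=table_name,
    clear_table=True,
    load_data=True,
)

# Inspect the Kinetica column types that were inferred.
print(gpudb_table.type_as_df())
```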
Create the LLM context
You can create an LLM context using the Kinetica Workbench UI, or you can manually create it with the `CREATE OR REPLACE CONTEXT` syntax. Here we create a context from the SQL syntax, referencing the table we created.
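A sketch of the DDL, executed through the same connection; the comment and the sample question/SQL pair are illustrative, and the error handling assumes the response shape returned by the `gpudb` client:

```python
from gpudb import GPUdbException

sql = f"""
CREATE OR REPLACE CONTEXT {kinetica_ctx}
(
    TABLE = {table_name},
    COMMENT = 'Contains user profiles.'
),
(
    SAMPLES = (
        'How many male users are there?' =
        'select count(1) as num_users
         from {table_name}
         where sex = ''M'';' )
)
"""

# execute_sql returns a response dict; raise if the status is not OK.
response = kinetica_llm.kdbc.execute_sql(sql)
status = response["status_info"]["status"]
if status != "OK":
    raise GPUdbException(response["status_info"]["message"])
```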
Use LangChain for inferencing
In the example below we will create a chain from the previously created table and LLM context. This chain will generate SQL and return the resulting data as a dataframe.

Load the chat prompt from the Kinetica DB
The `load_messages_from_context()` function will retrieve a context from the DB and convert it into a list of chat messages that we use to create a `ChatPromptTemplate`.
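For example (a sketch; a `{input}` placeholder is appended so the question can be substituted at invocation time):

```python
from langchain_core.prompts import ChatPromptTemplate

ctx_messages = kinetica_llm.load_messages_from_context(kinetica_ctx)

# Append the input prompt where the question will be substituted.
ctx_messages.append(("human", "{input}"))

# Build the prompt template from the context messages.
prompt_template = ChatPromptTemplate.from_messages(ctx_messages)
prompt_template.pretty_print()
```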
Create the chain
The last element of this chain is `KineticaSqlOutputParser`, which will execute the SQL and return a dataframe. This is optional; if we left it out, only the SQL would be returned.
Generate the SQL
The chain we created will take a question as input and return a `KineticaSqlResponse` containing the generated SQL and data. The question must be relevant to the LLM context we used to create the prompt.
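For example (the question is illustrative and must match the tables and samples in the context; the `sql` and `dataframe` attribute names are assumptions about the response object):

```python
# Ask a question that is relevant to the LLM context in the prompt.
response: KineticaSqlResponse = chain.invoke(
    {"input": "What are the female users ordered by username?"}
)

print(f"SQL: {response.sql}")
response.dataframe.head()
```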