Bodo DataFrames is a high-performance DataFrame library for large-scale Python data processing and a drop-in replacement for Pandas. Simply replace:
import pandas as pd
with:
import bodo.pandas as pd
to automatically scale and accelerate Pandas workloads. Because Bodo DataFrames is compatible with Pandas, it is an ideal target for LLM code generation: the generated code is easy to verify, efficient, and scales beyond the typical limitations of Pandas. Our integration package provides a toolkit for asking agents questions about large datasets, using Bodo DataFrames for efficiency and scalability.

Under the hood, Bodo DataFrames uses lazy evaluation to optimize sequences of Pandas operations, streams data through operators to process larger-than-memory datasets, and uses MPI-based high-performance computing technology for parallel execution that scales from a laptop to a large cluster.

Installation and setup

pip install -U langchain_bodo

Toolkit

The langchain-bodo package provides functionality for creating agents that can answer questions about large datasets using Bodo DataFrames. See the Bodo DataFrames tools page for more detailed usage examples.
NOTE: This feature uses the Python agent under the hood, which executes LLM-generated Python code. This is risky if the generated code is harmful, so use it with caution.
from langchain_bodo import create_bodo_dataframes_agent
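The NOTE above is worth taking literally. The sketch below is a hypothetical, simplified stand-in (run_generated_snippet is not part of langchain-bodo or LangChain) for roughly what a python_repl_ast-style step does with a generated string such as len(df): it parses the code, executes every statement, and returns the value of a trailing expression, with df supplied in the namespace. Nothing restricts what that code can do, which is why the usage example opts in with allow_dangerous_code=True.

```python
import ast


def run_generated_snippet(code: str, namespace: dict):
    """Hypothetical, simplified stand-in for a python_repl_ast-style tool.

    Executes all statements in `code` inside `namespace` and returns the
    value of the final expression, or None if the code does not end with one.
    WARNING: this runs arbitrary code. It is NOT a sandbox.
    """
    tree = ast.parse(code, mode="exec")
    last_expr = None
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Split off the trailing expression so its value can be returned.
        last_expr = ast.Expression(body=tree.body.pop().value)
    # Run the leading statements (assignments, imports, ...) for side effects.
    exec(compile(tree, "<generated>", "exec"), namespace)
    if last_expr is None:
        return None
    return eval(compile(last_expr, "<generated>", "eval"), namespace)


# A plain list stands in for the DataFrame here; the agent would pass df.
print(run_generated_snippet("len(df)", {"df": [10, 20, 30]}))
```

Because the generated code runs with the same privileges as your Python process, anything it can express (file access, network calls, deleting data) can happen.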

Usage Example

Before running the code below, download the Titanic dataset and save it locally as titanic.csv.
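import bodo.pandas as pd
from langchain_bodo import create_bodo_dataframes_agent
from langchain_openai import OpenAI  # requires the OPENAI_API_KEY environment variable

# Load the dataset with Bodo DataFrames.
df = pd.read_csv("titanic.csv")

# allow_dangerous_code=True opts in to executing LLM-generated Python code.
agent = create_bodo_dataframes_agent(
    OpenAI(temperature=0), df, verbose=True, allow_dangerous_code=True
)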
agent.invoke("how many rows are there?")
> Entering new AgentExecutor chain...
Thought: I can use the len() function to get the number of rows in the dataframe.
Action: python_repl_ast
Action Input: len(df)
891
891 is the number of rows in the dataframe.
Final Answer: 891

> Finished chain.
{'input': 'how many rows are there?', 'output': '891'}
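Because the agent's answer comes from generated code, it is worth cross-checking against an independent computation. A minimal sketch using only the standard library (count_csv_rows is a hypothetical helper, not part of langchain-bodo) counts the data rows in a CSV file while streaming it one record at a time:

```python
import csv


def count_csv_rows(path: str) -> int:
    """Count data rows (excluding the header) in a CSV file.

    Reads one record at a time, so memory use stays small even for large
    files. The csv module handles quoted fields that contain commas or
    newlines; completely blank lines are skipped.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if present
        return sum(1 for row in reader if row)
```

Calling count_csv_rows("titanic.csv") on the same file should agree with the agent's answer of 891. Parsing with the csv module rather than counting raw lines avoids miscounting when a quoted field spans multiple lines.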