Intro
This notebook (also on GitHub) demonstrates how to use the Function Calling functionality of the OpenAI API.
In this demo, we’ll use the Northwind database to convert natural language queries into SQL:
"What is the total revenue for each product in the database?" ->
-> "SELECT ... FROM ..." -> DataFrame
There will be two function calling examples:
- A simple one-step function call: we put the database schema into the system prompt and then use function calling to convert a natural language query into SQL.
- A two-step function call: the model first calls a function to get the database schema and then converts the natural language query into SQL.
At the end, we’ll compare the two approaches and do a quick-and-dirty evaluation of the results using a hand-curated list of questions and their expected SQL queries in eval_questions.csv.
Define functions to call
import json
import sqlite3
import openai
import pandas as pd
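Note: the snippets in this notebook call the module-level client of openai-python v1.x, which reads the API key from the OPENAI_API_KEY environment variable. If you prefer to point the client at the key explicitly, something along these lines should work:

import os
# optional: the client also picks up OPENAI_API_KEY from the environment by default
openai.api_key = os.getenv("OPENAI_API_KEY")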
# get database schema
def get_schema(db_path="northwind.db"):
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
    tables = cursor.fetchall()
    schema = {}
    for table in tables:
        table_name = table[0]
        cursor.execute(f"PRAGMA table_info('{table_name}')")
        columns = cursor.fetchall()
        schema[table_name] = [column[1] for column in columns]
    # close the connection once the schema has been collected
    conn.close()
    return schema
get_schema()
> {'Categories': ['CategoryID', 'CategoryName', 'Description', 'Picture'],
'sqlite_sequence': ['name', 'seq'],
'CustomerCustomerDemo': ['CustomerID', 'CustomerTypeID'],
'CustomerDemographics': ['CustomerTypeID', 'CustomerDesc'],
'Customers': ['CustomerID',
'CompanyName',
'ContactName',
...
'Suppliers': ['SupplierID',
'CompanyName',
'ContactName',
'ContactTitle',
'Address',
'City',
'Region',
'PostalCode',
'Country',
'Phone',
'Fax',
'HomePage'],
'Territories': ['TerritoryID', 'TerritoryDescription', 'RegionID']}
# read data from SQLite database given a query
def read_data(query, db_path="northwind.db"):
    conn = sqlite3.connect(db_path)
    df = pd.read_sql_query(query, conn)
    conn.close()
    return df
Get ground truth result
We’ll compare the LLM-generated results with this ground truth result df_true later.
# Example SQL query that answers the question: "What is the total revenue for each product in the database?"
sample_query = """
SELECT
p.ProductName,
SUM(od.Quantity * od.UnitPrice) AS Revenue
FROM
Products p
JOIN
"Order Details" od
ON
p.ProductID = od.ProductID
GROUP BY
p.ProductName
ORDER BY
Revenue DESC;
"""
df_true = read_data(sample_query)
df_true.head()
|   | ProductName | Revenue |
|---|---|---|
| 0 | Côte de Blaye | 53274482.70 |
| 1 | Thüringer Rostbratwurst | 24630836.96 |
| 2 | Mishi Kobe Niku | 19424638.00 |
| 3 | Sir Rodney's Marmalade | 16654879.80 |
| 4 | Carnarvon Tigers | 12607487.50 |
Approach #1: one function call
Here we are doing the following:
- Placing the database schema in the system prompt SYSTEM_PROMPT_v1.
- Providing the user’s question and the definition of the read_data function in the tools parameter to the OpenAI Chat Completions API.
- Calling the read_data function with the SQL query generated by the LLM.
- Finally, returning both the SQL query and the result of the query.
SYSTEM_PROMPT_v1 = f"""
The user will ask a question about the database with the following schema:
###
{str(get_schema())}
###
Provide a SQL query answering the question.
Avoid ambiguous column names.
"""
def get_result_v1(user_question, system_prompt=SYSTEM_PROMPT_v1):
    messages = [
        {"role": "system",
         "content": system_prompt},
        {"role": "user",
         "content": user_question}
    ]
    functions = [
        # function to read data from the database
        {
            "type": "function",
            "function": {
                "name": "read_data",
                "description": "Get pandas dataframe from SQLite database given a query",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "SQL query to execute"
                        }
                    },
                    "required": ["query"]
                },
            },
        }
    ]
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo-0125",
        temperature=0.0,  # for reproducibility
        tools=functions,
        messages=messages
    )
    response_message = response.choices[0].message
    messages.append(response_message)
    tool_call = response_message.tool_calls[0]
    function_name = tool_call.function.name
    if function_name == "read_data":
        # get the query from the tool call
        sql_query = json.loads(tool_call.function.arguments)['query']
        df = read_data(sql_query)
        return sql_query, df
Approach #2: two function calls
Here’s how this approach is different from the one-step function call:
- SYSTEM_PROMPT_v2 is a static string that does not contain the database schema.
- Both the get_schema and read_data functions are passed in the tools parameter.
- We make two API calls: one to get the schema and another to convert the natural language query into SQL.
- We still return both the SQL query and the result of the query.
SYSTEM_PROMPT_v2 = """
The user will ask a question about the database.
First, get the schema of the database.
Then, provide a SQL query answering the question.
Avoid ambiguous column names.
"""
def get_result_v2(user_question, system_prompt=SYSTEM_PROMPT_v2):
    messages = [
        {"role": "system",
         "content": system_prompt},
        {"role": "user",
         "content": user_question}
    ]
    functions = [
        # function to get schema of the database
        {
            "type": "function",
            "function": {
                "name": "get_schema",
                "description": "Get the schema of the SQLite database",
                "parameters": {
                    "type": "object",
                    "properties": {},
                    "required": []
                }
            },
        },
        # function to read data from the database
        {
            "type": "function",
            "function": {
                "name": "read_data",
                "description": "Get pandas dataframe from SQLite database given a query",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "SQL query to execute"
                        }
                    },
                    "required": ["query"]
                },
            },
        }
    ]
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo-0125",
        temperature=0.0,
        tools=functions,
        messages=messages
    )
    response_message = response.choices[0].message
    messages.append(response_message)
    tool_call = response_message.tool_calls[0]
    function_name = tool_call.function.name
    if function_name == "get_schema":
        schema = get_schema()
        messages.append(
            {
                "tool_call_id": tool_call.id,
                "role": "tool",
                "name": function_name,
                "content": str(schema),
            }
        )
        second_response = openai.chat.completions.create(
            model="gpt-3.5-turbo-0125",
            temperature=0.0,
            tools=functions,
            messages=messages,
        )
        response_message = second_response.choices[0].message
        messages.append(response_message)
        tool_call = response_message.tool_calls[0]
        function_name = tool_call.function.name
        if function_name == "read_data":
            # get the query from the tool call
            sql_query = json.loads(tool_call.function.arguments)['query']
            df = read_data(sql_query)
            return sql_query, df
Some test queries
One function call
question = """
What is the total revenue for each product in the database?
Return a table with the product name and total revenue columns.
Sort by total revenue in descending order.
"""
sql_query_v1, df_v1 = get_result_v1(question)
print(sql_query_v1)
df_v1.head()
> SELECT p.ProductName, SUM(od.UnitPrice * od.Quantity * (1 - od.Discount)) AS TotalRevenue FROM Products p JOIN 'Order Details' od ON p.ProductID = od.ProductID GROUP BY p.ProductName ORDER BY TotalRevenue DESC;
|   | ProductName | TotalRevenue |
|---|---|---|
| 0 | Côte de Blaye | 5.326590e+07 |
| 1 | Thüringer Rostbratwurst | 2.462347e+07 |
| 2 | Mishi Kobe Niku | 1.942304e+07 |
| 3 | Sir Rodney's Marmalade | 1.665381e+07 |
| 4 | Carnarvon Tigers | 1.260467e+07 |
Note how the generated SQL query, for some reason, contains (1 - od.Discount), despite the question not mentioning any discounts.
That’s LLMs being LLMs! 🤷
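If you want to confirm that this extra discount factor is what makes the result diverge from the ground truth below, a quick sanity check like the following should do it (a sketch reusing read_data; the column names come from the Order Details table in the schema above):

# compare gross vs. discounted revenue to see how much the (1 - Discount) factor changes the totals
df_discount_check = read_data("""
SELECT
    SUM(od.UnitPrice * od.Quantity) AS GrossRevenue,
    SUM(od.UnitPrice * od.Quantity * (1 - od.Discount)) AS DiscountedRevenue
FROM "Order Details" od;
""")
df_discount_check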
# Check if the dataframes are the same
# We convert the dataframes to numpy arrays to ignore the differences in column names
(df_true.to_numpy() == df_v1.to_numpy()).all()
> False
Because of the above issue, the results don’t match the ground truth.
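As an aside, exact element-wise equality is a strict test for floating-point revenue figures; if you only care about numeric closeness, a tolerance-based check such as np.allclose is a gentler alternative. A minimal sketch (frames_close is a hypothetical helper, not part of the original notebook):

import numpy as np

def frames_close(df_a, df_b, rtol=1e-6):
    # compare only the numeric columns, element-wise, within a relative tolerance
    a = df_a.select_dtypes("number").to_numpy()
    b = df_b.select_dtypes("number").to_numpy()
    return a.shape == b.shape and np.allclose(a, b, rtol=rtol)

frames_close(df_true, df_v1)  # still False here: the discount factor changes the values themselves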
Two function calls
sql_query_v2, df_v2 = get_result_v2(question)
print(sql_query_v2)
df_v2.head()
> SELECT Products.ProductName, SUM("Order Details".UnitPrice * "Order Details".Quantity) AS TotalRevenue FROM Products JOIN "Order Details" ON Products.ProductID = "Order Details".ProductID GROUP BY Products.ProductName ORDER BY TotalRevenue DESC;
|   | ProductName | TotalRevenue |
|---|---|---|
| 0 | Côte de Blaye | 53274482.70 |
| 1 | Thüringer Rostbratwurst | 24630836.96 |
| 2 | Mishi Kobe Niku | 19424638.00 |
| 3 | Sir Rodney's Marmalade | 16654879.80 |
| 4 | Carnarvon Tigers | 12607487.50 |
# Check if the dataframes are the same
# We convert the dataframes to numpy arrays to ignore the differences in column names
(df_true.to_numpy() == df_v2.to_numpy()).all()
> True
Here the results match the ground truth. That, however, doesn’t mean that two function calls are always better than one! Let’s run a couple more simple queries to compare the two approaches. Then we’ll do an evaluation on a set of hand-curated questions in eval_questions.csv.
A couple more test queries
question = """
How many orders were shipped to Brazil?
"""
_, df_1 = get_result_v1(question)
_, df_2 = get_result_v2(question)
(df_1.to_numpy() == df_2.to_numpy()).all()
> True
question = """
How many customers ordered more than 20 units of products?
"""
_, df_1 = get_result_v1(question)
_, df_2 = get_result_v2(question)
(df_1.to_numpy() == df_2.to_numpy()).all()
> True
Both approaches are consistent with each other and the ground truth (not shown here) for these simple queries.
Evaluation
Here we’ll run a bit more rigorous evals on a hand-curated list of questions in eval_questions.csv. This is a CSV file with 19 questions and their expected SQL queries.
eval_df = pd.read_csv("eval_questions.csv")
eval_df.head()
|   | Question | Answer |
|---|---|---|
| 0 | Create a report that shows the CategoryName an... | SELECT CategoryName, Description FROM Categori... |
| 1 | Create a report that show the ContactName, Com... | SELECT ContactName, CompanyName, ContactTitle,... |
| 2 | Create a report that shows the capitalized Fir... | SELECT \nUPPER(FirstName) AS [First Name], \nU... |
| 3 | Create a report that shows the top 10 OrderID,... | SELECT OrderID, OrderDate, ShippedDate, Custom... |
| 4 | Create a report that shows the CustomerID in l... | SELECT LOWER(CustomerID) AS ID FROM Customers |
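The Question and Answer strings are truncated in the preview above; if you want to read the full expected SQL, widening the pandas column display (a quick sketch) helps:

pd.set_option("display.max_colwidth", None)  # show the full question and SQL strings
eval_df.head()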
Evaluation of the one function call approach
get_result = get_result_v1
n_correct = 0
for index, row in eval_df.iterrows():
    question, query = row['Question'], row['Answer']
    # get the true answer based on the SQL query from the eval dataset
    df_true = read_data(query)
    try:
        _, df_llm = get_result(question)
        # count the response as correct only if it matches the ground truth
        if (df_true.to_numpy() == df_llm.to_numpy()).all():
            n_correct += 1
    except Exception as e:
        pass
print(f"Number of correct responses: {n_correct}/{len(eval_df)}")
> Number of correct responses: 19/19
Evaluation of the two function call approach
get_result = get_result_v2
n_correct = 0
for index, row in eval_df.iterrows():
    question, query = row['Question'], row['Answer']
    # get the true answer based on the SQL query from the eval dataset
    df_true = read_data(query)
    try:
        _, df_llm = get_result(question)
        # count the response as correct only if it matches the ground truth
        if (df_true.to_numpy() == df_llm.to_numpy()).all():
            n_correct += 1
    except Exception as e:
        pass
print(f"Number of correct responses: {n_correct}/{len(eval_df)}")
> Number of correct responses: 18/19
Summary
- Function Calling is a useful feature that allows you to hook custom functions into the OpenAI API (though the name is slightly misleading: the API doesn’t call the functions itself, it only tells you which function to call and with what arguments).
- It’s possible to provide one or more functions to the API to call.
- The two function call approach is more flexible. If the user’s question isn’t about the database, the LLM will likely decide not to call the get_schema function (see the sketch after this list for handling responses without a tool call). Conversely, in the one function call approach, the schema is always present in the system prompt regardless of the user’s question.
- Since the dataset is small, it’s hard to say which approach provides more accurate results.
- Interestingly, the two function call approach seems to take ~40% more time than the one function call approach (33s vs 23s).
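As a closing side note, both get_result_v1 and get_result_v2 assume the model always responds with a tool call; for a question that isn’t about the database, response_message.tool_calls will be None and indexing it raises an error. A minimal guard, sketched below (extract_tool_call is a hypothetical helper, not part of the original notebook), handles that case:

# sketch: handle the case where the model answers in plain text instead of requesting a tool call
def extract_tool_call(response_message):
    if not response_message.tool_calls:
        # the model chose to answer directly; return its text answer instead of a tool call
        return None, response_message.content
    return response_message.tool_calls[0], None

# hypothetical usage inside get_result_v1 / get_result_v2:
# tool_call, direct_answer = extract_tool_call(response_message)
# if tool_call is None:
#     return None, direct_answer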