Intro
This notebook (also on GitHub) demonstrates how to use the Function Calling functionality of the OpenAI API.
In this demo, we’ll use the Northwind database to convert natural language queries into SQL:
"What is the total revenue for each product in the database?" ->
-> "SELECT ... FROM ..." -> DataFrame
There will be two function calling examples:
- A simple one-step function call: we put the database schema into the system prompt and then use function calling to convert a natural language query into SQL.
- A two-step function call: the model first calls a function to get the database schema and then converts the natural language query into SQL.
At the end, we’ll compare the two approaches and do a quick-and-dirty evaluation of the results using a hand-curated list of questions and their expected SQL queries in eval_questions.csv.
Define functions to call
import json
import sqlite3
import openai
import pandas as pd
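Note: the snippets in this notebook call the module-level client of openai-python v1.x, which reads the API key from the OPENAI_API_KEY environment variable. If you prefer to point the client at the key explicitly, something along these lines should work:

import os
# optional: the client also picks up OPENAI_API_KEY from the environment by default
openai.api_key = os.getenv("OPENAI_API_KEY")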
# get database schema
def get_schema(db_path="northwind.db"):
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
    tables = cursor.fetchall()
    schema = {}
    for table in tables:
        table_name = table[0]
        cursor.execute(f"PRAGMA table_info('{table_name}')")
        columns = cursor.fetchall()
        schema[table_name] = [column[1] for column in columns]
    # close the connection once the schema has been collected
    conn.close()
    return schema
get_schema()
> {'Categories': ['CategoryID', 'CategoryName', 'Description', 'Picture'],
'sqlite_sequence': ['name', 'seq'],
'CustomerCustomerDemo': ['CustomerID', 'CustomerTypeID'],
'CustomerDemographics': ['CustomerTypeID', 'CustomerDesc'],
'Customers': ['CustomerID',
'CompanyName',
'ContactName',
...
'Suppliers': ['SupplierID',
'CompanyName',
'ContactName',
'ContactTitle',
'Address',
'City',
'Region',
'PostalCode',
'Country',
'Phone',
'Fax',
'HomePage'],
'Territories': ['TerritoryID', 'TerritoryDescription', 'RegionID']}
# read data from SQLite database given a query
def read_data(query, db_path="northwind.db"):
    conn = sqlite3.connect(db_path)
    df = pd.read_sql_query(query, conn)
    conn.close()
    return df
Get ground truth result
We’ll compare the LLM-generated results with this ground truth result df_true later.
# Example SQL query that answers the question: "What is the total revenue for each product in the database?"
sample_query = """
SELECT
p.ProductName,
SUM(od.Quantity * od.UnitPrice) AS Revenue
FROM
Products p
JOIN
"Order Details" od
ON
p.ProductID = od.ProductID
GROUP BY
p.ProductName
ORDER BY
Revenue DESC;
"""
df_true = read_data(sample_query)
df_true.head()
|   | ProductName | Revenue |
|---|---|---|
| 0 | Côte de Blaye | 53274482.70 |
| 1 | Thüringer Rostbratwurst | 24630836.96 |
| 2 | Mishi Kobe Niku | 19424638.00 |
| 3 | Sir Rodney's Marmalade | 16654879.80 |
| 4 | Carnarvon Tigers | 12607487.50 |
Approach #1: one function call
Here we are doing the following:
- Placing the database schema in the system prompt SYSTEM_PROMPT_v1.
- Providing the user’s question and the definition of the read_data function in the tools parameter to the OpenAI Chat Completions API.
- Calling the read_data function with the SQL query generated by the LLM.
- Finally, returning both the SQL query and the result of the query.
SYSTEM_PROMPT_v1 = f"""
The user will ask a question about the database with the following schema:
###
{str(get_schema())}
###
Provide a SQL query answering the question.
Avoid ambiguous column names.
"""
def get_result_v1(user_question, system_prompt=SYSTEM_PROMPT_v1):
    messages = [
        {"role": "system",
         "content": system_prompt},
        {"role": "user",
         "content": user_question}
    ]
    functions = [
        # function to read data from the database
        {
            "type": "function",
            "function": {
                "name": "read_data",
                "description": "Get pandas dataframe from SQLite database given a query",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "SQL query to execute"
                        }
                    },
                    "required": ["query"]
                },
            },
        }
    ]
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo-0125",
        temperature=0.0,  # for reproducibility
        tools=functions,
        messages=messages
    )
    response_message = response.choices[0].message
    messages.append(response_message)
    tool_call = response_message.tool_calls[0]
    function_name = tool_call.function.name
    if function_name == "read_data":
        # get the query from the tool call
        sql_query = json.loads(tool_call.function.arguments)['query']
        df = read_data(sql_query)
        return sql_query, df
Approach #2: two function calls
Here’s how this approach is different from the one-step function call:
- SYSTEM_PROMPT_v2 is a static string that does not contain the database schema.
- Both the get_schema and read_data functions are passed in the tools parameter.
- We make two API calls: one to get the schema and another to convert the natural language query into SQL.
- We still return both the SQL query and the result of the query.
SYSTEM_PROMPT_v2 = """
The user will ask a question about the database.
First, get the schema of the database.
Then, provide a SQL query answering the question.
Avoid ambiguous column names.
"""
def get_result_v2(user_question, system_prompt=SYSTEM_PROMPT_v2):
    messages = [
        {"role": "system",
         "content": system_prompt},
        {"role": "user",
         "content": user_question}
    ]
    functions = [
        # function to get schema of the database
        {
            "type": "function",
            "function": {
                "name": "get_schema",
                "description": "Get the schema of the SQLite database",
                "parameters": {
                    "type": "object",
                    "properties": {},
                    "required": []
                }
            },
        },
        # function to read data from the database
        {
            "type": "function",
            "function": {
                "name": "read_data",
                "description": "Get pandas dataframe from SQLite database given a query",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "SQL query to execute"
                        }
                    },
                    "required": ["query"]
                },
            },
        }
    ]
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo-0125",
        temperature=0.0,
        tools=functions,
        messages=messages
    )
    response_message = response.choices[0].message
    messages.append(response_message)
    tool_call = response_message.tool_calls[0]
    function_name = tool_call.function.name
    if function_name == "get_schema":
        schema = get_schema()
        messages.append(
            {
                "tool_call_id": tool_call.id,
                "role": "tool",
                "name": function_name,
                "content": str(schema),
            }
        )
        second_response = openai.chat.completions.create(
            model="gpt-3.5-turbo-0125",
            temperature=0.0,
            tools=functions,
            messages=messages,
        )
        response_message = second_response.choices[0].message
        messages.append(response_message)
        tool_call = response_message.tool_calls[0]
        function_name = tool_call.function.name
        if function_name == "read_data":
            # get the query from the tool call
            sql_query = json.loads(tool_call.function.arguments)['query']
            df = read_data(sql_query)
            return sql_query, df
Some test queries
One function call
question = """
What is the total revenue for each product in the database?
Return a table with the product name and total revenue columns.
Sort by total revenue in descending order.
"""
sql_query_v1, df_v1 = get_result_v1(question)
print(sql_query_v1)
df_v1.head()
> SELECT p.ProductName, SUM(od.UnitPrice * od.Quantity * (1 - od.Discount)) AS TotalRevenue FROM Products p JOIN 'Order Details' od ON p.ProductID = od.ProductID GROUP BY p.ProductName ORDER BY TotalRevenue DESC;
|   | ProductName | TotalRevenue |
|---|---|---|
| 0 | Côte de Blaye | 5.326590e+07 |
| 1 | Thüringer Rostbratwurst | 2.462347e+07 |
| 2 | Mishi Kobe Niku | 1.942304e+07 |
| 3 | Sir Rodney's Marmalade | 1.665381e+07 |
| 4 | Carnarvon Tigers | 1.260467e+07 |
Note how the generated SQL query, for some reason, contains (1 - od.Discount), despite the question not mentioning any discounts.
That’s LLMs being LLMs! 🤷
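If you want to confirm that this extra discount factor is what makes the result diverge from the ground truth below, a quick sanity check like the following should do it (a sketch reusing read_data; the column names come from the Order Details table in the schema above):

# compare gross vs. discounted revenue to see how much the (1 - Discount) factor changes the totals
df_discount_check = read_data("""
SELECT
    SUM(od.UnitPrice * od.Quantity) AS GrossRevenue,
    SUM(od.UnitPrice * od.Quantity * (1 - od.Discount)) AS DiscountedRevenue
FROM "Order Details" od;
""")
df_discount_check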
# Check if the dataframes are the same
# We convert the dataframes to numpy arrays to ignore the differences in column names
(df_true.to_numpy() == df_v1.to_numpy()).all()
> False
Because of the above issue, the results don’t match the ground truth.
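As an aside, exact element-wise equality is a strict test for floating-point revenue figures; if you only care about numeric closeness, a tolerance-based check such as np.allclose is a gentler alternative. A minimal sketch (frames_close is a hypothetical helper, not part of the original notebook):

import numpy as np

def frames_close(df_a, df_b, rtol=1e-6):
    # compare only the numeric columns, element-wise, within a relative tolerance
    a = df_a.select_dtypes("number").to_numpy()
    b = df_b.select_dtypes("number").to_numpy()
    return a.shape == b.shape and np.allclose(a, b, rtol=rtol)

frames_close(df_true, df_v1)  # still False here: the discount factor changes the values themselves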
Two function calls
sql_query_v2, df_v2 = get_result_v2(question)
print(sql_query_v2)
df_v2.head()
> SELECT Products.ProductName, SUM("Order Details".UnitPrice * "Order Details".Quantity) AS TotalRevenue FROM Products JOIN "Order Details" ON Products.ProductID = "Order Details".ProductID GROUP BY Products.ProductName ORDER BY TotalRevenue DESC;
|   | ProductName | TotalRevenue |
|---|---|---|
| 0 | Côte de Blaye | 53274482.70 |
| 1 | Thüringer Rostbratwurst | 24630836.96 |
| 2 | Mishi Kobe Niku | 19424638.00 |
| 3 | Sir Rodney's Marmalade | 16654879.80 |
| 4 | Carnarvon Tigers | 12607487.50 |
# Check if the dataframes are the same
# We convert the dataframes to numpy arrays to ignore the differences in column names
(df_true.to_numpy() == df_v2.to_numpy()).all()
> True
Here the results match the ground truth. That, however, doesn’t mean that two function calls are always better than one! Let’s run a couple more simple queries to compare the two approaches. Then we’ll do an evaluation on a set of hand-curated questions in eval_questions.csv.
A couple more test queries
question = """
How many orders were shipped to Brazil?
"""
_, df_1 = get_result_v1(question)
_, df_2 = get_result_v2(question)
(df_1.to_numpy() == df_2.to_numpy()).all()
> True
question = """
How many customers ordered more than 20 units of products?
"""
_, df_1 = get_result_v1(question)
_, df_2 = get_result_v2(question)
(df_1.to_numpy() == df_2.to_numpy()).all()
> True
Both approaches are consistent with each other and the ground truth (not shown here) for these simple queries.
Evaluation
Here we’ll run a bit more rigorous evals on a hand-curated list of questions in eval_questions.csv. This is a CSV file with 19 questions and their expected SQL queries.
eval_df = pd.read_csv("eval_questions.csv")
eval_df.head()
|   | Question | Answer |
|---|---|---|
| 0 | Create a report that shows the CategoryName an... | SELECT CategoryName, Description FROM Categori... |
| 1 | Create a report that show the ContactName, Com... | SELECT ContactName, CompanyName, ContactTitle,... |
| 2 | Create a report that shows the capitalized Fir... | SELECT \nUPPER(FirstName) AS [First Name], \nU... |
| 3 | Create a report that shows the top 10 OrderID,... | SELECT OrderID, OrderDate, ShippedDate, Custom... |
| 4 | Create a report that shows the CustomerID in l... | SELECT LOWER(CustomerID) AS ID FROM Customers |
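The Question and Answer strings are truncated in the preview above; if you want to read the full expected SQL, widening the pandas column display (a quick sketch) helps:

pd.set_option("display.max_colwidth", None)  # show the full question and SQL strings
eval_df.head()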
Evaluation of the one function call approach
get_result = get_result_v1
n_correct = 0
for index, row in eval_df.iterrows():
    question, query = row['Question'], row['Answer']
    # get the true answer based on the SQL query from the eval dataset
    df_true = read_data(query)
    try:
        _, df_llm = get_result(question)
        # count the response as correct only if it matches the ground truth
        if (df_true.to_numpy() == df_llm.to_numpy()).all():
            n_correct += 1
    except Exception as e:
        pass
print(f"Number of correct responses: {n_correct}/{len(eval_df)}")
> Number of correct responses: 19/19
Evaluation of the two function call approach
get_result = get_result_v2
n_correct = 0
for index, row in eval_df.iterrows():
    question, query = row['Question'], row['Answer']
    # get the true answer based on the SQL query from the eval dataset
    df_true = read_data(query)
    try:
        _, df_llm = get_result(question)
        # count the response as correct only if it matches the ground truth
        if (df_true.to_numpy() == df_llm.to_numpy()).all():
            n_correct += 1
    except Exception as e:
        pass
print(f"Number of correct responses: {n_correct}/{len(eval_df)}")
> Number of correct responses: 18/19
Summary
- Function Calling is a useful feature that allows you to hook custom functions into the OpenAI API (though the name is slightly misleading: the API doesn’t call the functions itself, it only tells you which function to call and with what arguments).
- It’s possible to provide one or more functions to the API to call.
- The two function call approach is more flexible. If the user’s question isn’t about the database, the LLM will likely decide not to call the get_schema function (see the sketch after this list for handling responses without a tool call). Conversely, in the one function call approach, the schema is always present in the system prompt regardless of the user’s question.
- Since the dataset is small, it’s hard to say which approach provides more accurate results.
- Interestingly, the two function call approach seems to take ~40% more time than the one function call approach (33s vs 23s).
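As a closing side note, both get_result_v1 and get_result_v2 assume the model always responds with a tool call; for a question that isn’t about the database, response_message.tool_calls will be None and indexing it raises an error. A minimal guard, sketched below (extract_tool_call is a hypothetical helper, not part of the original notebook), handles that case:

# sketch: handle the case where the model answers in plain text instead of requesting a tool call
def extract_tool_call(response_message):
    if not response_message.tool_calls:
        # the model chose to answer directly; return its text answer instead of a tool call
        return None, response_message.content
    return response_message.tool_calls[0], None

# hypothetical usage inside get_result_v1 / get_result_v2:
# tool_call, direct_answer = extract_tool_call(response_message)
# if tool_call is None:
#     return None, direct_answer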