The v1/chat/completions endpoint supports function calling through the tools and tool_choice parameters. For reasoning models, you can also set reasoning_effort to control how much internal reasoning the model performs before responding.
The tokenizer's Jinja chat template must accept a tools argument for function calling to work.

Defining tools

Provide tool definitions so the model knows which functions it can call:
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = model.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "What is the weather like in New York?"}],
    tools=tools,
    tool_choice="auto"
)
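The model only sees what these definitions contain, so it can be worth checking their structure locally before sending a request. A minimal sketch follows. check_tool_definitions is a hypothetical helper, not part of any SDK, and it covers only the fields used in the example above, not full JSON Schema validation:

```python
def check_tool_definitions(tools):
    """Return a list of structural problems in function-style tool definitions.

    Hypothetical helper: checks only type, name, parameters and required keys.
    """
    problems = []
    seen_names = set()
    for i, tool in enumerate(tools):
        if tool.get("type") != "function":
            problems.append(f"tools[{i}]: type must be 'function'")
            continue
        fn = tool.get("function", {})
        name = fn.get("name")
        if not name:
            problems.append(f"tools[{i}]: missing function name")
        elif name in seen_names:
            problems.append(f"tools[{i}]: duplicate function name {name!r}")
        else:
            seen_names.add(name)
        # Treat a missing parameters block as an empty object schema
        params = fn.get("parameters", {"type": "object", "properties": {}})
        if params.get("type") != "object":
            problems.append(f"tools[{i}]: parameters must be a JSON Schema object")
            continue
        properties = params.get("properties", {})
        # Every required key should be declared under properties
        for key in params.get("required", []):
            if key not in properties:
                problems.append(f"tools[{i}]: required field {key!r} not in properties")
    return problems


example = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location", "unit"],  # "unit" is not declared
            },
        },
    }
]
print(check_tool_definitions(example))
```

An empty list means the definitions passed these basic checks.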

Tool choice

Use tool_choice to control whether and which tool the model should call:
  • auto (default): The model decides whether to call a tool.
  • none: The model will not call any tools.
  • required: The model must call one or more tools.
  • Specific function: Pass {"type": "function", "function": {"name": "get_current_weather"}} to force a particular function.
response = model.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "What is the weather like in New York?"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_current_weather"}}
)
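If the tool to force is chosen at runtime, a small helper can build the tool_choice value and catch names that are not in tools before the request is sent. make_tool_choice is a hypothetical helper, sketched on the assumption that tools follows the format shown under Defining tools:

```python
# String modes accepted by tool_choice, as listed above
TOOL_CHOICE_MODES = {"auto", "none", "required"}


def make_tool_choice(choice, tools):
    """Return a tool_choice value: a mode string, or a dict forcing one named function."""
    if choice in TOOL_CHOICE_MODES:
        return choice
    defined = {t["function"]["name"] for t in tools if t.get("type") == "function"}
    if choice not in defined:
        # Fail locally on a typo instead of sending a request that names a missing tool
        raise ValueError(f"unknown tool {choice!r}; defined: {sorted(defined)}")
    return {"type": "function", "function": {"name": choice}}


tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "parameters": {"type": "object", "properties": {}},
        },
    }
]
print(make_tool_choice("get_current_weather", tools))
```

The result is passed unchanged as tool_choice=make_tool_choice(...) in the create() call.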

Reasoning effort

For supported reasoning models, set reasoning_effort to balance quality and latency:
response = model.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "Plan a 3-day trip to Tokyo."}],
    tools=tools,
    tool_choice="auto",
    reasoning_effort="medium"
)
Supported values depend on the model and may include none, minimal, low, medium, high, and xhigh.
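Because support varies by model, one option is to include reasoning_effort in the request only when it is set. The sketch below uses a hypothetical request_kwargs helper whose result is unpacked into create(). Note that the string "none" is a value you send, which differs from omitting the parameter (Python None here):

```python
# Values listed above; which ones a given model accepts varies, so extend as needed
KNOWN_EFFORTS = ("none", "minimal", "low", "medium", "high", "xhigh")


def request_kwargs(model_id, messages, tools=None, reasoning_effort=None):
    """Assemble keyword arguments for chat.completions.create.

    reasoning_effort is added only when set, so the same code path can also
    target models that do not take the parameter.
    """
    kwargs = {"model": model_id, "messages": messages}
    if tools:
        kwargs["tools"] = tools
        kwargs["tool_choice"] = "auto"
    if reasoning_effort is not None:
        if reasoning_effort not in KNOWN_EFFORTS:
            raise ValueError(f"unexpected reasoning_effort {reasoning_effort!r}")
        kwargs["reasoning_effort"] = reasoning_effort
    return kwargs


msgs = [{"role": "user", "content": "Plan a 3-day trip to Tokyo."}]
print(request_kwargs("model-id", msgs, reasoning_effort="medium"))
```

The call itself would then be model.chat.completions.create(**request_kwargs(...)).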

Handling tool calls

We attempt to parse tool calls from model responses and return them in the standard tool_calls field on the assistant message. When parsing succeeds, you can use the response directly:
message = response.choices[0].message

if message.tool_calls:
    for tool_call in message.tool_calls:
        print(f"Function: {tool_call.function.name}")
        print(f"Arguments: {tool_call.function.arguments}")
Tool call parsing may fail for some models or chat templates. If tool_calls is empty but the model clearly intended to call a function, fall back to manual parsing from the raw message content (see below).
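Each tool_call.function.arguments value is a JSON-encoded string, so decode it before calling your own code. The sketch below assumes a hypothetical local get_current_weather stub and TOOL_REGISTRY mapping. It builds role "tool" messages keyed by tool_call_id, the shape OpenAI-compatible APIs use for returning results, assuming your model's chat template supports that role:

```python
import json
from types import SimpleNamespace


def get_current_weather(location, unit="fahrenheit"):
    # Hypothetical stub: replace with a real weather lookup
    return {"location": location, "unit": unit, "forecast": "placeholder"}


# Map tool names from the definitions to local Python callables
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def run_tool_calls(tool_calls):
    """Execute each tool call and build the role="tool" messages to send back."""
    tool_messages = []
    for call in tool_calls:
        fn = TOOL_REGISTRY.get(call.function.name)
        try:
            # function.arguments is a JSON-encoded string, not a dict
            args = json.loads(call.function.arguments or "{}")
            if fn is None:
                output = {"error": f"unknown tool {call.function.name!r}"}
            else:
                output = fn(**args)
        except (json.JSONDecodeError, TypeError) as exc:
            # Malformed JSON, or arguments that do not match the function signature
            output = {"error": str(exc)}
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(output),
        })
    return tool_messages


# Stand-in shaped like an entry of message.tool_calls
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(
        name="get_current_weather", arguments='{"location": "New York, NY"}'
    ),
)
print(run_tool_calls([fake_call]))
```

To continue the conversation, append the assistant message followed by these tool messages to messages, then call create again with the same tools.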

Manual parsing fallback

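If automatic parsing does not work for your model, parse tool calls from the generated text yourself. The example below assumes the model emits each call as an inline JSON object of the form {"name": ..., "arguments": {...}}; adapt it to the format your chat template actually produces: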
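import json

def parse_tool_calls(text):
    """Extract {"name": ..., "arguments": ...} objects embedded in raw model output."""
    decoder = json.JSONDecoder()
    tool_calls = []
    idx = text.find("{")
    while idx != -1:
        try:
            # Decode one complete JSON value starting at this brace, so nested
            # argument objects are handled (a non-greedy regex stops at the first "}")
            obj, end = decoder.raw_decode(text, idx)
        except json.JSONDecodeError:
            idx = text.find("{", idx + 1)
            continue
        if isinstance(obj, dict) and "name" in obj and "arguments" in obj:
            args = obj["arguments"]
            tool_calls.append({
                "name": obj["name"],
                # Keep arguments as a JSON string, like tool_call.function.arguments
                "arguments": args if isinstance(args, str) else json.dumps(args),
            })
            idx = text.find("{", end)
        else:
            # Not a tool call itself; keep scanning inside it for nested objects
            idx = text.find("{", idx + 1)
    return tool_calls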

response = model.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "What is the weather like in New York?"}],
    tools=tools,
    tool_choice="auto"
)

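message = response.choices[0].message

# Prefer parsed tool_calls when available; normalize both paths to plain dicts
if message.tool_calls:
    tool_calls = [
        {"name": tc.function.name, "arguments": tc.function.arguments}
        for tc in message.tool_calls
    ]
else:
    tool_calls = parse_tool_calls(message.content or "")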
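Some chat templates wrap each call in tags instead of emitting bare JSON; Hermes- and Qwen-style templates, for example, use <tool_call>...</tool_call>. If your model does this, matching on the tags is more reliable than scanning for braces. The tag name is an assumption here, so check what your template actually produces. parse_tagged_tool_calls is a hypothetical helper returning dicts with the same name and arguments keys as parse_tool_calls:

```python
import json
import re

# Assumed format: one JSON object per <tool_call>...</tool_call> block
TAGGED_CALL = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tagged_tool_calls(text):
    """Extract tool calls wrapped in <tool_call> tags, skipping malformed blocks."""
    calls = []
    for body in TAGGED_CALL.findall(text):
        try:
            obj = json.loads(body)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and "name" in obj:
            args = obj.get("arguments", {})
            calls.append({
                "name": obj["name"],
                # Keep arguments as a JSON string, like tool_call.function.arguments
                "arguments": args if isinstance(args, str) else json.dumps(args),
            })
    return calls


text = (
    "Let me check.\n<tool_call>\n"
    '{"name": "get_current_weather", "arguments": {"location": "New York, NY"}}'
    "\n</tool_call>"
)
print(parse_tagged_tool_calls(text))
```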

Streaming with tools

You can stream responses when using tools. Accumulate the text as chunks arrive and watch for tool_calls deltas. If none arrive, fall back to the parse_tool_calls helper defined under Manual parsing fallback:

response = model.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
    stream=True
)

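full_response = ""
saw_tool_call_deltas = False
for chunk in response:
    if not chunk.choices:
        continue  # a final usage-only chunk may carry no choices
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="")
        full_response += delta.content
    if delta.tool_calls:
        saw_tool_call_deltas = True
        # Each delta carries a fragment of a call; arguments arrive in pieces
        print("Model wants to call:", delta.tool_calls)

# Fallback: parse from accumulated text only if the stream had no tool_calls deltas
if not saw_tool_call_deltas:
    tool_calls = parse_tool_calls(full_response)
    if tool_calls:
        print("Parsed tool calls:", tool_calls)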
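Streamed tool_calls deltas are fragments: in OpenAI-style streams the id and function name typically arrive with the first fragment for each index, and the arguments string arrives in pieces. A minimal sketch for merging them into complete calls, assuming that layout (merge_tool_call_deltas is a hypothetical helper, and the SimpleNamespace objects stand in for real chunk deltas):

```python
from types import SimpleNamespace


def merge_tool_call_deltas(deltas):
    """Merge streamed tool-call fragments into complete calls, ordered by index."""
    calls = {}
    for delta in deltas:
        for tc in delta.tool_calls or []:
            entry = calls.setdefault(tc.index, {"id": None, "name": "", "arguments": ""})
            if tc.id and entry["id"] is None:
                entry["id"] = tc.id
            if tc.function is not None:
                # Keep the first function name seen for this index
                if tc.function.name and not entry["name"]:
                    entry["name"] = tc.function.name
                # The arguments JSON string is split across fragments
                entry["arguments"] += tc.function.arguments or ""
    return [calls[i] for i in sorted(calls)]


def frag(index, call_id=None, name=None, arguments=None):
    # Stand-in for one entry of chunk.choices[0].delta.tool_calls
    return SimpleNamespace(
        index=index,
        id=call_id,
        function=SimpleNamespace(name=name, arguments=arguments),
    )


deltas = [
    SimpleNamespace(tool_calls=[frag(0, "call_1", "get_current_weather", "")]),
    SimpleNamespace(tool_calls=[frag(0, arguments='{"location": ')]),
    SimpleNamespace(tool_calls=[frag(0, arguments='"Paris, France"}')]),
    SimpleNamespace(tool_calls=None),  # e.g. a content-only chunk
]
print(merge_tool_call_deltas(deltas))
```

In the streaming loop, collect each chunk.choices[0].delta into a list and pass it to merge_tool_call_deltas once the stream ends; each merged arguments string can then be decoded with json.loads.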
For parameter details, see the API reference for tools, tool_choice, and reasoning_effort.