Function Calling LLM Best Practices: Production Guide for 2026
Function calling transforms LLMs from text generators into action-taking agents. But production function calling requires more than just defining tools: you need robust schemas, error handling, and security guardrails.
What Is Function Calling in LLMs?
Function calling (also called tool use or function invocation) allows LLMs to:
- Decide when to call external functions
- Generate proper function arguments from natural language
- Interpret function results and continue conversation
Instead of just generating text, the LLM orchestrates actions:
User: "What's the weather in Lagos?"
LLM: [Calls get_weather(city="Lagos")]
Function: {"temp": 28, "condition": "Partly cloudy"}
LLM: "It's 28°C and partly cloudy in Lagos right now."
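Under the hood, your application code is what actually runs the function the model asks for. A minimal dispatch sketch, assuming a stubbed get_weather and a hand-rolled tool registry (all names here are illustrative):

```python
import json

# Stubbed local implementation the model can invoke.
def get_weather(city: str) -> dict:
    return {"temp": 28, "condition": "Partly cloudy"}

# Registry mapping tool names to callables.
TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(name: str, arguments: str) -> str:
    """Execute a model-requested tool call and serialize the result.

    `arguments` arrives as a JSON string, which is how most providers
    deliver model-generated arguments.
    """
    result = TOOLS[name](**json.loads(arguments))
    return json.dumps(result)

print(dispatch_tool_call("get_weather", '{"city": "Lagos"}'))
```

The serialized result is what gets appended to the conversation so the model can phrase its final answer.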
Why Function Calling Matters
Function calling enables AI agents to:
- Access real-time data — APIs, databases, web search
- Take actions — Send emails, update records, trigger workflows
- Use specialized tools — Calculators, code interpreters, domain-specific APIs
- Extend capabilities — Break free from training data limitations
Function Calling Best Practices
1. Write Clear Function Descriptions
Bad:
def search(query):
    """Searches stuff"""
    pass
Good:
from typing import Any, Dict, List, Optional

def search_knowledge_base(
    query: str,
    max_results: int = 5,
    filters: Optional[Dict[str, str]] = None
) -> List[Dict[str, Any]]:
    """
    Search the company knowledge base for relevant documents.

    Use this when the user asks questions about company policies,
    procedures, or internal documentation.

    Args:
        query: Natural language search query
        max_results: Maximum number of results to return (default 5)
        filters: Optional filters like {"department": "engineering"}

    Returns:
        List of documents with title, content, and relevance score

    Examples:
        - User asks "What's our vacation policy?"
          → search_knowledge_base("vacation policy")
        - "Show me engineering docs about API design"
          → search_knowledge_base("API design", filters={"department": "engineering"})
    """
    pass
The LLM uses your description to decide when and how to call the function.
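Most SDKs and frameworks derive the JSON schema the model sees from exactly this signature and docstring. A rough standard-library sketch of that derivation (real frameworks handle nested types, enums, and per-field descriptions far more thoroughly):

```python
import inspect

def build_schema(func) -> dict:
    """Derive a minimal JSON-schema tool definition from a function
    signature; illustrative only, not a production implementation."""
    type_map = {str: "string", int: "integer", float: "number", bool: "boolean"}
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": type_map.get(param.annotation, "string")}
        # Parameters without defaults become required fields.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

def search_knowledge_base(query: str, max_results: int = 5):
    """Search the company knowledge base for relevant documents."""

schema = build_schema(search_knowledge_base)
```

Whatever tooling you use, it is worth printing the generated schema once to confirm the model sees what you intended.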
2. Use Strong Type Hints
Python with Pydantic:
from pydantic import BaseModel, Field
from typing import Literal

class EmailParams(BaseModel):
    recipient: str = Field(description="Email address of recipient")
    subject: str = Field(description="Email subject line")
    body: str = Field(description="Email body content")
    priority: Literal["low", "normal", "high"] = Field(
        default="normal",
        description="Email priority level"
    )

def send_email(params: EmailParams) -> dict:
    """Send an email to a recipient."""
    pass
TypeScript:
interface EmailParams {
  /** Email address of recipient */
  recipient: string;
  /** Email subject line */
  subject: string;
  /** Email body content */
  body: string;
  /** Email priority level */
  priority?: 'low' | 'normal' | 'high';
}

async function sendEmail(params: EmailParams): Promise<{success: boolean}> {
  // Implementation
  return {success: true};
}
3. Validate Function Arguments
Never trust LLM-generated arguments blindly:
from pydantic import BaseModel, validator

class TransferParams(BaseModel):
    from_account: str
    to_account: str
    amount: float

    @validator('amount')
    def amount_must_be_positive(cls, v):
        if v <= 0:
            raise ValueError('Amount must be positive')
        if v > 10000:
            raise ValueError('Amount exceeds transfer limit')
        return v

    @validator('from_account', 'to_account')
    def account_must_be_valid(cls, v):
        if not v.startswith('ACC-'):
            raise ValueError('Invalid account format')
        return v

def transfer_money(params: TransferParams):
    """Transfer money between accounts (requires validation)."""
    # Validation happens automatically via Pydantic
    execute_transfer(params)
4. Implement Function Security
Permission Checks:
def delete_user(user_id: str, requesting_user: str) -> dict:
    """
    Delete a user account (admin only).

    Args:
        user_id: ID of user to delete
        requesting_user: ID of user making the request
    """
    # Check permissions before executing
    if not is_admin(requesting_user):
        raise PermissionError("Only admins can delete users")
    # Additional confirmation for destructive actions
    if not user_exists(user_id):
        raise ValueError(f"User {user_id} not found")
    return perform_deletion(user_id)
Rate Limiting:
from functools import wraps
import time

def rate_limit(max_calls=10, period=60):
    """Rate limit function calls"""
    calls = []

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            now = time.time()
            # Drop calls that fall outside the sliding window
            calls[:] = [c for c in calls if now - c < period]
            if len(calls) >= max_calls:
                raise Exception(f"Rate limit exceeded: {max_calls} calls per {period}s")
            calls.append(now)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@rate_limit(max_calls=5, period=60)
def send_sms(phone: str, message: str):
    """Send SMS (rate limited to 5 per minute)"""
    pass
5. Return Structured Results
Bad:
def get_weather(city):
    return "It's sunny and 25 degrees"  # Free text is hard for the LLM to parse
Good:
from typing import TypedDict

class WeatherResult(TypedDict):
    city: str
    temperature_celsius: float
    condition: str
    humidity_percent: int
    timestamp: str

def get_weather(city: str) -> WeatherResult:
    """Get current weather for a city."""
    return {
        "city": city,
        "temperature_celsius": 25.0,
        "condition": "sunny",
        "humidity_percent": 65,
        "timestamp": "2026-03-13T01:00:00Z"
    }
LLMs handle structured JSON better than unstructured text.
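A structured result also drops straight into the tool-result message the model reads next, with no string parsing on either side. A sketch, restating the get_weather stub above for self-containment:

```python
import json

def get_weather(city: str) -> dict:
    # Stubbed values; a real implementation would call a weather API.
    return {
        "city": city,
        "temperature_celsius": 25.0,
        "condition": "sunny",
        "humidity_percent": 65,
        "timestamp": "2026-03-13T01:00:00Z",
    }

# The dict serializes directly into the tool-result message most
# providers expect; downstream code can also read fields by name.
tool_message = {
    "role": "tool",
    "content": json.dumps(get_weather("Lagos")),
}
```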
6. Handle Errors Gracefully
import logging
from typing import Optional
from pydantic import BaseModel

logger = logging.getLogger(__name__)

class FunctionResult(BaseModel):
    success: bool
    data: Optional[dict] = None
    error: Optional[str] = None

def safe_function_call(func, *args, **kwargs) -> FunctionResult:
    """Wrapper that catches errors and returns a structured result"""
    try:
        result = func(*args, **kwargs)
        return FunctionResult(success=True, data=result)
    except PermissionError as e:
        return FunctionResult(
            success=False,
            error=f"Permission denied: {e}"
        )
    except ValueError as e:
        return FunctionResult(
            success=False,
            error=f"Invalid input: {e}"
        )
    except Exception as e:
        logger.error(f"Function error: {e}")
        return FunctionResult(
            success=False,
            error="An unexpected error occurred"
        )
For comprehensive error handling patterns, see our AI agent error handling guide.
7. Optimize Token Usage
Function definitions consume tokens. Be concise but clear:
Verbose (wasteful):
"""
This function is used to search our company's knowledge base system.
You should call this function whenever a user asks a question that
might be answered by our internal documentation, policies, or procedures.
The function accepts a query parameter which should be a natural language
search string, and it also accepts an optional max_results parameter...
"""
Optimized:
"""
Search company knowledge base for policies and documentation.
Args: query (str), max_results (int, default 5)
Use when: user asks about company info, policies, procedures
"""
8. Design Composable Functions
Bad — Monolithic:
def send_report_email_to_team(report_type: str):
    """Generate report, format email, send to team"""
    # Does too much, hard to debug
    pass
Good — Composable:
def generate_report(report_type: str) -> dict:
    """Generate a specific type of report"""
    pass

def format_email(subject: str, body: str) -> dict:
    """Format email content"""
    pass

def send_email(recipient: str, subject: str, body: str) -> dict:
    """Send an email"""
    pass
Let the LLM orchestrate the composition.
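In production the model emits these calls one at a time, feeding each result into the next; run deterministically with illustrative stubs, the chain looks like this:

```python
# Illustrative stubs standing in for the real implementations.
def generate_report(report_type: str) -> dict:
    return {"title": f"{report_type} report", "body": "metrics go here"}

def format_email(subject: str, body: str) -> dict:
    return {"subject": subject, "body": body}

def send_email(recipient: str, subject: str, body: str) -> dict:
    return {"success": True, "recipient": recipient}

# The LLM would request each step in turn, using the previous
# function's structured result to fill the next call's arguments.
report = generate_report("weekly sales")
email = format_email(subject=report["title"], body=report["body"])
result = send_email("team@example.com", email["subject"], email["body"])
```

Because each step returns a small structured result, a failure at any link is easy to surface to the model (or the user) without rerunning the whole chain.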
9. Provide Usage Examples
def calculate_roi(
    initial_investment: float,
    returns: float,
    time_period_years: float
) -> dict:
    """
    Calculate return on investment (ROI).

    Args:
        initial_investment: Initial amount invested
        returns: Total returns received
        time_period_years: Investment duration in years

    Examples:
        User: "I invested $10k and got back $15k over 3 years, what's my ROI?"
        Call: calculate_roi(10000, 15000, 3)

        User: "What's the ROI if I put in 5000 and earn 6500 in 2 years?"
        Call: calculate_roi(5000, 6500, 2)
    """
    roi_percent = ((returns - initial_investment) / initial_investment) * 100
    annualized_roi = ((returns / initial_investment) ** (1 / time_period_years) - 1) * 100
    return {
        "roi_percent": round(roi_percent, 2),
        "annualized_roi_percent": round(annualized_roi, 2),
        "profit": returns - initial_investment
    }
10. Implement Confirmation for Destructive Actions
from functools import wraps

def requires_confirmation(func):
    """Decorator for actions requiring explicit user confirmation"""
    @wraps(func)
    def wrapper(*args, **kwargs):
        confirmation_required = {
            "action": func.__name__,
            "params": kwargs,
            "requires_confirmation": True,
            "message": f"Confirm: {func.__doc__}"
        }
        # LLM should ask user for confirmation before proceeding
        return confirmation_required
    return wrapper

@requires_confirmation
def delete_all_records(table: str) -> dict:
    """Delete all records from specified table (DESTRUCTIVE)"""
    # Only executes after user confirmation
    pass
Platform-Specific Best Practices
OpenAI Function Calling
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. San Francisco"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Lagos?"}],
    tools=tools,
    tool_choice="auto"  # Let model decide when to use tools
)
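When the model decides to use a tool, the reply carries `tool_calls` that your code must execute and send back as `tool` messages before calling the API again. A sketch of that round trip, with plain dicts standing in for the SDK's response objects (the real objects expose the same fields as attributes) and a stubbed get_weather:

```python
import json

# Hypothetical local tool implementation.
def get_weather(city: str, unit: str = "celsius") -> dict:
    return {"city": city, "temp": 28, "unit": unit}

# Shaped like the assistant message in an OpenAI chat-completions reply.
response_message = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": '{"city": "Lagos"}'},
    }],
}

tool_messages = []
for call in response_message["tool_calls"]:
    # Model-generated arguments arrive as a JSON string.
    args = json.loads(call["function"]["arguments"])
    result = {"get_weather": get_weather}[call["function"]["name"]](**args)
    # Each result is sent back tied to its tool_call_id; the conversation
    # then continues with another chat.completions.create call.
    tool_messages.append({
        "role": "tool",
        "tool_call_id": call["id"],
        "content": json.dumps(result),
    })
```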
Anthropic Tool Use
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
]

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Weather in Lagos?"}]
)
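When the response's `stop_reason` is `"tool_use"`, its content includes a `tool_use` block; your code runs the tool and replies with a `tool_result` block inside a user message. A sketch with plain dicts standing in for the SDK objects and a stubbed get_weather:

```python
import json

# Hypothetical local tool implementation.
def get_weather(city: str, unit: str = "celsius") -> dict:
    return {"city": city, "temp": 28, "unit": unit}

# Shaped like the content of an Anthropic response that requested a tool.
content_blocks = [
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
     "input": {"city": "Lagos"}},
]

tool_results = []
for block in content_blocks:
    if block["type"] != "tool_use":
        continue
    # Anthropic delivers arguments as an already-parsed object in `input`.
    result = {"get_weather": get_weather}[block["name"]](**block["input"])
    # Results go back inside a user message as tool_result blocks,
    # matched to the originating call by tool_use_id.
    tool_results.append({
        "type": "tool_result",
        "tool_use_id": block["id"],
        "content": json.dumps(result),
    })
```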
LangChain Tools
from langchain.tools import tool

@tool
def search_knowledge_base(query: str, max_results: int = 5) -> list:
    """Search company knowledge base

    Args:
        query: Search query
        max_results: Number of results to return
    """
    return perform_search(query, max_results)

# LangChain auto-generates the schema from the decorator + docstring
For more on building agents with tools, see our multi-agent orchestration guide.
Testing Function Calling
Unit Tests
import pytest

def test_weather_function():
    result = safe_function_call(get_weather, "Lagos")
    assert result.success is True
    assert "temperature_celsius" in result.data
    assert isinstance(result.data["temperature_celsius"], float)

def test_weather_invalid_city():
    result = safe_function_call(get_weather, "InvalidCityXYZ123")
    assert result.success is False
    assert result.error is not None
Integration Tests
@pytest.mark.asyncio
async def test_llm_function_calling():
    """Test that the LLM correctly calls functions"""
    agent = create_agent_with_tools([get_weather])
    response = await agent.run("What's the weather in Lagos?")

    # Verify the function was called
    assert agent.last_tool_call == "get_weather"
    assert agent.last_tool_args == {"city": "Lagos"}

    # Verify the response includes weather data
    assert "temperature" in response.lower() or "weather" in response.lower()
Chaos Testing
def test_function_with_invalid_llm_args():
    """Test handling of malformed LLM-generated arguments"""
    invalid_inputs = [
        {"city": 12345},                     # Wrong type
        {"city": ""},                        # Empty string
        {},                                  # Missing required field
        {"city": "Lagos", "extra": "field"}  # Extra fields
    ]
    for invalid_input in invalid_inputs:
        result = safe_function_call(get_weather, **invalid_input)
        assert result.success is False
        assert result.error is not None
Common Pitfalls
Pitfall 1: Ambiguous Function Names
# Bad
def get(): ...      # Get what?
def process(): ...  # Process what?
def handle(): ...   # Handle what?

# Good
def get_user_profile(): ...
def process_payment(): ...
def handle_webhook_event(): ...
Pitfall 2: Side Effects in Descriptions
# Bad
def log_message(msg: str):
    """Logs a message (also sends email notification)"""  # Hidden side effect!

# Good
def log_and_notify(msg: str):
    """Logs message and sends email notification to admins"""  # Explicit
Pitfall 3: No Dry-Run Mode
Support a dry-run flag so destructive or outbound actions can be exercised safely in tests:
def send_email(to: str, subject: str, body: str, dry_run: bool = False):
    """Send email (supports dry-run for testing)"""
    if dry_run:
        return {"success": True, "message": "Dry run - email not sent"}
    return actually_send_email(to, subject, body)
Conclusion
Production-grade function calling requires careful schema design, robust validation, security guardrails, and comprehensive testing. The difference between a demo and a production system lies in these details.
Key takeaways:
- Write clear, concise function descriptions with examples
- Use strong typing and validation
- Implement security checks and rate limiting
- Return structured results
- Handle errors gracefully
- Test with real and malformed inputs
Start with a small set of well-designed functions, validate thoroughly, then expand your tool ecosystem.
Build AI That Works For Your Business
At AI Agents Plus, we help companies move from AI experiments to production systems that deliver real ROI. Whether you need:
- Custom AI Agents — Autonomous systems that handle complex workflows, from customer service to operations
- Rapid AI Prototyping — Go from idea to working demo in days using vibe coding and modern AI frameworks
- Voice AI Solutions — Natural conversational interfaces for your products and services
We've built AI systems for startups and enterprises across Africa and beyond.
Ready to explore what AI can do for your business? Let's talk →
About AI Agents Plus Editorial
AI automation expert and thought leader in business transformation through artificial intelligence.



