Function Calling LLM Best Practices: Production Implementation Guide 2026
Master function calling for production LLM applications. Learn best practices for tool design, parameter validation, error handling, and building reliable AI agents with external capabilities.

Function calling transforms language models from text generators into capable agents that interact with external systems, databases, APIs, and tools. Mastering function calling LLM best practices is essential for building production AI agents that reliably execute real-world tasks beyond simple conversation.
Modern LLMs from OpenAI, Anthropic, and Google support sophisticated function calling, but production success requires careful tool design, robust parameter validation, comprehensive error handling, and thoughtful orchestration. This guide covers the patterns that separate prototype demos from production-grade AI agents.
What is Function Calling in LLMs?
Function calling (also called tool use or tool calling) enables LLMs to invoke external functions with structured parameters rather than just generating text. When a user asks "What's the weather in Lagos?", an LLM with function calling:
- Recognizes it needs external data
- Generates a structured function call such as get_weather(location="Lagos")
- Your code executes the function and returns the result
- The LLM incorporates the result into its response
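That round trip can be sketched in a few lines of Python. This is a minimal illustration, not a real provider API: the `get_weather` implementation, the tool registry, and the call format are simplified stand-ins for what OpenAI, Anthropic, or Gemini actually emit.

```python
# Minimal sketch of the function-calling round trip.
# The tool registry and call format are simplified stand-ins
# for a real provider's API.

def get_weather(location: str) -> dict:
    # Placeholder: a real tool would query a weather API here.
    return {"location": location, "temperature_c": 31, "conditions": "sunny"}

TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(call: dict) -> dict:
    """Execute the function call the model generated."""
    func = TOOLS.get(call["name"])
    if func is None:
        return {"error": f"Unknown tool: {call['name']}"}
    return func(**call["arguments"])

# The model emits structured data like this instead of plain text:
model_call = {"name": "get_weather", "arguments": {"location": "Lagos"}}
result = dispatch_tool_call(model_call)
# `result` is sent back to the model, which writes the final answer.
```

The key point is that your code, not the model, executes the function: the model only proposes a name and arguments, and you control the dispatch.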
Function calling enables AI agents to:
- Retrieve real-time data: Weather, stock prices, database queries
- Execute actions: Send emails, create calendar events, update records
- Access internal systems: CRM data, inventory, analytics
- Perform calculations: Complex math, data analysis
- Chain multi-step workflows: Orchestrate sequences of operations
Why Function Calling Best Practices Matter
Poor function calling implementation leads to:
- Unreliable execution: Invalid parameters cause failures
- Security vulnerabilities: Unchecked tool access enables exploits
- Cost overruns: Inefficient calling patterns waste tokens
- Poor UX: Failed tool calls frustrate users
- Maintenance burden: Fragile integrations break frequently
Production-grade function calling requires systematic tool design, validation, error handling, and monitoring to ensure reliability and security.

Core Function Calling Best Practices
1. Design Clear, Focused Tool Functions
Good tool design:
# Clear, single-purpose function
def search_products(query: str, max_results: int = 10) -> list:
    """Search product catalog by keyword.

    Args:
        query: Search keywords
        max_results: Maximum number of results (1-50)

    Returns:
        List of matching products with name, price, and availability
    """
    pass
Poor tool design:
# Vague, multi-purpose function
def handle_request(action: str, data: dict) -> dict:
    """Handle various types of requests."""
    # Too generic: forces the LLM to guess the structure
    pass
2. Write Excellent Tool Descriptions
The LLM relies entirely on your descriptions:
{
    "name": "get_weather",
    "description": (
        "Get current weather conditions for a specific location. "
        "Use this when users ask about weather, temperature, or "
        "atmospheric conditions. Returns temperature, conditions, "
        "humidity, and wind speed."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City name or 'City, Country' format (e.g., 'Lagos, Nigeria')"
            },
            "units": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit (default: celsius)"
            }
        },
        "required": ["location"]
    }
}
3. Validate Tool Parameters Rigorously
Never trust LLM-generated parameters without validation:
from pydantic import BaseModel, Field, ValidationError, field_validator

class WeatherParams(BaseModel):
    location: str = Field(..., min_length=1, max_length=100)
    units: str = Field(default="celsius", pattern="^(celsius|fahrenheit)$")

    @field_validator("location")
    @classmethod
    def validate_location(cls, v):
        if not v.strip():
            raise ValueError("Location cannot be empty")
        # Additional sanitization
        return v.strip()

def execute_tool(tool_name: str, params: dict):
    try:
        validated = WeatherParams(**params)
        return tools[tool_name](validated)
    except ValidationError as e:
        return {"error": str(e), "type": "validation_error"}
4. Implement Robust Error Handling
Tools fail for many reasons—handle gracefully:
import asyncio
from pydantic import ValidationError

async def safe_tool_execution(tool_func, params, max_retries=2):
    for attempt in range(max_retries):
        try:
            result = await tool_func(**params)
            return {"success": True, "data": result}
        except ValidationError as e:
            # Don't retry validation errors
            return {
                "success": False,
                "error": "Invalid parameters",
                "details": str(e),
                "retry": False
            }
        except TimeoutError:
            if attempt < max_retries - 1:
                await asyncio.sleep(2 ** attempt)  # Exponential backoff
                continue
            return {
                "success": False,
                "error": "Tool execution timed out",
                "retry": True
            }
        except Exception as e:
            logger.error(f"Tool error: {e}", exc_info=True)
            return {
                "success": False,
                "error": "Internal tool error",
                "retry": False
            }
For comprehensive error handling patterns, see our AI agent error handling guide.
5. Return Structured, Informative Results
Help the LLM use results effectively:
# Good: Structured, clear result
def search_products(query: str) -> dict:
    return {
        "query": query,
        "results_count": 3,
        "products": [
            {
                "name": "Product A",
                "price": 29.99,
                "in_stock": True,
                "url": "https://..."
            }
        ]
    }

# Poor: Unstructured string
def search_products(query: str) -> str:
    return "Found Product A for $29.99 and Product B..."
6. Implement Security Controls
Never expose dangerous operations without safeguards:
class SecureToolExecutor:
    def __init__(self, allowed_tools: set, user_permissions: dict):
        self.allowed_tools = allowed_tools
        self.user_permissions = user_permissions

    async def execute(self, user_id: str, tool_name: str, params: dict):
        # Check tool allowlist
        if tool_name not in self.allowed_tools:
            return {"error": "Tool not allowed"}
        # Check user permissions
        if not self.user_permissions.get(user_id, {}).get(tool_name):
            return {"error": "Insufficient permissions"}
        # Rate limiting
        if await self.is_rate_limited(user_id, tool_name):
            return {"error": "Rate limit exceeded"}
        # Execute with timeout and resource limits
        return await self.execute_with_limits(tool_name, params)
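The rate-limit check above is left abstract. One simple way to implement it is a sliding-window counter; this is a sketch with an in-memory store (production systems would typically back this with Redis or similar, and wrap it for async use):

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `max_calls` per (user, tool) pair within `window` seconds."""

    def __init__(self, max_calls: int = 10, window: float = 60.0):
        self.max_calls = max_calls
        self.window = window
        # Maps (user_id, tool_name) -> deque of recent call timestamps
        self.calls = defaultdict(deque)

    def is_rate_limited(self, user_id: str, tool_name: str) -> bool:
        now = time.monotonic()
        q = self.calls[(user_id, tool_name)]
        # Drop timestamps that have fallen outside the window
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_calls:
            return True
        q.append(now)  # Record this call
        return False
```

The deque keeps only timestamps inside the window, so memory stays bounded per user-tool pair even under sustained traffic.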
7. Optimize Token Usage
Function definitions consume tokens—optimize carefully:
# Only include relevant tools for each conversation
def select_tools(conversation_context: str) -> list:
    if "weather" in conversation_context.lower():
        return [weather_tool]
    elif "product" in conversation_context.lower():
        return [search_tool, inventory_tool]
    else:
        return basic_tools
Advanced Function Calling Patterns
Multi-Step Tool Chains
Let LLMs orchestrate complex workflows:
# Example: "Find cheapest flight to Lagos and book it"
# LLM chains:
# 1. search_flights(destination="Lagos")
# 2. compare_prices(flight_ids=[...])
# 3. book_flight(flight_id="cheapest_id")
# 4. send_confirmation_email(booking_id=...)
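A chain like this typically runs inside an agent loop: call the model, execute whatever tool call it emits, feed the result back, and repeat until the model answers in plain text. A provider-agnostic sketch, where the `model` callable and the message shapes are hypothetical stand-ins for a real provider client:

```python
def run_agent_loop(model, tools: dict, messages: list, max_steps: int = 8) -> str:
    """Drive the model until it stops requesting tools.

    `model(messages)` is a hypothetical stand-in for a provider call that
    returns either {"tool_call": {"name": ..., "arguments": ...}} or
    {"text": ...} as the final answer.
    """
    for _ in range(max_steps):
        response = model(messages)
        call = response.get("tool_call")
        if call is None:
            return response["text"]  # Model produced a final answer
        result = tools[call["name"]](**call["arguments"])
        # Feed the tool result back so the model can plan the next step
        messages.append({"role": "tool", "name": call["name"], "content": result})
    return "Stopped: exceeded maximum tool steps."
```

The `max_steps` cap matters in production: without it, a confused model can loop on tool calls indefinitely and burn tokens.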
For multi-agent orchestration, see our orchestration best practices.
Parallel Tool Execution
When tools don't depend on each other:
async def parallel_tool_execution(tool_calls: list):
    tasks = [
        execute_tool(call['name'], call['params'])
        for call in tool_calls
    ]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return results
Conditional Tool Availability
Offer different tools based on context:
if user.subscription_tier == "premium":
    available_tools.append(advanced_analytics_tool)
if conversation.is_escalated:
    available_tools.append(human_handoff_tool)
Tool Result Caching
Cache expensive or rate-limited operations:
import hashlib
import time

class CachedToolExecutor:
    def __init__(self, cache_ttl=300):
        self.cache = {}
        self.cache_ttl = cache_ttl

    async def execute(self, tool_name: str, params: dict):
        cache_key = f"{tool_name}:{hashlib.md5(str(params).encode()).hexdigest()}"
        if cache_key in self.cache:
            cached_result, timestamp = self.cache[cache_key]
            if time.time() - timestamp < self.cache_ttl:
                return cached_result
        result = await actual_tool_execution(tool_name, params)
        self.cache[cache_key] = (result, time.time())
        return result
Testing Function Calling
Systematic testing is critical:
Unit Tests for Tools
import pytest

@pytest.mark.asyncio
async def test_weather_tool():
    result = await get_weather(location="Lagos")
    assert result["temperature"] is not None
    assert result["conditions"] in ["sunny", "cloudy", "rainy"]

@pytest.mark.asyncio
async def test_invalid_location():
    result = await get_weather(location="")
    assert result["error"] == "Invalid parameters"
Integration Tests for Calling Patterns
@pytest.mark.asyncio
async def test_multi_step_workflow():
    # Test that the LLM can chain tools correctly
    response = await agent.run(
        "Find weather in Lagos and send it to user@example.com"
    )
    assert "get_weather" in response.tool_calls
    assert "send_email" in response.tool_calls
Explore comprehensive testing in our AI agent testing guide.
Production Monitoring
Track function calling metrics:
- Tool call success rate by tool
- Average execution time per tool
- Parameter validation failure rate
- Retry attempts per tool
- Cost per tool call
import time
import structlog

logger = structlog.get_logger()

async def monitored_tool_execution(tool_name, params):
    start_time = time.time()
    try:
        result = await execute_tool(tool_name, params)
        logger.info(
            "tool_execution_success",
            tool=tool_name,
            duration=time.time() - start_time,
            param_count=len(params)
        )
        return result
    except Exception as e:
        logger.error(
            "tool_execution_failed",
            tool=tool_name,
            error=str(e),
            params=params
        )
        raise
Provider-Specific Best Practices
OpenAI Function Calling
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=messages,
    tools=[
        {
            "type": "function",
            "function": weather_tool_spec
        }
    ],
    tool_choice="auto"  # or "required", or {"type": "function", "function": {"name": "..."}}
)
Anthropic Tool Use (Claude)
response = anthropic.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    tools=[weather_tool_spec],
    messages=messages
)

# Handle tool use blocks
for block in response.content:
    if block.type == "tool_use":
        result = execute_tool(block.name, block.input)
        # Continue the conversation with the tool result
Google Gemini Function Calling
model = genai.GenerativeModel(
    model_name="gemini-pro",
    tools=[weather_tool_spec]
)

response = model.generate_content(
    messages,
    tool_config={"function_calling_config": {"mode": "AUTO"}}
)
Common Mistakes to Avoid
Vague Tool Descriptions
LLMs can't read your mind. Describe exactly when and how to use each tool.
Trusting LLM Parameters
Always validate. LLMs make mistakes with parameters.
Synchronous Tool Execution
Use async execution to prevent blocking:
# Bad: Blocks on each tool
result1 = execute_tool_sync(tool1, params1)
result2 = execute_tool_sync(tool2, params2)

# Good: Parallel execution
result1, result2 = await asyncio.gather(
    execute_tool(tool1, params1),
    execute_tool(tool2, params2)
)
Missing Timeout Protection
Always set timeouts on external calls:
try:
    result = await asyncio.wait_for(
        execute_tool(name, params),
        timeout=10.0
    )
except asyncio.TimeoutError:
    return {"error": "Tool execution timed out"}
Over-Exposing Tools
Start with minimal tool sets. Add tools only when needed.
Poor Error Communication
Return structured errors the LLM can explain:
{
    "error": "Product not found",
    "suggestion": "Try searching with different keywords",
    "retry": True
}
Production Deployment Checklist
- All tools have clear, detailed descriptions
- Parameter validation with Pydantic or similar
- Error handling with retries for transient failures
- Timeouts on all external calls
- Security controls (allowlists, permissions, rate limits)
- Structured error responses
- Comprehensive logging and monitoring
- Unit tests for each tool
- Integration tests for common workflows
- Cost tracking per tool
- Documentation for tool maintenance
The Future of Function Calling
Emerging trends:
- Autonomous tool discovery: LLMs learn available tools dynamically
- Self-correcting calls: LLMs detect and fix parameter errors
- Semantic tool matching: Match tools by capability, not exact name
- Streaming tool results: Progressive result delivery for long operations
- Multi-modal tools: Tools that accept/return images, audio, etc.
Conclusion
Function calling LLM best practices separate toy demos from production AI agents. By designing clear, focused tools with excellent descriptions, validating parameters rigorously, implementing robust error handling, optimizing token usage, and monitoring execution carefully, teams build reliable AI agents that safely interact with real-world systems.
Production function calling requires systematic tool design, comprehensive validation, intelligent error recovery, and continuous monitoring. With proper implementation, function calling transforms LLMs from conversational toys into capable agents that deliver real business value.
Build AI That Works For Your Business
At AI Agents Plus, we help companies move from AI experiments to production systems that deliver real ROI. Whether you need:
- Custom AI Agents — Autonomous systems that handle complex workflows, from customer service to operations
- Rapid AI Prototyping — Go from idea to working demo in days using vibe coding and modern AI frameworks
- Voice AI Solutions — Natural conversational interfaces for your products and services
We've built AI systems for startups and enterprises across Africa and beyond.
Ready to explore what AI can do for your business? Let's talk →
About AI Agents Plus Editorial
AI automation expert and thought leader in business transformation through artificial intelligence.