Function Calling LLM Best Practices: Production Implementation Guide 2026
Master function calling for production LLM applications. Learn best practices for tool design, parameter validation, error handling, and building reliable AI agents with external capabilities.

Function calling transforms language models from text generators into capable agents that interact with external systems, databases, APIs, and tools. Mastering function calling LLM best practices is essential for building production AI agents that reliably execute real-world tasks beyond simple conversation.
Modern LLMs from OpenAI, Anthropic, and Google support sophisticated function calling, but production success requires careful tool design, robust parameter validation, comprehensive error handling, and thoughtful orchestration. This guide covers the patterns that separate prototype demos from production-grade AI agents.
What is Function Calling in LLMs?
Function calling (also called tool use or tool calling) enables LLMs to invoke external functions with structured parameters rather than just generating text. When a user asks "What's the weather in Lagos?", an LLM with function calling:
- Recognizes it needs external data
- Generates a structured function call such as get_weather(location="Lagos")
- Your code executes the function and returns the result
- The LLM incorporates the result into its response
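That round trip can be sketched in a few lines of Python. This is a minimal illustration, not a real provider API: the `get_weather` implementation, the tool registry, and the call format are simplified stand-ins for what OpenAI, Anthropic, or Gemini actually emit.

```python
# Minimal sketch of the function-calling round trip.
# The tool registry and call format are simplified stand-ins
# for a real provider's API.

def get_weather(location: str) -> dict:
    # Placeholder: a real tool would query a weather API here.
    return {"location": location, "temperature_c": 31, "conditions": "sunny"}

TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(call: dict) -> dict:
    """Execute the function call the model generated."""
    func = TOOLS.get(call["name"])
    if func is None:
        return {"error": f"Unknown tool: {call['name']}"}
    return func(**call["arguments"])

# The model emits structured data like this instead of plain text:
model_call = {"name": "get_weather", "arguments": {"location": "Lagos"}}
result = dispatch_tool_call(model_call)
# `result` is sent back to the model, which writes the final answer.
```

The key point is that your code, not the model, executes the function: the model only proposes a name and arguments, and you control the dispatch.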
Function calling enables AI agents to:
- Retrieve real-time data: Weather, stock prices, database queries
- Execute actions: Send emails, create calendar events, update records
- Access internal systems: CRM data, inventory, analytics
- Perform calculations: Complex math, data analysis
- Chain multi-step workflows: Orchestrate sequences of operations
Why Function Calling Best Practices Matter
Poor function calling implementation leads to:
- Unreliable execution: Invalid parameters cause failures
- Security vulnerabilities: Unchecked tool access enables exploits
- Cost overruns: Inefficient calling patterns waste tokens
- Poor UX: Failed tool calls frustrate users
- Maintenance burden: Fragile integrations break frequently
Production-grade function calling requires systematic tool design, validation, error handling, and monitoring to ensure reliability and security.

Core Function Calling Best Practices
1. Design Clear, Focused Tool Functions
Good tool design:
# Clear, single-purpose function
def search_products(query: str, max_results: int = 10) -> list:
    """Search product catalog by keyword.

    Args:
        query: Search keywords
        max_results: Maximum number of results (1-50)

    Returns:
        List of matching products with name, price, and availability
    """
    pass
Poor tool design:
# Vague, multi-purpose function
def handle_request(action: str, data: dict) -> dict:
    """Handle various types of requests."""
    # Too generic: forces the LLM to guess the structure
    pass
2. Write Excellent Tool Descriptions
The LLM relies entirely on your descriptions:
{
    "name": "get_weather",
    "description": (
        "Get current weather conditions for a specific location. "
        "Use this when users ask about weather, temperature, or "
        "atmospheric conditions. Returns temperature, conditions, "
        "humidity, and wind speed."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City name or 'City, Country' format (e.g., 'Lagos, Nigeria')"
            },
            "units": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit (default: celsius)"
            }
        },
        "required": ["location"]
    }
}
3. Validate Tool Parameters Rigorously
Never trust LLM-generated parameters without validation:
from pydantic import BaseModel, Field, ValidationError, field_validator

class WeatherParams(BaseModel):
    location: str = Field(..., min_length=1, max_length=100)
    units: str = Field(default="celsius", pattern="^(celsius|fahrenheit)$")

    @field_validator("location")
    @classmethod
    def validate_location(cls, v):
        if not v.strip():
            raise ValueError("Location cannot be empty")
        # Additional sanitization
        return v.strip()

def execute_tool(tool_name: str, params: dict):
    try:
        validated = WeatherParams(**params)
        return tools[tool_name](validated)
    except ValidationError as e:
        return {"error": str(e), "type": "validation_error"}
4. Implement Robust Error Handling
Tools fail for many reasons—handle gracefully:
import asyncio
from pydantic import ValidationError

async def safe_tool_execution(tool_func, params, max_retries=2):
    for attempt in range(max_retries):
        try:
            result = await tool_func(**params)
            return {"success": True, "data": result}
        except ValidationError as e:
            # Don't retry validation errors
            return {
                "success": False,
                "error": "Invalid parameters",
                "details": str(e),
                "retry": False
            }
        except TimeoutError:
            if attempt < max_retries - 1:
                await asyncio.sleep(2 ** attempt)  # Exponential backoff
                continue
            return {
                "success": False,
                "error": "Tool execution timed out",
                "retry": True
            }
        except Exception as e:
            logger.error(f"Tool error: {e}", exc_info=True)
            return {
                "success": False,
                "error": "Internal tool error",
                "retry": False
            }
For comprehensive error handling patterns, see our AI agent error handling guide.
5. Return Structured, Informative Results
Help the LLM use results effectively:
# Good: Structured, clear result
def search_products(query: str) -> dict:
    return {
        "query": query,
        "results_count": 3,
        "products": [
            {
                "name": "Product A",
                "price": 29.99,
                "in_stock": True,
                "url": "https://..."
            }
        ]
    }

# Poor: Unstructured string
def search_products(query: str) -> str:
    return "Found Product A for $29.99 and Product B..."
6. Implement Security Controls
Never expose dangerous operations without safeguards:
class SecureToolExecutor:
    def __init__(self, allowed_tools: set, user_permissions: dict):
        self.allowed_tools = allowed_tools
        self.user_permissions = user_permissions

    async def execute(self, user_id: str, tool_name: str, params: dict):
        # Check tool allowlist
        if tool_name not in self.allowed_tools:
            return {"error": "Tool not allowed"}
        # Check user permissions
        if not self.user_permissions.get(user_id, {}).get(tool_name):
            return {"error": "Insufficient permissions"}
        # Rate limiting
        if await self.is_rate_limited(user_id, tool_name):
            return {"error": "Rate limit exceeded"}
        # Execute with timeout and resource limits
        return await self.execute_with_limits(tool_name, params)
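The rate-limit check above is left abstract. One simple way to implement it is a sliding-window counter; this is a sketch with an in-memory store (production systems would typically back this with Redis or similar, and wrap it for async use):

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `max_calls` per (user, tool) pair within `window` seconds."""

    def __init__(self, max_calls: int = 10, window: float = 60.0):
        self.max_calls = max_calls
        self.window = window
        # Maps (user_id, tool_name) -> deque of recent call timestamps
        self.calls = defaultdict(deque)

    def is_rate_limited(self, user_id: str, tool_name: str) -> bool:
        now = time.monotonic()
        q = self.calls[(user_id, tool_name)]
        # Drop timestamps that have fallen outside the window
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_calls:
            return True
        q.append(now)  # Record this call
        return False
```

The deque keeps only timestamps inside the window, so memory stays bounded per user-tool pair even under sustained traffic.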
7. Optimize Token Usage
Function definitions consume tokens—optimize carefully:
# Only include relevant tools for each conversation
def select_tools(conversation_context: str) -> list:
    if "weather" in conversation_context.lower():
        return [weather_tool]
    elif "product" in conversation_context.lower():
        return [search_tool, inventory_tool]
    else:
        return basic_tools
Advanced Function Calling Patterns
Multi-Step Tool Chains
Let LLMs orchestrate complex workflows:
# Example: "Find cheapest flight to Lagos and book it"
# LLM chains:
# 1. search_flights(destination="Lagos")
# 2. compare_prices(flight_ids=[...])
# 3. book_flight(flight_id="cheapest_id")
# 4. send_confirmation_email(booking_id=...)
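A chain like this typically runs inside an agent loop: call the model, execute whatever tool call it emits, feed the result back, and repeat until the model answers in plain text. A provider-agnostic sketch, where the `model` callable and the message shapes are hypothetical stand-ins for a real provider client:

```python
def run_agent_loop(model, tools: dict, messages: list, max_steps: int = 8) -> str:
    """Drive the model until it stops requesting tools.

    `model(messages)` is a hypothetical stand-in for a provider call that
    returns either {"tool_call": {"name": ..., "arguments": ...}} or
    {"text": ...} as the final answer.
    """
    for _ in range(max_steps):
        response = model(messages)
        call = response.get("tool_call")
        if call is None:
            return response["text"]  # Model produced a final answer
        result = tools[call["name"]](**call["arguments"])
        # Feed the tool result back so the model can plan the next step
        messages.append({"role": "tool", "name": call["name"], "content": result})
    return "Stopped: exceeded maximum tool steps."
```

The `max_steps` cap matters in production: without it, a confused model can loop on tool calls indefinitely and burn tokens.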
For multi-agent orchestration, see our orchestration best practices.
Parallel Tool Execution
When tools don't depend on each other:
async def parallel_tool_execution(tool_calls: list):
    tasks = [
        execute_tool(call['name'], call['params'])
        for call in tool_calls
    ]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return results
Conditional Tool Availability
Offer different tools based on context:
if user.subscription_tier == "premium":
    available_tools.append(advanced_analytics_tool)
if conversation.is_escalated:
    available_tools.append(human_handoff_tool)
Tool Result Caching
Cache expensive or rate-limited operations:
import hashlib
import time

class CachedToolExecutor:
    def __init__(self, cache_ttl=300):
        self.cache = {}
        self.cache_ttl = cache_ttl

    async def execute(self, tool_name: str, params: dict):
        cache_key = f"{tool_name}:{hashlib.md5(str(params).encode()).hexdigest()}"
        if cache_key in self.cache:
            cached_result, timestamp = self.cache[cache_key]
            if time.time() - timestamp < self.cache_ttl:
                return cached_result
        result = await actual_tool_execution(tool_name, params)
        self.cache[cache_key] = (result, time.time())
        return result
Testing Function Calling
Systematic testing is critical:
Unit Tests for Tools
import pytest

@pytest.mark.asyncio
async def test_weather_tool():
    result = await get_weather(location="Lagos")
    assert result["temperature"] is not None
    assert result["conditions"] in ["sunny", "cloudy", "rainy"]

@pytest.mark.asyncio
async def test_invalid_location():
    result = await get_weather(location="")
    assert result["error"] == "Invalid parameters"
Integration Tests for Calling Patterns
@pytest.mark.asyncio
async def test_multi_step_workflow():
    # Test that the LLM can chain tools correctly
    response = await agent.run(
        "Find weather in Lagos and send it to user@example.com"
    )
    assert "get_weather" in response.tool_calls
    assert "send_email" in response.tool_calls
Explore comprehensive testing in our AI agent testing guide.
Production Monitoring
Track function calling metrics:
- Tool call success rate by tool
- Average execution time per tool
- Parameter validation failure rate
- Retry attempts per tool
- Cost per tool call
import time
import structlog

logger = structlog.get_logger()

async def monitored_tool_execution(tool_name, params):
    start_time = time.time()
    try:
        result = await execute_tool(tool_name, params)
        logger.info(
            "tool_execution_success",
            tool=tool_name,
            duration=time.time() - start_time,
            param_count=len(params)
        )
        return result
    except Exception as e:
        logger.error(
            "tool_execution_failed",
            tool=tool_name,
            error=str(e),
            params=params
        )
        raise
Provider-Specific Best Practices
OpenAI Function Calling
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=messages,
    tools=[
        {
            "type": "function",
            "function": weather_tool_spec
        }
    ],
    tool_choice="auto"  # or "required", or {"type": "function", "function": {"name": "..."}}
)
Anthropic Tool Use (Claude)
response = anthropic.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    tools=[weather_tool_spec],
    messages=messages
)

# Handle tool use blocks
for block in response.content:
    if block.type == "tool_use":
        result = execute_tool(block.name, block.input)
        # Continue the conversation with the tool result
Google Gemini Function Calling
model = genai.GenerativeModel(
    model_name="gemini-pro",
    tools=[weather_tool_spec]
)

response = model.generate_content(
    messages,
    tool_config={"function_calling_config": {"mode": "AUTO"}}
)
Common Mistakes to Avoid
Vague Tool Descriptions
LLMs can't read your mind. Describe exactly when and how to use each tool.
Trusting LLM Parameters
Always validate. LLMs make mistakes with parameters.
Synchronous Tool Execution
Use async execution to prevent blocking:
# Bad: Blocks on each tool
result1 = execute_tool_sync(tool1, params1)
result2 = execute_tool_sync(tool2, params2)

# Good: Parallel execution
result1, result2 = await asyncio.gather(
    execute_tool(tool1, params1),
    execute_tool(tool2, params2)
)
Missing Timeout Protection
Always set timeouts on external calls:
try:
    result = await asyncio.wait_for(
        execute_tool(name, params),
        timeout=10.0
    )
except asyncio.TimeoutError:
    return {"error": "Tool execution timed out"}
Over-Exposing Tools
Start with minimal tool sets. Add tools only when needed.
Poor Error Communication
Return structured errors the LLM can explain:
{
    "error": "Product not found",
    "suggestion": "Try searching with different keywords",
    "retry": True
}
Production Deployment Checklist
- All tools have clear, detailed descriptions
- Parameter validation with Pydantic or similar
- Error handling with retries for transient failures
- Timeouts on all external calls
- Security controls (allowlists, permissions, rate limits)
- Structured error responses
- Comprehensive logging and monitoring
- Unit tests for each tool
- Integration tests for common workflows
- Cost tracking per tool
- Documentation for tool maintenance
The Future of Function Calling
Emerging trends:
- Autonomous tool discovery: LLMs learn available tools dynamically
- Self-correcting calls: LLMs detect and fix parameter errors
- Semantic tool matching: Match tools by capability, not exact name
- Streaming tool results: Progressive result delivery for long operations
- Multi-modal tools: Tools that accept/return images, audio, etc.
Conclusion
Function calling LLM best practices separate toy demos from production AI agents. By designing clear, focused tools with excellent descriptions, validating parameters rigorously, implementing robust error handling, optimizing token usage, and monitoring execution carefully, teams build reliable AI agents that safely interact with real-world systems.
Production function calling requires systematic tool design, comprehensive validation, intelligent error recovery, and continuous monitoring. With proper implementation, function calling transforms LLMs from conversational toys into capable agents that deliver real business value.
Build AI That Works For Your Business
At AI Agents Plus, we help companies move from AI experiments to production systems that deliver real ROI. Whether you need:
- Custom AI Agents — Autonomous systems that handle complex workflows, from customer service to operations
- Rapid AI Prototyping — Go from idea to working demo in days using vibe coding and modern AI frameworks
- Voice AI Solutions — Natural conversational interfaces for your products and services
We've built AI systems for startups and enterprises across Africa and beyond.
Ready to explore what AI can do for your business? Let's talk →
About AI Agents Plus Editorial
AI automation expert and thought leader in business transformation through artificial intelligence.