v1.76.3-stable - Performance, Video Generation & CloudZero Integration
warning
This release has a known issue where startup is leading to Out of Memory errors when deploying on Kubernetes. We recommend waiting before upgrading to this version.
Deploy this versionโ
- Docker
- Pip
docker run litellm
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.76.3
pip install litellm
pip install litellm==1.76.3
Key Highlightsโ
- Major Performance Improvements +400 RPS when using correct amount of workers + CPU cores combination
- Video Generation Support - Added Google AI Studio and Vertex AI Veo Video Generation through LiteLLM Pass through routes
- CloudZero Integration - New cost tracking integration for exporting LiteLLM Usage and Spend data to CloudZero.
Major Changesโ
- 
Performance Optimization: LiteLLM Proxy now achieves +400 RPS when using correct amount of CPU cores - PR #14153, PR #14242 By default, LiteLLM will now use num_workers = os.cpu_count()to achieve optimal performance.Override Options: Set environment variable: DEFAULT_NUM_WORKERS_LITELLM_PROXY=1Or start LiteLLM Proxy with: litellm --num_workers 1
- 
Security Fix: Fixed memory_usage_in_mem_cache cache endpoint vulnerability - PR #14229 
Performance Improvementsโ
This release includes significant performance optimizations. On our internal benchmarks we saw 1 instance get +400 RPS when using correct amount of workers + CPU cores combination.
- +400 RPS Performance Boost - LiteLLM Proxy now uses correct amount of CPU cores for optimal performance - PR #14153
- Default CPU Workers - Changed DEFAULT_NUM_WORKERS_LITELLM_PROXY default to number of CPUs - PR #14242
New Models / Updated Modelsโ
New Model Supportโ
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features | 
|---|---|---|---|---|---|
| OpenRouter | openrouter/openai/gpt-4.1 | 1M | $2.00 | $8.00 | Chat completions with vision | 
| OpenRouter | openrouter/openai/gpt-4.1-mini | 1M | $0.40 | $1.60 | Efficient chat completions | 
| OpenRouter | openrouter/openai/gpt-4.1-nano | 1M | $0.10 | $0.40 | Ultra-efficient chat | 
| Vertex AI | vertex_ai/openai/gpt-oss-20b-maas | 131K | $0.075 | $0.30 | Reasoning support | 
| Vertex AI | vertex_ai/openai/gpt-oss-120b-maas | 131K | $0.15 | $0.60 | Advanced reasoning | 
| Gemini | gemini/veo-3.0-generate-preview | 1K | - | $0.75/sec | Video generation | 
| Gemini | gemini/veo-3.0-fast-generate-preview | 1K | - | $0.40/sec | Fast video generation | 
| Gemini | gemini/veo-2.0-generate-001 | 1K | - | $0.35/sec | Video generation | 
| Volcengine | doubao-embedding-large | 4K | Free | Free | 2048-dim embeddings | 
| Together AI | together_ai/deepseek-ai/DeepSeek-V3.1 | 128K | $0.60 | $1.70 | Reasoning support | 
Featuresโ
- Google Gemini
- OpenRouter
- Added GPT-4.1 model family - PR #14101
 
- Groq
- Added support for reasoning_effort parameter - PR #14207
 
- X.AI
- Fixed XAI cost calculation - PR #14127
 
- Vertex AI
- VLLM
- Handle output parsing responses API output - PR #14121
 
- Ollama
- Added unified 'thinking' param support via reasoning_content- PR #14121
 
- Added unified 'thinking' param support via 
- Anthropic
- Added supported text field to anthropic citation response - PR #14126
 
- OCI Provider
- Handle assistant messages with both content and tool_calls - PR #14171
 
- Bedrock
- Databricks
- Added support for anthropic citation API in Databricks - PR #14077
 
Bug Fixesโ
New Provider Supportโ
- Volcengine
- Added Volcengine embedding module with handler and transformation logic - PR #14028
 
LLM API Endpointsโ
Featuresโ
- Images API
- Responses API
- Bedrock Passthrough
- Support AWS_BEDROCK_RUNTIME_ENDPOINT on bedrock passthrough - PR #14156
 
- Google AI Studio Passthrough
- Allow using Veo Video Generation through LiteLLM Pass through routes - PR #14228
 
- General
Bugsโ
- General
Spend Tracking, Budgets and Rate Limitingโ
Featuresโ
- Added header support for spend_logs_metadata - PR #14186
- Litellm passthrough cost tracking for chat completion - PR #14256
Bug Fixesโ
Management Endpoints / UIโ
Featuresโ
- UI Improvements
- Logs page screen size fixed - PR #14135
- Create Organization Tooltip added on Success - PR #14132
- Back to Keys should say Back to Logs - PR #14134
- Add client side pagination on All Models table - PR #14136
- Model Filters UI improvement - PR #14131
- Remove table filter on user info page - PR #14169
- Team name badge added on the User Details - PR #14003
- Fix: Log page parameter passing error - PR #14193
 
- Authentication & Authorization
Bugsโ
- General
- Validate store model in db setting - PR #14269
 
Logging / Guardrail Integrationsโ
Featuresโ
- Datadog
- Ensure apm_idis set on DD LLM Observability traces - PR #14272
 
- Ensure 
- Braintrust
- Fix logging when OTEL is enabled - PR #14122
 
- OTEL
- Optional Metrics and Logs following semantic conventions - PR #14179
 
- Slack Alerting
- Added alert type to alert message to slack for easier handling - PR #14176
 
Guardrailsโ
- Added guardrail to the Anthropic API endpoint - PR #14107
New Integrationโ
Performance / Loadbalancing / Reliability improvementsโ
Featuresโ
- Performance
- Monitoring
- Added Prometheus missing metrics - PR #14139
 
- Timeout
- Stream Timeout Control - Allow using x-litellm-stream-timeoutheader for stream timeout in requests - PR #14147
 
- Stream Timeout Control - Allow using 
- Routing
- Fixed x-litellm-tags not routing with Responses API - PR #14289
 
Bugsโ
- Security
- Fixed memory_usage_in_mem_cache cache endpoint vulnerability - PR #14229
 
General Proxy Improvementsโ
Featuresโ
- SCIM Support
- Kubernetes
- Added optional PodDisruptionBudget for litellm proxy - PR #14093
 
- Error Handling
- Add model to azure error message - PR #14294
 
New Contributorsโ
- @iabhi4 made their first contribution in PR #14093
- @zainhas made their first contribution in PR #14087
- @LifeDJIK made their first contribution in PR #14146
- @retanoj made their first contribution in PR #14133
- @zhxlp made their first contribution in PR #14193
- @kayoch1n made their first contribution in PR #14191
- @kutsushitaneko made their first contribution in PR #14171
- @mjmendo made their first contribution in PR #14176
- @HarshavardhanK made their first contribution in PR #14213
- @eycjur made their first contribution in PR #14207
- @22mSqRi made their first contribution in PR #14241
- @onlylhf made their first contribution in PR #14028
- @btpemercier made their first contribution in PR #11319
- @tremlin made their first contribution in PR #14287
- @TobiMayr made their first contribution in PR #14262
- @Eitan1112 made their first contribution in PR #14252

