---
title: v1.68.0-stable
slug: v1.68.0-stable
date: 2025-05-03T10:00:00.000Z
authors:
  - name: Krrish Dholakia
    title: CEO, LiteLLM
    url: https://www.linkedin.com/in/krish-d/
    image_url: >-
      https://media.licdn.com/dms/image/v2/D4D03AQGrlsJ3aqpHmQ/profile-displayphoto-shrink_400_400/B4DZSAzgP7HYAg-/0/1737327772964?e=1749686400&v=beta&t=Hkl3U8Ps0VtvNxX0BNNq24b4dtX5wQaPFp6oiKCIHD8
  - name: Ishaan Jaffer
    title: CTO, LiteLLM
    url: https://www.linkedin.com/in/reffajnaahsi/
    image_url: >-
      https://pbs.twimg.com/profile_images/1613813310264340481/lz54oEiB_400x400.jpg
hide_table_of_contents: false
---
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
## Deploy this version

<Tabs>
<TabItem value="docker" label="Docker">

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.68.0-stable
```

</TabItem>
<TabItem value="pip" label="Pip">

```shell
pip install litellm==1.68.0.post1
```

</TabItem>
</Tabs>
## Key Highlights
LiteLLM v1.68.0-stable will be live soon. Here are the key highlights of this release:

- **Bedrock Knowledge Base**: You can now query your Bedrock Knowledge Base with all LiteLLM models via the `/chat/completions` or `/responses` API.
- **Rate Limits**: This release brings accurate rate limiting across multiple instances, reducing spillover to at most 10 additional requests in high traffic.
- **Meta Llama API**: Added support for Meta Llama API - Get Started
- **LlamaFile**: Added support for LlamaFile - Get Started
## Bedrock Knowledge Base (Vector Store)
<Image img={require('../../img/release_notes/bedrock_kb.png')}/>
This release adds support for Bedrock vector stores (knowledge bases) in LiteLLM. With this update, you can:
- Use Bedrock vector stores in the OpenAI /chat/completions spec with all LiteLLM supported models.
- View all available vector stores through the LiteLLM UI or API.
- Configure vector stores to be always active for specific models.
- Track vector store usage in LiteLLM Logs.
For the next release, we plan on allowing you to set key, user, team, and org permissions for vector stores.
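Below is a minimal sketch of querying a Bedrock knowledge base through the proxy's `/chat/completions` endpoint with the OpenAI SDK. The `file_search` tool shape, vector store ID, and key are assumptions for illustration; see the LiteLLM vector store docs for the exact parameters.

```python
# Minimal sketch: query a Bedrock knowledge base through the LiteLLM proxy's
# /chat/completions endpoint with the OpenAI SDK. The `file_search` tool shape
# and the vector store ID below are assumptions for illustration -- check the
# LiteLLM vector store docs for the exact parameters.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",  # LiteLLM proxy
    api_key="sk-1234",                 # your LiteLLM virtual key
)

response = client.chat.completions.create(
    model="gpt-4o",  # any LiteLLM-supported model
    messages=[{"role": "user", "content": "What does our onboarding doc say about SSO?"}],
    tools=[
        {
            "type": "file_search",
            "vector_store_ids": ["YOUR_BEDROCK_KNOWLEDGE_BASE_ID"],  # placeholder ID
        }
    ],
)
print(response.choices[0].message.content)
```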
## Rate Limiting
<Image img={require('../../img/multi_instance_rate_limiting.png')}/>
This release brings accurate multi-instance rate limiting across keys/users/teams. Key engineering changes are outlined below:

- **Change**: Instances now increment the cache value instead of setting it. To avoid calling Redis on each request, the increment is synced to Redis every 0.01s.
- **Accuracy**: In testing under high traffic (100 RPS, 3 instances), we saw a maximum spillover of 10 requests beyond the expected limit, vs. a 189-request spillover with the current approach.
- **Performance**: Our load tests show this reduces median response time by 100ms in high traffic.
This is currently behind a feature flag, and we plan to have this be the default by next week. To enable this today, just add this environment variable:
```bash
export LITELLM_RATE_LIMIT_ACCURACY=true
```
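For intuition, here is a toy sketch of the increment-and-flush pattern described above (not LiteLLM's actual implementation): each instance counts requests in memory and periodically pushes the delta to Redis with `INCRBY`, instead of overwriting the shared counter.

```python
# Toy sketch of multi-instance rate limiting via increments (not LiteLLM's code).
# Each instance counts requests locally and periodically flushes the delta to
# Redis with INCRBY, so concurrent instances never overwrite each other's counts.
import threading
import time

import redis

r = redis.Redis(host="localhost", port=6379)

local_delta = 0
lock = threading.Lock()
SYNC_INTERVAL = 0.01  # seconds between Redis syncs


def record_request() -> None:
    """Called on every request; only touches in-memory state."""
    global local_delta
    with lock:
        local_delta += 1


def flush_loop(key: str) -> None:
    """Background loop: push the accumulated delta to Redis with INCRBY."""
    global local_delta
    while True:
        time.sleep(SYNC_INTERVAL)
        with lock:
            delta, local_delta = local_delta, 0
        if delta:
            r.incrby(key, delta)  # increment, never set, so instances don't clobber each other


threading.Thread(target=flush_loop, args=("rl:my-key:minute",), daemon=True).start()
```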
## New Models / Updated Models
- Gemini (VertexAI + Google AI Studio)
- VertexAI
- Bedrock
  - Image Generation - Support new `stable-image-core` models - PR
  - Knowledge Bases - support using Bedrock knowledge bases with `/chat/completions` - PR
- Anthropic - add `supports_pdf_input` for claude-3.7-bedrock models - PR, Get Started
- OpenAI
- 🆕 Meta Llama API provider - PR (see the sketch below)
- 🆕 LlamaFile provider - PR
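To try the new providers, calls go through the standard `litellm.completion` interface. The provider prefix, model name, and env var name below are illustrative assumptions; see each provider's Get Started page for the exact values.

```python
# Sketch: calling one of the new providers through litellm.completion.
# The provider prefix, model name, and env var name are illustrative
# assumptions -- see the provider's Get Started page for the exact values.
import os

import litellm

os.environ["LLAMA_API_KEY"] = "your-meta-llama-api-key"  # assumed env var name

response = litellm.completion(
    model="meta_llama/Llama-3.3-70B-Instruct",  # assumed model identifier
    messages=[{"role": "user", "content": "Hello from LiteLLM v1.68.0"}],
)
print(response.choices[0].message.content)
```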
## LLM API Endpoints
- Responses API
  - Fix for handling multi turn sessions - PR
- Embeddings
  - Caching fixes - PR
    - handle str -> list cache
    - Return usage tokens for cache hit
    - Combine usage tokens on partial cache hits
- 🆕 Vector Stores
- MCP
- Moderations
  - Add logging callback support for `/moderations` API - PR
## Spend Tracking / Budget Improvements
- OpenAI
  - computer-use-preview cost tracking / pricing - PR
  - gpt-4o-mini-tts input cost tracking - PR
- Fireworks AI - pricing updates - new `0-4b` model pricing tier + llama4 model pricing
- Budgets
  - Budget resets now happen at the start of the day/week/month - PR
  - Trigger Soft Budget Alerts When Key Crosses Threshold - PR
- Token Counting
  - Rewrite of the token_counter() function to prevent undercounting tokens - PR (usage sketch below)
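For reference, token counting is exposed via litellm's `token_counter` utility; a quick sketch of how you might call it (the model name and messages are placeholders):

```python
# Quick sketch: counting tokens for a chat request with litellm.token_counter.
# The model name and messages are placeholders for illustration.
import litellm

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the v1.68.0 release notes."},
]

num_tokens = litellm.token_counter(model="gpt-4o", messages=messages)
print(f"prompt tokens: {num_tokens}")
```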
## Management Endpoints / UI
- Virtual Keys
- Models
- Teams
  - Allow reassigning team to other org - PR
- Organizations
  - Fix showing org budget on table - PR
## Logging / Guardrail Integrations
- Langsmith
  - Respect `langsmith_batch_size` param - PR
## Performance / Loadbalancing / Reliability improvements
- Redis
  - Ensure all redis queues are periodically flushed. This fixes an issue where the redis queue size grew indefinitely when request tags were used - PR
- Rate Limits
  - Multi-instance rate limiting support across keys/teams/users/customers - PR, PR, PR
- Azure OpenAI OIDC
## General Proxy Improvements
- Security
  - Allow blocking web crawlers - PR
- Auth
  - Support the `x-litellm-api-key` header param by default. This fixes an issue from the prior release where `x-litellm-api-key` was not being used on Vertex AI passthrough requests - PR (see the example at the end of this section)
  - Allow key at max budget to call non-llm api endpoints - PR
- 🆕 Python Client Library for LiteLLM Proxy management endpoints
- Dependencies
  - Don’t require uvloop for windows - PR
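As a quick illustration of the `x-litellm-api-key` change above, here is a hedged sketch of calling a proxy passthrough route with that header. The passthrough path, request body, and key are placeholders; check the Vertex AI passthrough docs for the exact route.

```python
# Illustrative only: call a LiteLLM proxy passthrough route, authenticating with
# the x-litellm-api-key header. The passthrough path, request body, and key are
# placeholders -- check the Vertex AI passthrough docs for the exact route.
import requests

resp = requests.post(
    "http://localhost:4000/vertex_ai/publishers/google/models/gemini-1.5-pro:generateContent",
    headers={
        "x-litellm-api-key": "sk-1234",  # LiteLLM virtual key
        "Content-Type": "application/json",
    },
    json={"contents": [{"role": "user", "parts": [{"text": "Hello"}]}]},
)
print(resp.status_code, resp.json())
```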