---
title: v1.70.1-stable - Gemini Realtime API Support
slug: v1.70.1-stable
date: 2025-05-17T10:00:00.000Z
authors:
  - name: Krrish Dholakia
    title: CEO, LiteLLM
    url: https://www.linkedin.com/in/krish-d/
    image_url: >-
      https://media.licdn.com/dms/image/v2/D4D03AQGrlsJ3aqpHmQ/profile-displayphoto-shrink_400_400/B4DZSAzgP7HYAg-/0/1737327772964?e=1749686400&v=beta&t=Hkl3U8Ps0VtvNxX0BNNq24b4dtX5wQaPFp6oiKCIHD8
  - name: Ishaan Jaffer
    title: CTO, LiteLLM
    url: https://www.linkedin.com/in/reffajnaahsi/
    image_url: >-
      https://pbs.twimg.com/profile_images/1613813310264340481/lz54oEiB_400x400.jpg
hide_table_of_contents: false
---

import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## Deploy this version

<Tabs>
<TabItem value="docker" label="Docker">

```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.70.1-stable
```

</TabItem>
<TabItem value="pip" label="Pip">

```shell
pip install litellm==1.70.1
```

</TabItem>
</Tabs>

## Key Highlights

LiteLLM v1.70.1-stable is live now. Here are the key highlights of this release:

- **Gemini Realtime API**: You can now call Gemini's Live API via the OpenAI `/v1/realtime` API
- **Spend Logs Retention Period**: Enable deleting spend logs older than a certain period.
- **PII Masking 2.0**: Easily configure masking or blocking specific PII/PHI entities on the UI

## Gemini Realtime API

<Image img={require('../../img/gemini_realtime.png')}/>

This release brings support for calling Gemini's realtime models (e.g. `gemini-2.0-flash-live`) via OpenAI's `/v1/realtime` API. This makes it easy for developers to switch from OpenAI to Gemini by changing only the model name.

Key Highlights:

- Support for text + audio input/output
- Support for setting session configurations (modality, instructions, activity detection) in the OpenAI format
- Support for logging + usage tracking for realtime sessions

This is currently supported via Google AI Studio. VertexAI support is planned for the coming week.
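
Here's a minimal sketch of opening a realtime session against the proxy over a raw websocket. The proxy URL, API key, and model name are placeholder assumptions; adapt them to your deployment.

```python
# Minimal sketch: an OpenAI-format realtime session via the LiteLLM proxy.
# The proxy URL, API key, and model name below are placeholder assumptions.
import asyncio
import json

import websockets  # pip install websockets


async def main():
    uri = "ws://localhost:4000/v1/realtime?model=gemini-2.0-flash-live"
    headers = {"Authorization": "Bearer sk-1234"}

    # On websockets >= 14, pass `additional_headers` instead of `extra_headers`.
    async with websockets.connect(uri, extra_headers=headers) as ws:
        # Session configuration in the OpenAI realtime format, which this
        # release maps onto Gemini's Live API.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"modalities": ["text"], "instructions": "Reply briefly."},
        }))
        # Print the first few server events, then close the session.
        for _ in range(3):
            print(json.loads(await ws.recv()))


asyncio.run(main())
```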

Read more

## Spend Logs Retention Period

<Image img={require('../../img/delete_spend_logs.jpg')}/>

This release enables deleting LiteLLM Spend Logs older than a certain period. Since we now enable storing the raw request/response in the logs, deleting old logs ensures the database remains performant in production.
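
As a rough sketch, the retention period is configured in the proxy's `config.yaml`; the exact key names under `general_settings` are assumptions here, so verify them against the docs linked below:

```yaml
general_settings:
  # Assumed setting names for this feature -- verify against the docs.
  maximum_spend_logs_retention_period: "30d"   # delete spend logs older than 30 days
  maximum_spend_logs_retention_interval: "1d"  # how often the cleanup job runs
```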

Read more

## PII Masking 2.0

<Image img={require('../../img/pii_masking_v2.png')}/>

This release brings improvements to our Presidio PII Integration. As a Proxy Admin, you now have the ability to:

- Mask or block specific entities (e.g., block medical licenses while masking other entities like emails).
- Monitor guardrails in production. LiteLLM Logs will now show you the guardrail run, the entities it detected, and the confidence score for each entity.

Read more

## New Models / Updated Models

- Gemini (VertexAI + Google AI Studio)
  - `/chat/completion`
    - Handle audio input - PR
    - Fix maximum recursion depth issue when using deeply nested response schemas with Vertex AI by increasing `DEFAULT_MAX_RECURSE_DEPTH` from 10 to 100 in constants - PR
    - Capture reasoning tokens in streaming mode - PR
- Google AI Studio
  - `/realtime`
    - Gemini Multimodal Live API support
    - Audio input/output support, optional param mapping, accurate usage calculation - PR
- VertexAI
  - `/chat/completion`
    - Fix llama streaming error, where the model response was nested inside the returned streaming chunk - PR
- Ollama
  - `/chat/completion`
    - Fix structured responses - PR
- Bedrock
  - `/chat/completion`
    - Handle `thinking_blocks` when `assistant.content` is `None` - PR
    - Only allow accepted fields in the tool JSON schema - PR
    - Add Bedrock Sonnet prompt caching cost information
    - Mistral Pixtral support - PR
    - Tool caching support - PR
  - `/messages`
    - Allow using dynamic AWS params - PR
- Nvidia NIM
- Novita AI
  - New provider added for `/chat/completion` routes - PR
- Azure
- Cohere
  - `/embeddings`
    - Migrate embeddings to use `/v2/embed` - adds support for the `output_dimensions` param - PR
- Anthropic
- VLLM
  - `/embeddings`
    - Support embedding input as a list of integers (see the sketch after this list)
- OpenAI
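
For the VLLM integer-input change above, here's a hedged sketch; the `hosted_vllm/` prefix, model name, and `api_base` are placeholders for your own deployment:

```python
# Sketch: embedding pre-tokenized input (token IDs) instead of raw text.
# Model name and api_base are placeholders for your own vLLM server.
import litellm

response = litellm.embedding(
    model="hosted_vllm/facebook/opt-125m",  # assumed provider prefix + model
    api_base="http://localhost:8000",       # your vLLM server
    input=[[1, 2, 3, 4]],                   # a list of token IDs per item
)
print(response.data[0]["embedding"][:5])
```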

## LLM API Endpoints

- Responses API
  - Fix delete API support - PR
- Rerank API
  - `/v2/rerank` now registered as `llm_api_route` - enabling non-admins to call it - PR

## Spend Tracking Improvements

- `/chat/completion`, `/messages`
  - Anthropic - web search tool cost tracking - PR
  - Groq - update model max tokens + cost information - PR
- `/audio/transcription`
  - Azure - add `gpt-4o-mini-tts` pricing - PR
  - Proxy - fix tracking spend by tag - PR
- `/embeddings`
  - Azure AI - add cohere embed v4 pricing - PR

## Management Endpoints / UI

## Logging / Alerting Integrations

## Guardrails

- Guardrails
  - New `/apply_guardrail` endpoint for directly testing a guardrail (see the sketch after this list) - PR
- Lakera
  - `/v2` endpoints support - PR
- Presidio
  - Fix handling of message content in the Presidio guardrail integration - PR
  - Allow specifying PII Entities Config - PR
- Aim Security
  - Support for anonymization in Aim Guardrails - PR
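
A quick sketch of exercising the new `/apply_guardrail` endpoint; the payload field names and guardrail name are assumptions, so check the linked PR for the exact request schema:

```python
# Sketch: directly testing a configured guardrail via the proxy.
# The endpoint path is from this release; payload fields are assumptions.
import requests

resp = requests.post(
    "http://localhost:4000/apply_guardrail",
    headers={"Authorization": "Bearer sk-1234"},  # your proxy key
    json={
        "guardrail_name": "presidio-pii",  # hypothetical guardrail name
        "text": "Hi, my email is jane@example.com",
    },
)
print(resp.status_code, resp.json())
```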

## Performance / Loadbalancing / Reliability improvements

## General Proxy Improvements

- Authentication
  - Handle `Bearer $LITELLM_API_KEY` in the `x-litellm-api-key` custom header - PR
- New Enterprise pip package - `litellm-enterprise` - fixes issue where the enterprise folder was not found when using the pip package
- Proxy CLI
  - Add `models import` command - PR
- OpenWebUI
  - Configure LiteLLM to parse user headers from Open Web UI
- LiteLLM Proxy w/ LiteLLM SDK
  - Option to force/always use the LiteLLM proxy when calling via the LiteLLM SDK (see the sketch below)
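
A hedged sketch of the force-proxy option; the `use_litellm_proxy` flag and the `LITELLM_PROXY_*` environment variables are assumptions based on this release note, so confirm the exact switches in the docs:

```python
# Sketch: routing every LiteLLM SDK call through a LiteLLM proxy.
# Flag and env var names here are assumptions -- verify in the docs.
import os

import litellm

os.environ["LITELLM_PROXY_API_BASE"] = "http://localhost:4000"
os.environ["LITELLM_PROXY_API_KEY"] = "sk-1234"
litellm.use_litellm_proxy = True  # force SDK calls through the proxy

response = litellm.completion(
    model="gemini-2.0-flash",  # resolved against the proxy's model list
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```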

## New Contributors

## Demo Instance

Here's a Demo Instance to test changes:

## Git Diff