---
title: v1.65.0-stable - Model Context Protocol
slug: v1.65.0-stable
date: 2025-03-30T10:00:00.000Z
authors:
  - name: Krrish Dholakia
    title: CEO, LiteLLM
    url: https://www.linkedin.com/in/krish-d/
    image_url: >-
      https://media.licdn.com/dms/image/v2/D4D03AQGrlsJ3aqpHmQ/profile-displayphoto-shrink_400_400/B4DZSAzgP7HYAg-/0/1737327772964?e=1749686400&v=beta&t=Hkl3U8Ps0VtvNxX0BNNq24b4dtX5wQaPFp6oiKCIHD8
  - name: Ishaan Jaffer
    title: CTO, LiteLLM
    url: https://www.linkedin.com/in/reffajnaahsi/
    image_url: >-
      https://pbs.twimg.com/profile_images/1613813310264340481/lz54oEiB_400x400.jpg
tags:
  - mcp
  - custom_prompt_management
hide_table_of_contents: false
---

import Image from '@theme/IdealImage';

v1.65.0-stable is live now. Here are the key highlights of this release:

- **MCP Support**: Support for adding and using MCP servers on the LiteLLM proxy.
- **UI view total usage after 1M+ logs**: You can now view usage analytics after crossing 1M+ logs in the DB.

## Model Context Protocol (MCP)

This release introduces support for centrally adding MCP servers on LiteLLM. You add MCP server endpoints once, and your developers can then list and call MCP tools through LiteLLM.

Read more about MCP here.

<Image img={require('../../img/release_notes/mcp_ui.png')} style={{width: '100%', display: 'block', margin: '2rem auto'}} />

Expose and use MCP servers through LiteLLM
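Once a server is registered, developers can talk to the proxy with any standard MCP client. Below is a minimal sketch using the MCP Python SDK; the `/mcp` endpoint path and the `get_weather` tool are illustrative assumptions, not part of this release's documented interface.

```python
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client


async def main():
    # Assumption: the LiteLLM proxy exposes its registered MCP servers over
    # SSE at an endpoint like this one. Check your deployment's docs.
    async with sse_client("http://localhost:4000/mcp") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # List every MCP tool the proxy exposes
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Call one of them ("get_weather" is a hypothetical tool name)
            result = await session.call_tool("get_weather", {"city": "SF"})
            print(result.content)


asyncio.run(main())
```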

## UI view total usage after 1M+ logs

This release brings the ability to view total usage analytics even after exceeding 1M+ logs in your database. We've implemented a scalable architecture that stores only aggregate usage data, resulting in significantly more efficient queries and reduced database CPU utilization.

<Image img={require('../../img/release_notes/ui_usage.png')} style={{width: '100%', display: 'block', margin: '2rem auto'}} />

View total usage after 1M+ logs

- **How this works:**
  - We now aggregate usage data into a dedicated `DailyUserSpend` table, significantly reducing query load and CPU usage even beyond 1M+ logs. A conceptual sketch of this aggregation appears at the end of this section.
- **Daily Spend Breakdown API:**
  - Retrieve granular daily usage data (by model, provider, and API key) with a single endpoint. Example request:

```shell
curl -L -X GET 'http://localhost:4000/user/daily/activity?start_date=2025-03-20&end_date=2025-03-27' \
-H 'Authorization: Bearer sk-...'
```
    
Example response:

```json
{
    "results": [
        {
            "date": "2025-03-27",
            "metrics": {
                "spend": 0.0177072,
                "prompt_tokens": 111,
                "completion_tokens": 1711,
                "total_tokens": 1822,
                "api_requests": 11
            },
            "breakdown": {
                "models": {
                    "gpt-4o-mini": {
                        "spend": 1.095e-05,
                        "prompt_tokens": 37,
                        "completion_tokens": 9,
                        "total_tokens": 46,
                        "api_requests": 1
                    }
                },
                "providers": { "openai": { ... }, "azure_ai": { ... } },
                "api_keys": { "3126b6eaf1...": { ... } }
            }
        }
    ],
    "metadata": {
        "total_spend": 0.7274667,
        "total_prompt_tokens": 280990,
        "total_completion_tokens": 376674,
        "total_api_requests": 14
    }
}
```
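For intuition, here is a minimal sketch of the aggregation idea described under "How this works", using SQLite. The table and column names are illustrative assumptions, not LiteLLM's actual schema:

```python
# Illustrative sketch only: roll raw per-request spend logs up into one row
# per (day, user, model), so usage queries scan a small aggregate table
# instead of millions of raw log rows. Schema names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE spend_logs (user_id TEXT, model TEXT, spend REAL, created_at TEXT);
CREATE TABLE daily_user_spend (
    date TEXT, user_id TEXT, model TEXT, spend REAL,
    PRIMARY KEY (date, user_id, model)
);
""")

conn.executemany(
    "INSERT INTO spend_logs VALUES (?, ?, ?, ?)",
    [
        ("u1", "gpt-4o-mini", 0.002, "2025-03-27 09:00:00"),
        ("u1", "gpt-4o-mini", 0.003, "2025-03-27 17:30:00"),
    ],
)

# The aggregation step: one row per (day, user, model)
conn.execute("""
INSERT INTO daily_user_spend (date, user_id, model, spend)
SELECT date(created_at), user_id, model, SUM(spend)
FROM spend_logs
GROUP BY date(created_at), user_id, model
""")

print(conn.execute("SELECT * FROM daily_user_spend").fetchall())
# -> one aggregated row per (day, user, model) instead of one row per request
```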

## New Models / Updated Models

- Support for Vertex AI gemini-2.0-flash-lite & Google AI Studio gemini-2.0-flash-lite PR
- Support for Vertex AI Fine-Tuned LLMs PR
- Nova Canvas image generation support PR
- OpenAI gpt-4o-transcribe support PR
- Added new Vertex AI text embedding model PR

## LLM Translation

- OpenAI Web Search Tool Call Support PR
- Vertex AI topLogprobs support PR
- Support for sending images and video to Vertex AI multimodal embedding Doc
- Support `litellm.api_base` for Vertex AI + Gemini across completion, embedding, image_generation (see the sketch after this list) PR
- Bug fix for returning `response_cost` when using the litellm Python SDK with LiteLLM Proxy PR
- Support for `max_completion_tokens` on the Mistral API PR
- Refactored Vertex AI passthrough routes - fixes unpredictable behavior with auto-setting `default_vertex_region` on router model add PR
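To illustrate the `api_base` override above, here is a minimal, hedged sketch with the LiteLLM Python SDK. The endpoint URL is a placeholder, and depending on your setup you may still need your usual Vertex AI credentials:

```python
import litellm

# Hypothetical endpoint: point Vertex AI / Gemini traffic at a custom base URL
# (e.g. a regional endpoint or an internal proxy). Replace with your own.
response = litellm.completion(
    model="vertex_ai/gemini-2.0-flash-lite",
    messages=[{"role": "user", "content": "Hello!"}],
    api_base="https://my-vertex-endpoint.internal",
)
print(response.choices[0].message.content)
```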

## Spend Tracking Improvements

- Log `api_base` on spend logs PR
- Support for Gemini audio token cost tracking PR
- Fixed OpenAI audio input token cost tracking PR

## UI

### Model Management

- Allowed team admins to add/update/delete models on the UI PR
- Render `supports_web_search` on the model hub PR

### Request Logs

- Show API base and model ID on request logs PR
- Allow viewing key info on request logs PR

### Usage Tab

- Added a Daily User Spend aggregate view, allowing the UI Usage tab to work beyond 1M rows PR
- Connected the UI to the `LiteLLM_DailyUserSpend` spend table PR

### Logging Integrations

- Fixed `StandardLoggingPayload` for GCS Pub/Sub Logging Integration PR
- Track `litellm_model_name` on `StandardLoggingPayload` Docs

## Performance / Reliability Improvements

- LiteLLM Redis semantic caching implementation (see the sketch after this list) PR
- Gracefully handle exceptions when the DB is having an outage PR
- Allow pods to start up and pass `/health/readiness` when `allow_requests_on_db_unavailable: True` and the DB is down PR
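For the semantic caching item above, here is a hedged sketch of enabling it from the Python SDK. The parameter names follow the SDK's caching config but should be verified against the caching docs for your version; the threshold and embedding model are example values:

```python
import litellm
from litellm.caching import Cache

# Assumed config keys for the "redis-semantic" cache type; confirm against
# the LiteLLM caching docs for your version.
litellm.cache = Cache(
    type="redis-semantic",
    host="localhost",
    port="6379",
    similarity_threshold=0.8,  # reuse cached answers for near-duplicate prompts
    redis_semantic_cache_embedding_model="text-embedding-ada-002",
)

# Two semantically similar prompts; the second may be served from the cache.
for prompt in ["What's the weather in SF?", "What is the weather in SF?"]:
    response = litellm.completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)
```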

## General Improvements

- Support for exposing MCP tools on the litellm proxy PR
- Support discovering Gemini, Anthropic, xAI models by calling their `/v1/model` endpoint PR
- Fixed route check for non-proxy admins on JWT auth PR
- Added baseline Prisma database migrations PR
- View all wildcard models on `/model/info` PR

## Security

- Bumped `next` from 14.2.21 to 14.2.25 in the UI dashboard PR

## Complete Git Diff

Here's the complete git diff.