metadata
title: Sesame CSM
emoji: 🌱
colorFrom: gray
colorTo: green
sdk: gradio
sdk_version: 5.20.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Conversational speech generation

CSM 1B

2025/03/13 - We are releasing the 1B CSM variant. The code is available on GitHub at SesameAILabs/csm, and the checkpoint is hosted on HuggingFace.


CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs. The model architecture employs a Llama backbone and a smaller audio decoder that produces Mimi audio codes.
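
To make the term "RVQ audio codes" concrete, here is a toy residual vector quantizer in plain Python. It is a minimal sketch of the general technique only, not the Mimi codec or the CSM implementation. The codebooks are random rather than learned, and every name in it (`make_toy_codebooks`, `rvq_encode`, `rvq_decode`) is hypothetical and chosen for illustration. Each stage quantizes whatever the previous stages failed to capture, so one audio frame becomes a short list of integer codes, one per codebook.

```python
import random


def sq_error(a, b):
    """Squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_toy_codebooks(num_stages=4, size=16, dim=3, seed=0):
    """Build random codebooks (an assumption: real codecs learn them).

    Entry 0 of every codebook is the zero vector, so a stage can never
    make the residual larger. Later stages use a smaller scale because
    they only need to model what is left over.
    """
    rng = random.Random(seed)
    codebooks = []
    for stage in range(num_stages):
        scale = 0.5 ** stage
        entries = [[0.0] * dim]
        for _ in range(size - 1):
            entries.append([rng.uniform(-scale, scale) for _ in range(dim)])
        codebooks.append(entries)
    return codebooks


def rvq_encode(codebooks, vec):
    """Return one code index per stage for the input vector."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        # Pick the entry closest to the current residual.
        idx = min(range(len(codebook)), key=lambda i: sq_error(codebook[i], residual))
        codes.append(idx)
        # The next stage sees only what this stage did not explain.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codebooks, codes):
    """Reconstruct the vector by summing the chosen entry of each stage."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


if __name__ == "__main__":
    books = make_toy_codebooks()
    frame = [0.3, -0.2, 0.7]  # stand-in for one frame of audio features
    codes = rvq_encode(books, frame)
    print("codes:", codes)
    print("reconstruction:", rvq_decode(books, codes))
```

Decoding with only the first few codes gives a coarser reconstruction, and adding later codes refines it. That coarse-to-fine structure is what the sketch is meant to convey; how CSM divides the prediction of these codes between the backbone and the audio decoder is not shown here.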

A fine-tuned variant of CSM powers the interactive voice demo shown in our blog post.

A hosted HuggingFace space is also available for testing audio generation.

Misuse and abuse ⚠️

This project provides a high-quality speech generation model for research and educational purposes. While we encourage responsible and ethical use, we explicitly prohibit the following:

  • Impersonation or Fraud: Do not use this model to generate speech that mimics real individuals without their explicit consent.
  • Misinformation or Deception: Do not use this model to create deceptive or misleading content, such as fake news or fraudulent calls.
  • Illegal or Harmful Activities: Do not use this model for any illegal, harmful, or malicious purposes.

By using this model, you agree to comply with all applicable laws and ethical guidelines. We are not responsible for any misuse, and we strongly condemn unethical applications of this technology.

Prompts

Conversational prompts are from the EdAcc dataset. Read speech prompts are from the LibriTTS-R dataset.

Authors

Johan Schalkwyk, Ankit Kumar, Dan Lyth, Sefik Emre Eskimez, Zack Hodari, Cinjon Resnick, Ramon Sanabria, Raven Jiang, and the Sesame team.