Aud2Stm2Mdi / README.md
eyov's picture
Update README.md
5aeaf4e verified

A newer version of the Gradio SDK is available: 5.9.1

Upgrade
metadata
title: Audio to Stems to MIDI Converter
emoji: 🎡
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false

Audio Processing Pipeline: Stem Separation and MIDI Conversion

Project Overview

A production-ready web application that separates audio stems and converts them to MIDI using state-of-the-art deep learning models. Built with Gradio and deployed on LightningAI, this pipeline provides an intuitive interface for audio processing tasks.

Technical Requirements

Dependencies

pip install gradio>=4.0.0
pip install demucs>=4.0.0
pip install basic-pitch>=0.4.0
pip install torch>=2.0.0 torchaudio>=2.0.0
pip install soundfile>=0.12.1
pip install numpy>=1.26.4
pip install pretty_midi>=0.2.10

File Structure

project/
β”œβ”€β”€ app.py                 # Main Gradio interface and processing logic
β”œβ”€β”€ demucs_handler.py      # Audio stem separation handler
β”œβ”€β”€ basic_pitch_handler.py # MIDI conversion handler
β”œβ”€β”€ validators.py          # Audio file validation utilities
└── requirements.txt

Implementation Details

demucs_handler.py

Handles audio stem separation using the Demucs model:

  • Supports mono and stereo input
  • Automatic stereo conversion for mono inputs
  • Efficient tensor processing with PyTorch
  • Proper error handling and logging
  • Progress tracking during processing

basic_pitch_handler.py

Manages MIDI conversion using Spotify's Basic Pitch:

  • Optimized parameters for music transcription
  • Support for polyphonic audio
  • Pitch bend detection
  • Configurable note duration and frequency ranges
  • Robust MIDI file generation

validators.py

Provides comprehensive audio file validation:

  • Format verification (WAV, MP3, FLAC)
  • File size limits (30MB default)
  • Sample rate validation (8kHz-48kHz)
  • Audio integrity checking
  • Detailed error reporting

app.py

Main application interface featuring:

  • Clean, intuitive Gradio UI
  • Multi-file upload support
  • Stem type selection (vocals, drums, bass, other)
  • Optional MIDI conversion
  • Persistent file handling
  • Progress tracking
  • Comprehensive error handling

Key Features

Audio Processing

  • High-quality stem separation using Demucs
  • Support for multiple audio formats
  • Automatic audio format conversion
  • Efficient memory management
  • Progress tracking during processing

MIDI Conversion

  • Accurate note detection
  • Polyphonic transcription
  • Configurable parameters:
    • Note duration threshold
    • Frequency range
    • Onset detection sensitivity
    • Frame-level pitch activation

User Interface

  • Simple, intuitive design
  • Real-time processing feedback
  • Preview capabilities
  • File download options

Deployment

Local Development

# Clone repository
git clone https://github.com/eyov7/Aud2Stm2Mdi.git

# Install dependencies
pip install -r requirements.txt

# Run application
python app.py

Lightning.ai Deployment

  1. Create new Lightning App
  2. Upload project files
  3. Configure compute instance (CPU or GPU)
  4. Deploy

Error Handling

Implemented comprehensive error handling for:

  • Invalid file formats
  • File size limits
  • Processing failures
  • Memory constraints
  • File system operations
  • Model inference errors

Production Features

  • Robust file validation
  • Persistent storage management
  • Proper error logging
  • Progress tracking
  • Clean user interface
  • Download capabilities
  • Multi-format support

Limitations

  • Maximum file size: 30MB
  • Supported formats: WAV, MP3, FLAC
  • Single file processing (no batch)
  • CPU-only processing by default

Notes

  • Ensure proper audio codec support
  • Monitor system resources
  • Regular temporary file cleanup
  • Consider implementing rate limiting
  • Add user session management

Closing Note

This implementation is currently running successfully on Lightning.ai, providing reliable audio stem separation and MIDI conversion capabilities through an intuitive web interface.