---
license: mit
datasets:
- fixie-ai/librispeech_asr
language:
- en
base_model:
- facebook/wav2vec2-base
pipeline_tag: voice-activity-detection
---

# Voice Detection AI - Real vs AI Audio Classifier

### **Model Overview**

This model is a fine-tuned Wav2Vec2-based audio classifier capable of distinguishing between **real human voices** and **AI-generated voices**. It has been trained on a dataset containing samples from various TTS models and real human audio recordings.

---

### **Model Details**

- **Architecture:** Wav2Vec2ForSequenceClassification
- **Fine-tuned on:** Custom dataset with real and AI-generated audio
- **Classes:**
  1. Real Human Voice
  2. AI-generated (e.g., MelGAN, DiffWave, etc.)
- **Input Requirements:**
  - Audio format: `.wav`, `.mp3`, etc.
  - Sample rate: 16 kHz
  - Max duration: 10 seconds (longer audio is truncated, shorter audio is padded)

---

### **Performance**

- **Validation Accuracy:** 99.8%
- **Robustness:** Successfully classifies across multiple AI-generation models.
- **Limitations:** Struggles with certain unseen AI-generation models (e.g., ElevenLabs).

---

### **How to Use**

#### **1. Install Dependencies**

Make sure you have `transformers` and `torch` installed:

```bash
pip install transformers torch torchaudio
```