---
license: mit
datasets:
  - fixie-ai/librispeech_asr
language:
  - en
base_model:
  - facebook/wav2vec2-base
pipeline_tag: voice-activity-detection
---

# voiceGUARD: Voice Detection AI - Real vs. AI Audio Classifier

## Model Overview

This model is a fine-tuned Wav2Vec2-based audio classifier that distinguishes real human voices from AI-generated ones. It was trained on a dataset containing samples from various TTS models alongside real human audio recordings.


## Model Details

- **Architecture:** `Wav2Vec2ForSequenceClassification`
- **Fine-tuned on:** a custom dataset of real and AI-generated audio
- **Classes:**
  1. Real human voice
  2. AI-generated voice (e.g., MelGAN, DiffWave)
- **Input requirements:**
  - Audio format: `.wav`, `.mp3`, etc.
  - Sample rate: 16 kHz
  - Max duration: 10 seconds (longer audio is truncated, shorter audio is padded)

## Performance

- **Validation accuracy:** 99.8%
- **Robustness:** classifies correctly across multiple AI-generation models.
- **Limitations:** struggles with certain unseen AI-generation models (e.g., ElevenLabs).

## How to Use

### 1. Install Dependencies

Make sure you have `transformers`, `torch`, and `torchaudio` installed:

```bash
pip install transformers torch torchaudio
```
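Once the dependencies are installed, inference might look like the following sketch. The Hub repo id `Mrkomiljon/voiceGUARD`, the file name `sample.wav`, and the label names are assumptions; adjust them for your setup.

```python
import torch
import torchaudio
from transformers import AutoFeatureExtractor, Wav2Vec2ForSequenceClassification

MODEL_ID = "Mrkomiljon/voiceGUARD"  # assumed Hub repo id; adjust if different

extractor = AutoFeatureExtractor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

# Load a clip, downmix to mono, and resample to the required 16 kHz.
waveform, sr = torchaudio.load("sample.wav")  # file name is illustrative
waveform = torchaudio.functional.resample(waveform, sr, 16_000).mean(dim=0)

inputs = extractor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])  # label names come from the model's config
```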