---
title: Post-ASR LLM N-Best Transcription Correction
emoji: 🏢
colorFrom: yellow
colorTo: yellow
sdk: gradio
sdk_version: 5.21.0
app_file: app.py
pinned: false
license: mit
short_description: Generative Error Correction (GER) Task Baseline, WER
---
# Post-ASR Text Correction WER Leaderboard

This application displays a baseline Word Error Rate (WER) leaderboard for the test data in the `GenSEC-LLM/SLT-Task1-Post-ASR-Text-Correction` dataset.
## Dataset Sources
The leaderboard shows WER metrics for multiple speech recognition sources as columns:
- CHiME4
- CORAAL
- CommonVoice
- LRS2
- LibriSpeech (Clean and Other)
- SwitchBoard
- Tedlium-3
- OVERALL (aggregate across all sources)
## Baseline Methods
The leaderboard displays three baseline approaches:
- No LM Baseline: Uses the 1-best ASR output without any correction (input1)
- N-gram Ranking: Ranks the N-best hypotheses using a simple n-gram statistics approach and chooses the best one
- Subwords Voting Correction: Uses a voting-based method to correct the transcript by combining information from all N-best hypotheses
## Metrics
The leaderboard reports the following metrics as rows:
- Number of Examples: Count of examples in the test set for each source
- Word Error Rate (No LM): WER between reference and 1-best ASR output
- Word Error Rate (N-gram Ranking): WER between reference and n-gram ranked best hypothesis
- Word Error Rate (Subwords Voting Correction): WER between reference and the transcript produced by voting over the N-best hypotheses
Lower WER values indicate better transcription accuracy.
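WER is the word-level edit distance between reference and hypothesis, divided by the number of reference words. The app's actual implementation is in `app.py` (and may use a library such as `jiwer`); a minimal self-contained sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat", "the cat sit")` gives one substitution over three reference words, i.e. about 0.33.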
## Table Structure
The leaderboard is displayed as a table with:
- Rows: Different metrics (example counts and WER values for each method)
- Columns: Different data sources (CHiME4, CORAAL, CommonVoice, etc.) and OVERALL
Each cell shows the corresponding metric for that specific data source. The OVERALL column shows aggregate metrics across all sources.
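The layout above can be sketched as plain Python data (a hypothetical skeleton; the app fills the cells with counts and WERs computed from the dataset):

```python
# Columns: one per data source, plus the OVERALL aggregate.
sources = ["CHiME4", "CORAAL", "CommonVoice", "LRS2",
           "LibriSpeech Clean", "LibriSpeech Other",
           "SwitchBoard", "Tedlium-3", "OVERALL"]

# Rows: one per metric (example counts and WER for each baseline method).
metrics = ["Number of Examples", "WER (No LM)",
           "WER (N-gram Ranking)", "WER (Subwords Voting Correction)"]

# Table skeleton: first cell names the metric, remaining cells hold
# the metric's value for each source (filled in by the app).
header = ["Metric"] + sources
rows = [[metric] + [None] * len(sources) for metric in metrics]
```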
## Technical Details
### N-gram Ranking
This method scores each hypothesis in the N-best list using:
- N-gram statistics (4-grams)
- Text length
- N-gram variety
The hypothesis with the highest score is selected.
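The exact scoring lives in `app.py`; one plausible sketch of the idea, using hypothetical equal weights over pooled 4-gram frequency, 4-gram variety, and text length:

```python
from collections import Counter


def ngrams(words, n=4):
    """All contiguous n-grams of a word list."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def rank_nbest(hypotheses, n=4):
    """Select the highest-scoring hypothesis from an N-best list (sketch)."""
    # Pool n-gram counts across the whole N-best list, so a hypothesis
    # scores higher when its n-grams agree with the other hypotheses.
    pool = Counter()
    for hyp in hypotheses:
        pool.update(ngrams(hyp.split(), n))

    def score(hyp):
        words = hyp.split()
        grams = ngrams(words, n)
        frequency = sum(pool[g] for g in grams)  # n-gram statistics
        variety = len(set(grams))                # n-gram variety
        return frequency + variety + len(words)  # + text length (toy weights)

    return max(hypotheses, key=score)
```

With a list like `["the cat sat on the mat", "the cat sat on a mat", "the cat sat on the mat"]`, the majority reading wins because its 4-grams are shared by more hypotheses.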
### Subwords Voting Correction
This method uses a simple voting mechanism:
- Groups hypotheses of the same length
- For each word position, chooses the most common word across the hypotheses in that group
- Constructs a new transcript from these voted words
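The steps above can be sketched at the word level (subword tokenization is an implementation detail of the app; this is an illustrative word-level version):

```python
from collections import Counter


def voting_correction(hypotheses):
    """Majority-vote a transcript from same-length hypotheses (sketch)."""
    # Group hypotheses by word count and vote within the largest group,
    # so word positions line up across the hypotheses being combined.
    groups = {}
    for hyp in hypotheses:
        groups.setdefault(len(hyp.split()), []).append(hyp)
    group = max(groups.values(), key=len)
    token_lists = [h.split() for h in group]
    # For each position, take the most common word across the group.
    voted = [Counter(column).most_common(1)[0][0]
             for column in zip(*token_lists)]
    return " ".join(voted)
```

For example, voting over `["the cat sat", "the cat sit", "the bat sat"]` recovers `"the cat sat"`, since each position's majority word is correct even though no single hypothesis need be.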
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference