Fix TypeError during data collection
The `args` field of a dataset entry in a model's metadata can be a string (such as `'language hi'`) instead of the expected dict. On a string, `"language" in args` is a substring test, so the existing check passes. The following `args["language"]` then raises "TypeError: string indices must be integers", and the application fails to load.
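The failure can be reproduced in isolation with the `args` value from the production record. `unguarded_lookup` is a hypothetical function that mirrors the pre-fix condition. It is a sketch, not code from this repository, and the exact wording of the TypeError message varies across Python versions.

```python
def unguarded_lookup(args):
    """Hypothetical stand-in for the pre-fix condition in parse_metrics_rows."""
    # For a dict, `in` checks keys; for a str, it is a substring test.
    if "language" in args:
        return args["language"]
    return None


print(unguarded_lookup({"language": "hi"}))  # dict args work: prints hi

try:
    unguarded_lookup("language hi")  # str args pass the `in` check...
except TypeError as err:
    # ...but indexing a str with a str key raises.
    print(type(err).__name__)  # prints TypeError
```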
This fix checks that `args` is a dict before indexing into it. If it is not, the code falls back to the existing default behavior: using the `"language"` value from the model's metadata.
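The guarded lookup can be sketched as a small standalone helper. `resolve_language` is a hypothetical name used only for illustration; the patch keeps this logic inline in `parse_metrics_rows`. The dict-valued `args` case below is an assumed well-formed entry, while the string case is taken from the production record.

```python
from typing import Any, Mapping


def resolve_language(dataset: Mapping[str, Any], meta: Mapping[str, Any]) -> Any:
    """Return the dataset's language, falling back to the model card metadata.

    Hypothetical helper mirroring the patched condition: `args` is only
    indexed when it is actually a dict.
    """
    args = dataset.get("args")  # None when the key is missing
    if isinstance(args, dict) and "language" in args:
        return args["language"]
    return meta["language"]


meta = {"language": ["hi"]}

# Production record: `args` is the string 'language hi', so this falls back.
print(resolve_language({"type": "mozilla-foundation/common_voice_11_0", "args": "language hi"}, meta))
# prints ['hi']

# Assumed well-formed entry: `args` is a dict carrying its own language.
print(resolve_language({"type": "mozilla-foundation/common_voice_11_0", "args": {"language": "hi"}}, meta))
# prints hi
```

Using `dataset.get("args")` folds the `"args" in ...` membership check into a single lookup. For a missing key, a `None` value, or a string, it behaves the same as the three-part condition in the patch.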
I'm happy to reach out to the owner of the one model with the non-standard
configuration, though its model card looks like it may have been generated by
🤗 Trainer: https://huggingface.co/sanchit-gandhi/whisper-small-hi/edit/main/README.md.
Here's the record that causes the error in production:
```
meta: {'language': ['hi'], 'license': 'apache-2.0', 'tags': ['hf-asr-leaderboard', 'generated_from_trainer'], 'datasets': ['mozilla-foundation/common_voice_11_0'], 'metrics': ['wer'], 'model-index': [{'name': 'Whisper Small Hi - Sanchit Gandhi', 'results': [{'task': {'name': 'Automatic Speech Recognition', 'type': 'automatic-speech-recognition'}, 'dataset': {'name': 'Common Voice 11.0', 'type': 'mozilla-foundation/common_voice_11_0', 'config': 'hi', 'split': 'test', 'args': 'language hi'}, 'metrics': [{'name': 'Wer', 'type': 'wer', 'value': 32.09599593667993}]}]}]}
result["dataset"]: {'name': 'Common Voice 11.0', 'type': 'mozilla-foundation/common_voice_11_0', 'config': 'hi', 'split': 'test', 'args': 'language hi'}
```
Fixes huggingface/hf-speech-bench#10 and possibly
Fixes huggingface/hf-speech-bench#9.
As of two months ago, huggingface/hf-speech-bench#8 reports that the
leaderboard has moved, but this repository is still receiving staff
contributions, so I'm submitting the fix for review regardless.
```diff
@@ -68,7 +68,7 @@ def parse_metrics_rows(meta):
         if "dataset" not in result or "metrics" not in result:
             continue
         dataset = result["dataset"]["type"]
-        if "args" in result["dataset"] and "language" in result["dataset"]["args"]:
+        if "args" in result["dataset"] and isinstance(result["dataset"]["args"], dict) and "language" in result["dataset"]["args"]:
             lang = result["dataset"]["args"]["language"]
         else:
             lang = meta["language"]
```