Update README.md
README.md
CHANGED
@@ -290,16 +290,7 @@ This diverse and extensive training data foundation allowed Project Indus LLM to

Project Indus LLM has been evaluated using the Indic LLM Leaderboard, which employs the `indic_eval` evaluation framework, designed specifically for assessing models on Indian-language tasks. This framework provides a comprehensive view of model performance across a variety of benchmarks tailored to Indian languages.

-Detailed results from the Indic LLM Leaderboard (α), accessible at [Hugging Face Indic LLM Leaderboard](https://huggingface.co/spaces/Cognitive-Lab/indic_llm_leaderboard), are shown below:
-
-| Task                             | Version | Metric   | Value  |   | Stderr |
-|----------------------------------|---------|----------|--------|---|--------|
-| All                              |         | acc      | 0.2891 | ± | 0.0109 |
-|                                  |         | acc_norm | 0.3013 | ± | 0.0112 |
-| indiceval:ARC-Challenge:hindi:10 | 0       | acc      | 0.2167 | ± | 0.0120 |
-|                                  |         | acc_norm | 0.2474 | ± | 0.0126 |
-| indiceval:ARC-Easy:hindi:5       | 0       | acc      | 0.3615 | ± | 0.0099 |
-|                                  |         | acc_norm | 0.3552 | ± | 0.0098 |

These results highlight the model's ability to understand and generate Hindi-language text under controlled testing conditions. The standard errors quantify the statistical uncertainty in each score, indicating how consistently the model performs across evaluation runs.

@@ -307,13 +298,7 @@ These results highlight the model's capabilities in understanding and generating

Additionally, Project Indus LLM has been evaluated on the Open LLM Leaderboard, which provides another layer of benchmarking by comparing the model's performance against other state-of-the-art language models. Below are the summarized results from the Open LLM Leaderboard:

-|             Metric              |Value|
-|---------------------------------|----:|
-|Avg.                             |20.07|
-|AI2 Reasoning Challenge (25-Shot)|22.70|
-|HellaSwag (10-Shot)              |25.04|
-|MMLU (5-Shot)                    |23.12|
-|Winogrande (5-shot)              |49.57|
+

These benchmark results can be explored further on the [Hugging Face Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
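
As a sanity check on the Indic LLM Leaderboard table removed in the first hunk above: the "All" row is the unweighted mean of the two per-task accuracies, and the reported standard errors can be read as approximate confidence intervals. A minimal Python sketch follows; the 1.96 multiplier assumes a normal approximation, and the dictionary keys simply mirror the task names in the table:

```python
# Sanity-check the "All" row: it is the unweighted mean of the per-task scores.
acc = {
    "indiceval:ARC-Challenge:hindi:10": 0.2167,
    "indiceval:ARC-Easy:hindi:5": 0.3615,
}

overall = sum(acc.values()) / len(acc)
print(f"All (acc): {overall:.4f}")  # 0.2891, matching the table

# Approximate 95% confidence interval from the reported standard error,
# using a normal approximation (value ± 1.96 × stderr).
stderr = 0.0109
print(f"95% CI: ({overall - 1.96 * stderr:.4f}, {overall + 1.96 * stderr:.4f})")
# -> roughly (0.2677, 0.3105)
```

The same relationship holds for acc_norm, since (0.2474 + 0.3552) / 2 = 0.3013, which suggests the leaderboard's "All" entry averages the tasks with equal weight.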
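
For readers who want to probe the model behind these numbers directly, here is a minimal sketch of loading it with the Hugging Face `transformers` library, independent of the `indic_eval` harness. The repo id `nickmalhotra/ProjectIndus` is an assumption (check the model card for the exact identifier), and the generation settings are illustrative only:

```python
# Minimal sketch: load Project Indus LLM from the Hugging Face Hub and
# generate a short Hindi completion. The repo id below is an assumption --
# substitute the identifier listed on the actual model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nickmalhotra/ProjectIndus"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "भारत की राजधानी"  # Hindi: "The capital of India"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Sampling with `top_p` is just one reasonable default for open-ended Hindi generation; the benchmark scores above were produced by the respective evaluation harnesses, not by this snippet.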