nbrahme committed
Commit f5b2978 · verified · 1 Parent(s): d4224f7

Update README.md

Files changed (1)
  1. README.md +1 -16
README.md CHANGED
@@ -290,16 +290,7 @@ This diverse and extensive training data foundation allowed Project Indus LLM to
 
  Project Indus LLM has been evaluated using the Indic LLM Leaderboard, which employs the `indic_eval` evaluation framework, designed specifically for assessing models on Indian language tasks. This framework provides a comprehensive view of model performance across a variety of benchmarks tailored to Indian languages.
 
- Detailed results from the Indic LLM Leaderboard (α), accessible at [Hugging Face Indic LLM Leaderboard](https://huggingface.co/spaces/Cognitive-Lab/indic_llm_leaderboard), are shown below:
 
- | Task                             | Version | Metric   | Value  |   | Stderr |
- |----------------------------------|---------|----------|--------|---|--------|
- | All                              |         | acc      | 0.2891 | ± | 0.0109 |
- |                                  |         | acc_norm | 0.3013 | ± | 0.0112 |
- | indiceval:ARC-Challenge:hindi:10 | 0       | acc      | 0.2167 | ± | 0.0120 |
- |                                  |         | acc_norm | 0.2474 | ± | 0.0126 |
- | indiceval:ARC-Easy:hindi:5       | 0       | acc      | 0.3615 | ± | 0.0099 |
- |                                  |         | acc_norm | 0.3552 | ± | 0.0098 |
 
  These results highlight the model's capabilities in understanding and generating Hindi-language text under controlled testing conditions. The standard error values quantify the statistical uncertainty in each score, indicating how much the reported accuracies could vary by chance.
 
 
@@ -307,13 +298,7 @@ These results highlight the model's capabilities in understanding and generating
 
  Additionally, Project Indus LLM has been evaluated on the Open LLM Leaderboard, which provides another layer of benchmarking by comparing the model's performance against other state-of-the-art language models. Below are the summarized results from the Open LLM Leaderboard:
 
- | Metric                            | Value |
- |-----------------------------------|------:|
- | Avg.                              | 20.07 |
- | AI2 Reasoning Challenge (25-Shot) | 22.70 |
- | HellaSwag (10-Shot)               | 25.04 |
- | MMLU (5-Shot)                     | 23.12 |
- | Winogrande (5-shot)               | 49.57 |
+
 
  These benchmark results can be explored further on [Hugging Face Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
 
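The Stderr column in the removed Indic LLM Leaderboard table reads naturally as a binomial standard error, stderr = sqrt(acc × (1 − acc) / n). As a sanity check, the sketch below (an illustration assuming that formula; `implied_n` is a hypothetical helper, not part of `indic_eval`) inverts it to estimate how many evaluation items sit behind each score:

```python
def implied_n(acc: float, stderr: float) -> float:
    """Invert stderr = sqrt(acc * (1 - acc) / n) to estimate the item count n."""
    return acc * (1.0 - acc) / stderr ** 2

# Accuracy and stderr figures taken from the removed leaderboard table above.
print(round(implied_n(0.2167, 0.0120)))  # ARC-Challenge (Hindi): ~1179 items
print(round(implied_n(0.3615, 0.0099)))  # ARC-Easy (Hindi):      ~2355 items
```

Those implied counts land close to the published ARC test-set sizes (1,172 for Challenge, 2,376 for Easy), which supports reading the ± values as per-item sampling uncertainty rather than run-to-run variance.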
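Separately, the Avg. row in the removed Open LLM Leaderboard table (20.07) does not match the mean of the four listed scores (30.11). The leaderboard's v1 average spanned six benchmarks (ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, and GSM8K), so one plausible reading, offered here as an assumption since two rows are simply absent, is that the two unlisted scores contributed roughly zero. A quick arithmetic check:

```python
# Scores from the removed Open LLM Leaderboard table above (percentages).
listed = {
    "ARC (25-shot)":       22.70,
    "HellaSwag (10-shot)": 25.04,
    "MMLU (5-shot)":       23.12,
    "Winogrande (5-shot)": 49.57,
}

# Mean of the four rows actually shown: 30.11, not the reported 20.07.
print(f"{sum(listed.values()) / len(listed):.2f}")

# If 20.07 is a six-benchmark mean, the two unlisted scores
# (presumably TruthfulQA and GSM8K) must sum to about zero:
print(f"{20.07 * 6 - sum(listed.values()):.2f}")  # -0.01
```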
 
 