Update README.md
README.md
CHANGED
@@ -290,16 +290,7 @@ This diverse and extensive training data foundation allowed Project Indus LLM to

Project Indus LLM has been evaluated using the Indic LLM Leaderboard, which employs the `indic_eval` evaluation framework, designed specifically for assessing models on Indian-language tasks. This framework provides a comprehensive view of model performance across a variety of benchmarks tailored to Indian languages.

-Detailed results from the Indic LLM Leaderboard (α), accessible at [Hugging Face Indic LLM Leaderboard](https://huggingface.co/spaces/Cognitive-Lab/indic_llm_leaderboard), are shown below:
-
-| Task                             | Version | Metric   | Value  |   | Stderr |
-|----------------------------------|---------|----------|--------|---|--------|
-| All                              |         | acc      | 0.2891 | ± | 0.0109 |
-|                                  |         | acc_norm | 0.3013 | ± | 0.0112 |
-| indiceval:ARC-Challenge:hindi:10 | 0       | acc      | 0.2167 | ± | 0.0120 |
-|                                  |         | acc_norm | 0.2474 | ± | 0.0126 |
-| indiceval:ARC-Easy:hindi:5       | 0       | acc      | 0.3615 | ± | 0.0099 |
-|                                  |         | acc_norm | 0.3552 | ± | 0.0098 |

These results highlight the model's ability to understand and generate Hindi-language text under controlled testing conditions. The standard errors quantify the statistical uncertainty in each score, indicating how consistently the model performs across evaluation runs.

@@ -307,13 +298,7 @@ These results highlight the model's capabilities in understanding and generating

Additionally, Project Indus LLM has been evaluated on the Open LLM Leaderboard, which provides another layer of benchmarking by comparing the model's performance against other state-of-the-art language models. Below are the summarized results from the Open LLM Leaderboard:

-|             Metric              |Value|
-|---------------------------------|----:|
-|Avg.                             |20.07|
-|AI2 Reasoning Challenge (25-Shot)|22.70|
-|HellaSwag (10-Shot)              |25.04|
-|MMLU (5-Shot)                    |23.12|
-|Winogrande (5-shot)              |49.57|
+

These benchmark results can be explored further on the [Hugging Face Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
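
As a sanity check on the Indic LLM Leaderboard table removed in the first hunk above: the "All" row is the unweighted mean of the two per-task accuracies, and the reported standard errors can be read as approximate confidence intervals. A minimal Python sketch follows; the 1.96 multiplier assumes a normal approximation, and the dictionary keys simply mirror the task names in the table:

```python
# Sanity-check the "All" row: it is the unweighted mean of the per-task scores.
acc = {
    "indiceval:ARC-Challenge:hindi:10": 0.2167,
    "indiceval:ARC-Easy:hindi:5": 0.3615,
}

overall = sum(acc.values()) / len(acc)
print(f"All (acc): {overall:.4f}")  # 0.2891, matching the table

# Approximate 95% confidence interval from the reported standard error,
# using a normal approximation (value ± 1.96 × stderr).
stderr = 0.0109
print(f"95% CI: ({overall - 1.96 * stderr:.4f}, {overall + 1.96 * stderr:.4f})")
# -> roughly (0.2677, 0.3105)
```

The same relationship holds for acc_norm, since (0.2474 + 0.3552) / 2 = 0.3013, which suggests the leaderboard's "All" entry averages the tasks with equal weight.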
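
For readers who want to probe the model behind these numbers directly, here is a minimal sketch of loading it with the Hugging Face `transformers` library, independent of the `indic_eval` harness. The repo id `nickmalhotra/ProjectIndus` is an assumption (check the model card for the exact identifier), and the generation settings are illustrative only:

```python
# Minimal sketch: load Project Indus LLM from the Hugging Face Hub and
# generate a short Hindi completion. The repo id below is an assumption --
# substitute the identifier listed on the actual model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nickmalhotra/ProjectIndus"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "भारत की राजधानी"  # Hindi: "The capital of India"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Sampling with `top_p` is just one reasonable default for open-ended Hindi generation; the benchmark scores above were produced by the respective evaluation harnesses, not by this snippet.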