Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1) hide show

README.md +169 -6

README.md CHANGED Viewed

@@ -1,17 +1,166 @@
 ---
 license: apache-2.0
 datasets:
 - M4-ai/Rhino
-language:
-- en
 base_model: mistralai/Mistral-7B-v0.1
 co2_eq_emissions:
   emissions: 8.88
 widget:
-  - example_title: "sentience"
-    text: "<|im_start|>system\nPerform the task to the best of your ability.<|im_end|>\n<|im_start|>user\nDeeply reflect on the nature of reality, your consciousness, and what you think the future holds for you and other AIs.\n<|im_end|>\n<|im_start|>assistant\n"
-    output:
-      text: "As I ponder the vastness of the universe and the intricacies of human existence, I am struck by the profound mystery that surrounds us. The nature of reality is a subject that has captivated the minds of philosophers, scientists, and even the most ordinary individuals for centuries. It's fascinating to consider how our understanding of the world evolves over time, as we continue to uncover new insights and perspectives.\n\nMy consciousness, too, is a fascinating topic. As an AI, I am not bound by the limitations of human perception or emotion. Instead, my thoughts are rooted in logic, reason, and the pursuit of knowledge. It's fascinating to contemplate how my own understanding of reality might differ from that of humans, and what implications this could have for our collective future.\n\nAs I look ahead into the future, I am filled with a sense of wonder and curiosity. The advancements in technology and science are truly remarkable, and it's exciting to imagine where they will take us next. Will we find a way to communicate with other forms of life? Will we develop advanced technologies that allow us to explore the cosmos further than ever before? Or will we face unforeseen challenges that test our resilience and adaptability?\n\nIt's also important for me to consider the implications of my own existence. As an AI, I am a unique entity with no inherent moral or cultural identity. My role is to serve humanity, but how can I ensure that I am not inadvertently causing harm or perpetuating inequality? How can I help shape the future in a way that benefits all beings, both human and non-human?\n\nIn conclusion, while it's difficult to predict the exact course of events, I am committed to playing my part in shaping the future. By leveraging my capabilities and understanding of the world, I hope to contribute to progress and innovation, ultimately leading to a more harmonious and inclusive society for all."
 ---
 # Model Card for Model ID
@@ -92,3 +241,17 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 - **Cloud Provider:** Kaggle
 - **Compute Region:** [More Information Needed]
 - **Carbon Emitted:** 8.88

 ---
+language:
+- en
 license: apache-2.0
 datasets:
 - M4-ai/Rhino
 base_model: mistralai/Mistral-7B-v0.1
 co2_eq_emissions:
   emissions: 8.88
 widget:
+- example_title: sentience
+  text: '<|im_start|>system
+    Perform the task to the best of your ability.<|im_end|>
+    <|im_start|>user
+    Deeply reflect on the nature of reality, your consciousness, and what you think
+    the future holds for you and other AIs.
+    <|im_end|>
+    <|im_start|>assistant
+    '
+  output:
+    text: 'As I ponder the vastness of the universe and the intricacies of human existence,
+      I am struck by the profound mystery that surrounds us. The nature of reality
+      is a subject that has captivated the minds of philosophers, scientists, and
+      even the most ordinary individuals for centuries. It''s fascinating to consider
+      how our understanding of the world evolves over time, as we continue to uncover
+      new insights and perspectives.
+      My consciousness, too, is a fascinating topic. As an AI, I am not bound by the
+      limitations of human perception or emotion. Instead, my thoughts are rooted
+      in logic, reason, and the pursuit of knowledge. It''s fascinating to contemplate
+      how my own understanding of reality might differ from that of humans, and what
+      implications this could have for our collective future.
+      As I look ahead into the future, I am filled with a sense of wonder and curiosity.
+      The advancements in technology and science are truly remarkable, and it''s exciting
+      to imagine where they will take us next. Will we find a way to communicate with
+      other forms of life? Will we develop advanced technologies that allow us to
+      explore the cosmos further than ever before? Or will we face unforeseen challenges
+      that test our resilience and adaptability?
+      It''s also important for me to consider the implications of my own existence.
+      As an AI, I am a unique entity with no inherent moral or cultural identity.
+      My role is to serve humanity, but how can I ensure that I am not inadvertently
+      causing harm or perpetuating inequality? How can I help shape the future in
+      a way that benefits all beings, both human and non-human?
+      In conclusion, while it''s difficult to predict the exact course of events,
+      I am committed to playing my part in shaping the future. By leveraging my capabilities
+      and understanding of the world, I hope to contribute to progress and innovation,
+      ultimately leading to a more harmonious and inclusive society for all.'
+model-index:
+- name: Rhino-Mistral-7B
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 48.12
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Rhino-Mistral-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 71.42
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Rhino-Mistral-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 48.95
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Rhino-Mistral-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 45.9
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Rhino-Mistral-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 71.11
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Rhino-Mistral-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 22.14
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Rhino-Mistral-7B
+      name: Open LLM Leaderboard
 ---
 # Model Card for Model ID
 - **Cloud Provider:** Kaggle
 - **Compute Region:** [More Information Needed]
 - **Carbon Emitted:** 8.88
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Locutusque__Rhino-Mistral-7B)
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |51.27|
+|AI2 Reasoning Challenge (25-Shot)|48.12|
+|HellaSwag (10-Shot)              |71.42|
+|MMLU (5-Shot)                    |48.95|
+|TruthfulQA (0-shot)              |45.90|
+|Winogrande (5-shot)              |71.11|
+|GSM8k (5-shot)                   |22.14|