README.md · sequelbox/Llama3.1-8B-MOTH at refs/pr/2

File size: 4,105 Bytes

---
language:
- en
license: other
tags:
- supernova
- moth
- llama
- llama-3.1
- llama-3.1-instruct
- llama-3.1-instruct-8b
- llama-3
- llama-3-instruct
- llama-3-instruct-8b
- 8b
- general
- conversational
- chat
- instruct
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
datasets:
- sequelbox/Supernova
pipeline_tag: text-generation
model_type: llama
model-index:
- name: Llama3.1-8B-MOTH
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 52.08
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sequelbox/Llama3.1-8B-MOTH
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 26.45
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sequelbox/Llama3.1-8B-MOTH
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 11.86
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sequelbox/Llama3.1-8B-MOTH
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 2.57
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sequelbox/Llama3.1-8B-MOTH
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 3.79
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sequelbox/Llama3.1-8B-MOTH
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 25.48
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sequelbox/Llama3.1-8B-MOTH
      name: Open LLM Leaderboard
---

- MOTH is a general chat AI.
- MOTH is finetuned on [high quality synthetic data.](https://huggingface.co/datasets/sequelbox/Supernova)
- MOTH is trained on a variety of skills and specialties.
- This version of MOTH is trained on the [Llama 3.1 Instruct format.](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
- MOTH is also available for [Gemma 2;](https://huggingface.co/sequelbox/gemma-2-9B-MOTH) more MOTH finetunes for other models to follow.
- MOTH has not been manually tested and uses automatically generated datasets.
- Do as you will.



(uses llama 3.1 license available at https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_sequelbox__Llama3.1-8B-MOTH)

|      Metric       |Value|
|-------------------|----:|
|Avg.               |20.37|
|IFEval (0-Shot)    |52.08|
|BBH (3-Shot)       |26.45|
|MATH Lvl 5 (4-Shot)|11.86|
|GPQA (0-shot)      | 2.57|
|MuSR (0-shot)      | 3.79|
|MMLU-PRO (5-shot)  |25.48|