---
language:
- en
license: mit
model-index:
- name: lil-c3po
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 65.02
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=deepnight-research/lil-c3po
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 84.45
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=deepnight-research/lil-c3po
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 62.36
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=deepnight-research/lil-c3po
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 68.73
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=deepnight-research/lil-c3po
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 79.16
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=deepnight-research/lil-c3po
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 48.45
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=deepnight-research/lil-c3po
      name: Open LLM Leaderboard
---
# deepnight-research/lil-c3po
## Model Details
lil-c3po is an open-source large language model (LLM) produced by linearly merging two fine-tuned Mistral-7B models developed in-house, internally referred to as c3-1 and c3-2. The merge combines the distinct strengths of the two fine-tunes to improve overall performance and utility.
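As a rough illustration of what a linear merge does, the sketch below averages the parameters of two causal language models with 🤗 Transformers. The local paths `c3-1` and `c3-2` and the equal 0.5/0.5 weighting are placeholders; the actual in-house checkpoints and mixing ratio are not published.

```python
# Minimal sketch of a linear (weighted-average) merge of two Mistral-7B
# fine-tunes. Paths "c3-1" and "c3-2" and alpha=0.5 are placeholders, not
# the actual recipe used for lil-c3po.
import torch
from transformers import AutoModelForCausalLM

model_a = AutoModelForCausalLM.from_pretrained("c3-1", torch_dtype=torch.bfloat16)
model_b = AutoModelForCausalLM.from_pretrained("c3-2", torch_dtype=torch.bfloat16)

alpha = 0.5  # assumed equal weighting
state_b = model_b.state_dict()
merged = {name: alpha * param + (1.0 - alpha) * state_b[name]
          for name, param in model_a.state_dict().items()}

# Write the averaged weights back into one of the models and save the result.
model_a.load_state_dict(merged)
model_a.save_pretrained("lil-c3po-merged")
```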
## Model Architecture
lil-c3po inherits its architecture from the merged c3-1 and c3-2 models, including Grouped-Query Attention, Sliding-Window Attention, and a byte-fallback BPE tokenizer. The merge aims to capitalize on the strengths of both models for improved language understanding and generation.
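One way to confirm these Mistral-style architecture features is to inspect the published configuration; the field names below follow the Hugging Face `MistralConfig`, and this assumes the merged checkpoint is packaged in that standard format.

```python
# Inspect the architecture features named above from the model config.
from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("deepnight-research/lil-c3po")
print("attention heads:", config.num_attention_heads)
print("KV heads (GQA):", config.num_key_value_heads)  # fewer KV heads => grouped-query attention
print("sliding window:", config.sliding_window)       # sliding-window attention span

tokenizer = AutoTokenizer.from_pretrained("deepnight-research/lil-c3po")
print("vocab size:", tokenizer.vocab_size)             # byte-fallback BPE (SentencePiece-based) vocabulary
```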
## Training Details
- The first model, internally referred to as c3-1, is a 7B-parameter LLM fine-tuned on the Intel Gaudi 2 processor using the Direct Preference Optimization (DPO) method, and is designed to excel at a wide range of language-related tasks (a hedged training sketch follows this list).
- The second model, c3-2, is an instruct fine-tuned version of Mistral-7B. Its improvements in instruction fine-tuning contribute to enhanced language understanding in instructional contexts.
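For readers unfamiliar with DPO, the sketch below shows preference fine-tuning with the `trl` library on a generic setup. The base checkpoint, dataset name, and hyperparameters are placeholders, the Gaudi-specific tooling actually used for c3-1 is not covered, and the exact `DPOTrainer` arguments vary across `trl` versions.

```python
# Hedged sketch of DPO-style preference fine-tuning with trl; this is NOT the
# actual c3-1 training script, and DPOTrainer kwargs depend on the trl version.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)   # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder preference dataset with "prompt", "chosen", "rejected" columns.
dataset = load_dataset("my-org/preference-pairs", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=TrainingArguments(output_dir="c3-1-dpo", per_device_train_batch_size=1),
    beta=0.1,                 # weight of the implicit KL constraint to the reference model
    train_dataset=dataset,
    tokenizer=tokenizer,      # newer trl versions take processing_class instead
)
trainer.train()
```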
## License
lil-c3po is released under the MIT license, fostering open-source collaboration and innovation.
## Intended Use
This merged model is suitable for a broad range of language-related tasks, inheriting the capabilities of the fine-tuned c3-1 and c3-2 models. A minimal usage example is shown below.
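The sketch uses the `transformers` text-generation pipeline; the prompt and sampling settings are illustrative only, since the card does not document a specific chat or instruction template.

```python
# Minimal text-generation sketch; prompt format and sampling settings are assumptions.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="deepnight-research/lil-c3po",
    device_map="auto",
)
out = generator(
    "Explain the difference between fine-tuning and merging language models.",
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)
print(out[0]["generated_text"])
```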
## Out-of-Scope Uses
While lil-c3po is versatile, fine-tuning may still be necessary for specific tasks. The model should not be used to intentionally create hostile or alienating environments for people.
## Open LLM Leaderboard Evaluation Results
Detailed results can be found on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=deepnight-research/lil-c3po).
| Metric                            | Value |
|-----------------------------------|-------|
| Avg.                              | 68.03 |
| AI2 Reasoning Challenge (25-Shot) | 65.02 |
| HellaSwag (10-Shot)               | 84.45 |
| MMLU (5-Shot)                     | 62.36 |
| TruthfulQA (0-shot)               | 68.73 |
| Winogrande (5-shot)               | 79.16 |
| GSM8k (5-shot)                    | 48.45 |
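The scores above come from the Open LLM Leaderboard's evaluation harness settings. As a rough local check, one benchmark can be re-run with `lm-evaluation-harness` (v0.4+), as sketched below; numbers may not match the leaderboard exactly due to prompt and version differences.

```python
# Hedged sketch: re-run ARC-Challenge (25-shot) locally with lm-evaluation-harness.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=deepnight-research/lil-c3po,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=25,
    batch_size=8,
)
print(results["results"]["arc_challenge"])  # includes acc_norm for comparison with the table
```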