license: other
license_name: yi-license
license_link: LICENSE
widget:
- text: 你好! 你叫什么名字!
output:
text: 你好,我的名字叫聚言,很高兴见到你。
pipeline_tag: text-generation
model-index:
- name: OrionStar-Yi-34B-Chat-Llama
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: AI2 Reasoning Challenge (25-Shot)
type: ai2_arc
config: ARC-Challenge
split: test
args:
num_few_shot: 25
metrics:
- type: acc_norm
value: 64.93
name: normalized accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=OrionStarAI/OrionStar-Yi-34B-Chat-Llama
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: HellaSwag (10-Shot)
type: hellaswag
split: validation
args:
num_few_shot: 10
metrics:
- type: acc_norm
value: 84.34
name: normalized accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=OrionStarAI/OrionStar-Yi-34B-Chat-Llama
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU (5-Shot)
type: cais/mmlu
config: all
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 73.67
name: accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=OrionStarAI/OrionStar-Yi-34B-Chat-Llama
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: TruthfulQA (0-shot)
type: truthful_qa
config: multiple_choice
split: validation
args:
num_few_shot: 0
metrics:
- type: mc2
value: 53.35
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=OrionStarAI/OrionStar-Yi-34B-Chat-Llama
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: Winogrande (5-shot)
type: winogrande
config: winogrande_xl
split: validation
args:
num_few_shot: 5
metrics:
- type: acc
value: 78.85
name: accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=OrionStarAI/OrionStar-Yi-34B-Chat-Llama
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GSM8k (5-shot)
type: gsm8k
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 53.9
name: accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=OrionStarAI/OrionStar-Yi-34B-Chat-Llama
name: Open LLM Leaderboard
# OrionStarAI/OrionStar-Yi-34B-Chat-Llama
This model is identical to OrionStarAI/OrionStar-Yi-34B; the only difference is that the tensors have been renamed to follow the LLaMA format, so the model can be evaluated automatically on the HF Open LLM Leaderboard.
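Because the tensors follow the LLaMA layout, the checkpoint should load with the standard `transformers` auto classes. Below is a minimal inference sketch; note that feeding a plain prompt string is an illustrative assumption, not the model's official chat format (check the tokenizer's chat template for that):

```python
# Minimal inference sketch using the standard transformers API.
# NOTE: the raw-string prompt is an assumption for illustration;
# the intended chat format may differ (see the tokenizer config).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OrionStarAI/OrionStar-Yi-34B-Chat-Llama"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 34B parameters: bf16 halves memory vs fp32
    device_map="auto",           # shard across available GPUs
)

prompt = "你好! 你叫什么名字!"  # the widget example above: "Hello! What's your name?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```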
## Model Introduction
OrionStar-Yi-34B-Chat from OrionStarAI is based on the open-source Yi-34B model, fine-tuned on a high-quality corpus of over 15 million sentences. OrionStar-Yi-34B-Chat aims to provide an excellent interactive experience for users in the large model community.
The Yi series models, open-sourced by the 01-ai team, have shown impressive performance on various Chinese, English, and general-domain benchmarks. OrionStar-Yi-34B-Chat further explores the potential of Yi-34B: through extensive fine-tuning on a large, high-quality corpus, it performs exceptionally well on evaluation data. We strive to make it an outstanding open-source alternative to ChatGPT!
Our fine-tuned model is completely open for academic research, but please adhere to the agreement and the Yi License.
## Model Evaluation Results
We use opencompass to perform 5-shot evaluation on the following general-domain datasets. The evaluation results of other models are taken from the opencompass leaderboard.
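For context, "5-shot" means each test question is preceded by five solved examples in the prompt, and the model must answer the sixth. A schematic sketch of that construction is below; it is illustrative only and does not reproduce opencompass's dataset-specific templates:

```python
# Schematic construction of a 5-shot prompt (illustration only;
# opencompass builds prompts with its own per-dataset templates).
def build_few_shot_prompt(demos, question, num_few_shot=5):
    """Prepend `num_few_shot` solved examples to the test question."""
    parts = []
    for q, a in demos[:num_few_shot]:
        parts.append(f"Question: {q}\nAnswer: {a}\n")
    parts.append(f"Question: {question}\nAnswer:")
    return "\n".join(parts)

demos = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
    ("How many days are in a week?", "7"),
    ("What color is the sky on a clear day?", "Blue"),
    ("What is 10 / 2?", "5"),
]
print(build_few_shot_prompt(demos, "What is 3 * 3?"))
```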
| Model                 | C-Eval | MMLU  | CMMLU |
|-----------------------|--------|-------|-------|
| GPT-4                 | 69.9   | 83    | 71    |
| ChatGPT               | 52.5   | 69.1  | 53.9  |
| Claude-1              | 52     | 65.7  | -     |
| TigerBot-70B-Chat-V2  | 57.7   | 65.9  | 59.9  |
| WeMix-LLaMA2-70B      | 55.2   | 71.3  | 56    |
| LLaMA-2-70B-Chat      | 44.3   | 63.8  | 43.3  |
| Qwen-14B-Chat         | 71.7   | 66.4  | 70    |
| Baichuan2-13B-Chat    | 56.7   | 57    | 58.4  |
| OrionStar-Yi-34B-Chat | 77.71  | 78.32 | 73.52 |
## Open LLM Leaderboard Evaluation Results
Detailed results can be found [here](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=OrionStarAI/OrionStar-Yi-34B-Chat-Llama).
| Metric                            | Value |
|-----------------------------------|-------|
| Avg.                              | 68.17 |
| AI2 Reasoning Challenge (25-Shot) | 64.93 |
| HellaSwag (10-Shot)               | 84.34 |
| MMLU (5-Shot)                     | 73.67 |
| TruthfulQA (0-shot)               | 53.35 |
| Winogrande (5-shot)               | 78.85 |
| GSM8k (5-shot)                    | 53.90 |
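The Avg. row is the unweighted mean of the six benchmark scores, which is easy to verify:

```python
# Sanity check: the leaderboard Avg. is the unweighted mean of the six scores.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 64.93,
    "HellaSwag (10-Shot)": 84.34,
    "MMLU (5-Shot)": 73.67,
    "TruthfulQA (0-shot)": 53.35,
    "Winogrande (5-shot)": 78.85,
    "GSM8k (5-shot)": 53.90,
}
avg = sum(scores.values()) / len(scores)
print(f"{avg:.2f}")  # -> 68.17
```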