Update README.md
README.md
CHANGED
@@ -109,9 +109,8 @@ Performance-wise:
 
 \* Taiwan-LLM models responds to multi-turn questions (English) in Traditional Chinese.
 
-**Details of MT-Bench-tw (0 shot):**
 
-| Models
+| Details of MT-Bench-tw (0 shot):<br/>Models | STEM |Extraction|Reasoning| Math | Coding | Roleplay| Writing |Humanities|↑ AVG |
 |-----------------------------------------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
 | gpt-3.5-turbo | 7.8 | 6.1 | 5.1 | 6.4 | 6.2 | 8.7 | 7.4 | 9.3 | 7.1 |
 | Yi-34B-Chat | 9.0 | 4.8 | 5.7 | 4.0 | 4.7 | 8.5 | 8.7 | 9.8 | 6.9 |
@@ -123,9 +122,8 @@ Performance-wise:
 | Taiwan-LLM-13B-v2.0-chat | 6.1 | 3.4 | 4.1 | 2.3 | 3.1 | 7.4 | 6.6 | 6.8 | 5.0 |
 | Taiwan-LLM-7B-v2.1-chat | 5.2 | 2.6 | 2.3 | 1.2 | 3.4 | 6.6 | 5.7 | 6.8 | 4.2 |
 
-**Details of TMMLU+ (0 shot):**
 
-| Model
+| Details of TMMLU+ (0 shot):<br/>Model | STEM | Social Science | Humanities | Other | ↑ AVG |
 |-----------------------------------------------------|--------------|----------------|------------|------------|---------|
 | Yi-34B-Chat | 47.65 | 64.25 | 52.73 | 54.91 | 54.87 |
 | Qwen-14B-Chat | 43.83 | 55.00 | 48.55 | 46.22 | 48.41 |