Gregor Betz committed
description

- src/display/about.py  +11 -23
src/display/about.py
CHANGED
@@ -53,29 +53,17 @@ Performance leaderboards like the [🤗 Open LLM Leaderboard](https://huggingfac
 
 Unlike these leaderboards, the `/\/` Open CoT Leaderboard assesses a model's ability to effectively reason about a `task`:
 
-
-
-
-
-
-
-
-
-
-
-
-<td>Measures `task` performance.</td>
-<td>Measures ability to reason (about `task`).</td>
-</tr>
-<tr>
-<td>Metric: absolute accuracy.</td>
-<td>Metric: relative accuracy gain.</td>
-</tr>
-<tr>
-<td>Covers broad spectrum of `tasks`.</td>
-<td>Focuses on critical thinking `tasks`.</td>
-</tr>
-</table>
+### 🤗 Open LLM Leaderboard
+* Can `model` solve `task`?
+* Measures `task` performance.
+* Metric: absolute accuracy.
+* Covers broad spectrum of `tasks`.
+
+### `/\/` Open CoT Leaderboard
+* Can `model` do CoT to improve in `task`?
+* Measures ability to reason (about `task`).
+* Metric: relative accuracy gain.
+* Focuses on critical thinking `tasks`.
 
 
 
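The new text contrasts the two leaderboards' metrics: absolute accuracy versus a relative accuracy gain from chain-of-thought (CoT) prompting. As a rough illustration of that distinction, here is a minimal sketch of one way such a gain could be computed; the function names and the exact formula are assumptions for illustration, not the leaderboard's actual evaluation code.

```python
def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of predictions that exactly match the gold answers."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must be the same length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def relative_accuracy_gain(baseline_acc: float, cot_acc: float) -> float:
    """One possible reading of 'relative accuracy gain': the improvement over
    the baseline (answer-only) accuracy, expressed relative to that baseline.
    This definition is an assumption, not taken from the leaderboard."""
    return (cot_acc - baseline_acc) / baseline_acc


# Toy example: the same model answers five questions with and without CoT.
gold           = ["A", "C", "B", "D", "A"]
baseline_preds = ["A", "B", "B", "A", "A"]  # 3/5 correct -> 0.60
cot_preds      = ["A", "C", "B", "A", "A"]  # 4/5 correct -> 0.80

base = accuracy(baseline_preds, gold)
cot  = accuracy(cot_preds, gold)
print(f"baseline={base:.2f}  cot={cot:.2f}  "
      f"relative gain={relative_accuracy_gain(base, cot):.0%}")
```

Other normalizations are conceivable (for example, dividing by the headroom above the baseline); the about page edited in this commit is the authoritative description of what the leaderboard reports.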