davidberenstein1957's Collections
LLM evals and benchmark datasets
Updated 10 days ago
allenai/reward-bench • Updated Sep 9 • 8.11k rows • 6.23k downloads • 79 likes
openai/openai_humaneval • Updated Jan 4 • 164 rows • 84.4k downloads • 255 likes
google/IFEval • Updated Aug 14 • 541 rows • 6.75k downloads • 44 likes
allenai/ai2_arc • Updated Dec 21, 2023 • 7.79k rows • 112k downloads • 160 likes
allenai/winogrande • Updated Jan 18 • 85.4k downloads • 58 likes
TIGER-Lab/MMLU-Pro • Updated 29 days ago • 12.1k rows • 39k downloads • 302 likes
cais/mmlu • Updated Mar 8 • 231k rows • 112k downloads • 349 likes
truthfulqa/truthful_qa • Updated Jan 4 • 1.63k rows • 27.7k downloads • 209 likes
openai/gsm8k • Updated Jan 4 • 17.6k rows • 169k downloads • 459 likes
Rowan/hellaswag • Updated Sep 28, 2023 • 60k rows • 110k downloads • 99 likes
tatsu-lab/alpaca_eval • Updated Aug 16 • 31.4k downloads • 51 likes
HuggingFaceH4/mt_bench_prompts • Updated Jul 3, 2023 • 80 rows • 297 downloads • 16 likes
nvidia/ChatRAG-Bench • Updated May 24 • 34.6k rows • 1.59k downloads • 101 likes
rungalileo/ragbench • Updated Jun 11 • 95.4k rows • 3.2k downloads • 34 likes