Shashwat Goel
shash42
1 follower · 1 following
https://www.shash42.github.io
ShashwatGoel7
shash42
shashwatgoel42
AI & ML interests
Science of Deep Learning, Safe AI
Recent Activity
upvoted a collection 21 days ago: answer-matching
commented on a paper 23 days ago: Answer Matching Outperforms Multiple Choice for Language Model Evaluation
upvoted a paper about 2 months ago: Pitfalls in Evaluating Language Model Forecasters
Organizations
Papers: 5
arxiv:2502.19414
arxiv:2502.04313
arxiv:2403.03218
arxiv:2402.14015
Models: 0 (none public yet)
Datasets: 3
shash42/GPQA-Diamond-Verify • Updated May 9 • 792 • 25
shash42/MATH-Verify • Updated May 9 • 19.7k • 9
shash42/MMLU-Pro-Verify • Updated May 9 • 114k • 5