Commit 6e69c2a by dnaihao
1 Parent(s): f2f99bf

Updated the dataset info

Files changed (1)
  1. utils.py +9 -10
utils.py CHANGED
@@ -55,20 +55,19 @@ TABLE_INTRODUCTION = """
 
  LEADERBOARD_INFO = """
  ## Dataset Summary
- - **Questions and Options:** Each question within the dataset typically has **ten** multiple-choice options, except for some that were reduced during the manual review process to remove unreasonable choices. This increase from the original **four** options per question is designed to enhance complexity and robustness, necessitating deeper reasoning to discern the correct answer among a larger pool of potential distractors.
- - **Sources:** The dataset consolidates questions from several sources:
- - **Original MMLU Questions:** Part of the dataset comes from the original MMLU dataset. We remove the trivial and ambiguous questions.
- - **STEM Website:** Hand-picking high-quality STEM problems from the Internet.
- - **TheoremQA:** High-quality human-annotated questions requiring theorems to solve.
- - **SciBench:** Science questions from college exams.
+ - **Questions and Labels:** The task is to decide whether the provided explanation fully explains the joke (good) or does not fully explain the joke (bad).
+ - **Sources:**
+ - **Jokes:** We construct our dataset by including RZB jokes from "Best Annual Threads" between 2018 and 2021 that have been previously crawled (https://github.com/Leymore/ruozhiba). In addition, we directly collect all threads in the "Moderator's Recommendation" section from RZB.
+ - **Explanations:** We source the explanations from GPT-4o and ERNIE-4-turbo.
+ - **Annotations:** We manually annotate the generated explanations as either "fully explain the joke" (good) or "partially explain or not explain the joke" (bad). The gold label is determined by the majority vote among five native Chinese speakers.
  """
 
  CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
  CITATION_BUTTON_TEXT = r"""
- @article{wang2024mmlu,
- title={Mmlu-pro: A more robust and challenging multi-task language understanding benchmark},
- author={Wang, Yubo and Ma, Xueguang and Zhang, Ge and Ni, Yuansheng and Chandra, Abhranil and Guo, Shiguang and Ren, Weiming and Arulraj, Aaran and He, Xuan and Jiang, Ziyan and others},
- journal={arXiv preprint arXiv:2406.01574},
+ @article{he2024chumor,
+ title={Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba},
+ author={He, Ruiqi and He, Yushu and Bai, Longju and Liu, Jiarui and Sun, Zhenjie and Tang, Zenghao and Wang, He and Xia, Hanchen and Deng, Naihao},
+ journal={arXiv preprint arXiv:2406.12754},
  year={2024}
  }
  """