XufengDuan committed on
Commit efa3bcc · 1 Parent(s): ce1e7cd

update scripts

Files changed (1)
  1. src/display/about.py +13 -10
src/display/about.py CHANGED
@@ -40,7 +40,17 @@ TITLE = """<h1 align="center" id="space-title">Humanlike Evaluation Model (HEM)
 
  # What does your leaderboard evaluate?
  INTRODUCTION_TEXT = """
- Welcome to the Humanlikeness Leaderboard, curated by [Xufeng Duan](https://xufengduan.github.io/). This platform rigorously evaluates the alignment between human and model responses in language processing, utilizing ten carefully designed psycholinguistic tasks to quantify a model's humanlikeness:<br><br>
+ Welcome to the Humanlikeness Leaderboard, curated by [Xufeng Duan](https://xufengduan.github.io/). This platform rigorously evaluates the alignment between human and model responses across five key aspects of language: sounds, words, syntax, meaning, and discourse, to quantify a model's humanlikeness.<br><br>
+ Data from 2000 human participants have been collected, creating response distributions that reflect natural human behavior. Identical stimuli are then presented to advanced language models, generating response distributions for comparison.<br><br>
+ The congruence between human and model responses provides a precise measure of the model's humanlikeness, offering critical insights into how closely these models mirror human cognitive processes.<br>
+ """
+
+ # Which evaluations are you running? how can people reproduce what you have?
+ LLM_BENCHMARKS_TEXT = """
+ ## Introduction
+
+ This study aims to compare the similarities between human and model responses in language use by employing ten psycholinguistic tasks:
+
  1. **Sounds:** Sound Shape Association<br>
  2. **Sounds:** Sound Gender Association<br>
  3. **Word:** Word Length and Predictivity<br>
@@ -51,16 +61,9 @@ Welcome to the Humanlikeness Leaderboard, curated by [Xufeng Duan](https://xufen
  8. **Meaning:** Semantic Illusion<br>
  9. **Discourse:** Implicit Causality<br>
  10. **Discourse:** Drawing Inferences<br><br>
- Each task is composed of multiple stimuli, designed to elicit both expected and unexpected responses. We have gathered data from 2000 human participants, generating response distributions that reflect natural human behavior across these tasks. By presenting identical stimuli to advanced language models, we generate corresponding response distributions for comparison.<br><br>
- The degree of congruence between these human and model distributions offers a precise measure of the model's humanlikeness.<br>
- """
 
- # Which evaluations are you running? how can people reproduce what you have?
- LLM_BENCHMARKS_TEXT = """
- ## Introduction
-
- This study aims to compare the similarities between human and model responses in language use by employing ten psycholinguistic tasks. Each task consists of multiple stimuli, with each stimulus having both expected and unexpected responses.
- To quantify the similarity, we collected responses from 2000 human participants, creating a binomial distribution for each stimulus within each task. The same stimuli were then presented to a language model, generating another binomial distribution for comparison.
+ Each task is composed of multiple stimuli, designed to elicit both expected and unexpected responses. We have gathered data from 2000 human participants, generating response distributions that reflect natural human behavior across these tasks. By presenting identical stimuli to advanced language models, we generate corresponding response distributions for comparison.<br><br>
+ The degree of congruence between these human and model distributions offers a precise measure of the model's humanlikeness.
 
  ## How it works
 
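Note on the text in this commit: the introduction describes comparing a human and a model response distribution per stimulus (expected vs. unexpected responses), but the lines shown here do not name the congruence measure. The sketch below is one possible reading, not the repository's implementation. It assumes Jensen-Shannon divergence as the measure; the function names (`response_distribution`, `js_divergence`, `humanlikeness`) and the counts are hypothetical and chosen only for illustration.

```python
import math


def response_distribution(expected_count: int, total: int) -> list[float]:
    """Return [P(expected), P(unexpected)] for one stimulus."""
    if total <= 0 or not 0 <= expected_count <= total:
        raise ValueError("need total > 0 and 0 <= expected_count <= total")
    p = expected_count / total
    return [p, 1.0 - p]


def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: list[float], y: list[float]) -> float:
        # Terms with x_i == 0 contribute nothing by convention.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def humanlikeness(human: list[float], model: list[float]) -> float:
    """Assumed score: 1 minus the divergence, so higher means more humanlike."""
    return 1.0 - js_divergence(human, model)


# Made-up counts for a single stimulus, for illustration only.
human = response_distribution(expected_count=1400, total=2000)
model = response_distribution(expected_count=90, total=100)
score = humanlikeness(human, model)
print(f"humanlikeness for this stimulus: {score:.3f}")
```

With base-2 logarithms the divergence is bounded between 0 and 1, so the assumed score also stays in that range; a per-task or overall score could then be an average over stimuli, which is again an assumption rather than something stated in this diff.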