Commit efa3bcc
Parent(s): ce1e7cd
update scripts
src/display/about.py CHANGED (+13 -10)
@@ -40,7 +40,17 @@ TITLE = """<h1 align="center" id="space-title">Humanlike Evaluation Model (HEM)
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-Welcome to the Humanlikeness Leaderboard, curated by [Xufeng Duan](https://xufengduan.github.io/). This platform rigorously evaluates the alignment between human and model responses
+Welcome to the Humanlikeness Leaderboard, curated by [Xufeng Duan](https://xufengduan.github.io/). This platform rigorously evaluates the alignment between human and model responses across five key aspects of language: sounds, words, syntax, meaning, and discourse, to quantify a model's humanlikeness.<br><br>
+Data from 2000 human participants have been collected, creating response distributions that reflect natural human behavior. Identical stimuli are then presented to advanced language models, generating response distributions for comparison.<br><br>
+The congruence between human and model responses provides a precise measure of the model's humanlikeness, offering critical insights into how closely these models mirror human cognitive processes.<br>
+"""
+
+# Which evaluations are you running? how can people reproduce what you have?
+LLM_BENCHMARKS_TEXT = """
+## Introduction
+
+This study aims to compare the similarities between human and model responses in language use by employing ten psycholinguistic tasks:
+
 1. **Sounds:** Sound Shape Association<br>
 2. **Sounds:** Sound Gender Association<br>
 3. **Word:** Word Length and Predictivity<br>
@@ -51,16 +61,9 @@ Welcome to the Humanlikeness Leaderboard, curated by [Xufeng Duan](https://xufen
 8. **Meaning:** Semantic Illusion<br>
 9. **Discourse:** Implicit Causality<br>
 10. **Discourse:** Drawing Inferences<br><br>
-Each task is composed of multiple stimuli, designed to elicit both expected and unexpected responses. We have gathered data from 2000 human participants, generating response distributions that reflect natural human behavior across these tasks. By presenting identical stimuli to advanced language models, we generate corresponding response distributions for comparison.<br><br>
-The degree of congruence between these human and model distributions offers a precise measure of the model's humanlikeness.<br>
-"""
 
-
-
-## Introduction
-
-This study aims to compare the similarities between human and model responses in language use by employing ten psycholinguistic tasks. Each task consists of multiple stimuli, with each stimulus having both expected and unexpected responses.
-To quantify the similarity, we collected responses from 2000 human participants, creating a binomial distribution for each stimulus within each task. The same stimuli were then presented to a language model, generating another binomial distribution for comparison.
+Each task is composed of multiple stimuli, designed to elicit both expected and unexpected responses. We have gathered data from 2000 human participants, generating response distributions that reflect natural human behavior across these tasks. By presenting identical stimuli to advanced language models, we generate corresponding response distributions for comparison.<br><br>
+The degree of congruence between these human and model distributions offers a precise measure of the model's humanlikeness.
 
 ## How it works
 
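The text in this diff describes comparing, for each stimulus, a human response distribution over expected and unexpected responses with the model's distribution for the same stimulus. The diff does not show how the leaderboard actually computes that congruence, so the following is only a minimal sketch under stated assumptions: the Jensen-Shannon divergence is swapped in as one plausible distance, and the function names (`binomial_dist`, `js_divergence`, `humanlikeness`) and the example counts are hypothetical, not taken from the repository.

```python
import math


def binomial_dist(expected_count, total):
    """Return [P(expected), P(unexpected)] for one stimulus from raw counts."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = expected_count / total
    return [p, 1.0 - p]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def humanlikeness(human_counts, model_counts):
    """Hypothetical score: 1 minus the JS divergence, averaged over stimuli.

    Each argument is a list of (expected_count, total) pairs, one per stimulus.
    This metric is an assumption for illustration, not the leaderboard's own.
    """
    scores = [
        1.0 - js_divergence(binomial_dist(*h), binomial_dist(*m))
        for h, m in zip(human_counts, model_counts)
    ]
    return sum(scores) / len(scores)


# Made-up counts: two stimuli, humans vs. one model.
human = [(150, 200), (40, 200)]
model = [(70, 100), (35, 100)]
print(round(humanlikeness(human, model), 4))
```

A bounded, symmetric divergence is used here so that the score stays in the range 0 to 1 and does not depend on which distribution is treated as the reference; other distances would serve equally well for the sketch.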