Space: XufengDuan/HumanLikeness


XufengDuan committed on Aug 27, 2024

Commit efa3bcc (1 parent: ce1e7cd)

update scripts


Files changed (1)

src/display/about.py +13 -10


@@ -40,7 +40,17 @@ TITLE = """<h1 align="center" id="space-title">Humanlike Evaluation Model (HEM)
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-Welcome to the Humanlikeness Leaderboard, curated by [Xufeng Duan](https://xufengduan.github.io/). This platform rigorously evaluates the alignment between human and model responses in language processing, utilizing ten carefully designed psycholinguistic tasks to quantify a model's humanlikeness:<br><br>
 1. **Sounds:** Sound Shape Association<br>
 2. **Sounds:** Sound Gender Association<br>
 3. **Word:** Word Length and Predictivity<br>
@@ -51,16 +61,9 @@ Welcome to the Humanlikeness Leaderboard, curated by [Xufeng Duan](https://xufen
 8. **Meaning:** Semantic Illusion<br>
 9. **Discourse:** Implicit Causality<br>
 10. **Discourse:** Drawing Inferences<br><br>
-Each task is composed of multiple stimuli, designed to elicit both expected and unexpected responses. We have gathered data from 2000 human participants, generating response distributions that reflect natural human behavior across these tasks. By presenting identical stimuli to advanced language models, we generate corresponding response distributions for comparison.<br><br>
-The degree of congruence between these human and model distributions offers a precise measure of the model's humanlikeness.<br>
-"""
-# Which evaluations are you running? how can people reproduce what you have?
-LLM_BENCHMARKS_TEXT = """
-## Introduction
-This study aims to compare the similarities between human and model responses in language use by employing ten psycholinguistic tasks. Each task consists of multiple stimuli, with each stimulus having both expected and unexpected responses.
-To quantify the similarity, we collected responses from 2000 human participants, creating a binomial distribution for each stimulus within each task. The same stimuli were then presented to a language model, generating another binomial distribution for comparison.
 ## How it works

 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
+Welcome to the Humanlikeness Leaderboard, curated by [Xufeng Duan](https://xufengduan.github.io/). This platform rigorously evaluates the alignment between human and model responses across five key aspects of language: sounds, words, syntax, meaning, and discourse, to quantify a model's humanlikeness.<br><br>
+Data from 2000 human participants have been collected, creating response distributions that reflect natural human behavior. Identical stimuli are then presented to advanced language models, generating response distributions for comparison.<br><br>
+The congruence between human and model responses provides a precise measure of the model's humanlikeness, offering critical insights into how closely these models mirror human cognitive processes.<br>
+"""
+# Which evaluations are you running? how can people reproduce what you have?
+LLM_BENCHMARKS_TEXT = """
+## Introduction
+This study aims to compare the similarities between human and model responses in language use by employing ten psycholinguistic tasks:
 1. **Sounds:** Sound Shape Association<br>
 2. **Sounds:** Sound Gender Association<br>
 3. **Word:** Word Length and Predictivity<br>
 8. **Meaning:** Semantic Illusion<br>
 9. **Discourse:** Implicit Causality<br>
 10. **Discourse:** Drawing Inferences<br><br>
+Each task is composed of multiple stimuli, designed to elicit both expected and unexpected responses. We have gathered data from 2000 human participants, generating response distributions that reflect natural human behavior across these tasks. By presenting identical stimuli to advanced language models, we generate corresponding response distributions for comparison.<br><br>
+The degree of congruence between these human and model distributions offers a precise measure of the model's humanlikeness.
 ## How it works
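
Reviewer note: the text moved into `LLM_BENCHMARKS_TEXT` describes comparing per-stimulus human and model response distributions, but this diff does not show how the congruence is computed. The following is only a minimal sketch of one way such a comparison could work, assuming each stimulus has a binary outcome (expected vs. unexpected response) and using Jensen-Shannon divergence as the distance. The function names, the choice of metric, and the toy counts are all assumptions for illustration, not the leaderboard's actual implementation.

```python
import math


def bernoulli_js_divergence(p: float, q: float) -> float:
    """Jensen-Shannon divergence (log base 2, range [0, 1]) between
    Bernoulli(p) and Bernoulli(q), where p and q are the proportions
    of 'expected' responses for one stimulus."""

    def kl(a: float, b: float) -> float:
        # KL divergence between Bernoulli(a) and Bernoulli(b).
        # Terms with zero probability mass contribute nothing.
        total = 0.0
        for x, y in ((a, b), (1.0 - a, 1.0 - b)):
            if x > 0.0:
                total += x * math.log2(x / y)
        return total

    m = (p + q) / 2.0  # mixture distribution; nonzero wherever p or q is
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def humanlikeness(human_counts, model_counts) -> float:
    """Hypothetical score: 1 minus the mean per-stimulus JS divergence.

    Each argument is a list of (expected_responses, total_responses)
    pairs, one pair per stimulus, in the same stimulus order.
    """
    scores = []
    for (h_exp, h_n), (m_exp, m_n) in zip(human_counts, model_counts):
        divergence = bernoulli_js_divergence(h_exp / h_n, m_exp / m_n)
        scores.append(1.0 - divergence)
    return sum(scores) / len(scores)


# Toy, made-up counts for three stimuli (not real leaderboard data).
human = [(80, 100), (55, 100), (10, 100)]
model = [(45, 50), (20, 50), (5, 50)]
print(humanlikeness(human, model))
```

A score of 1.0 under this sketch means the model's response proportions match the human proportions on every stimulus; a score of 0.0 means they are maximally different on every stimulus.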