We use the binary formulation of this task (positive vs. negative).

<summary>Method</summary>

* Evaluation setting: zero-shot and few-shot perplexity-based evaluation.
* Prompt: ```"Tekst: {text}\nSentiment:{label}"```, where the ```label``` is either "positiv" or "negativ".
* Few-shot results show the average scores across 5 repetitions.
* Evaluation script: https://github.com/ltgoslo/norallm/blob/main/initial_evaluation/sentiment_analysis.py
* Performance metric: macro-averaged F1-score.
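Perplexity-based evaluation as described above amounts to filling the prompt with each candidate label and keeping the label the model scores as most likely. A minimal sketch of that selection logic — with `dummy_nll` as an invented stand-in for a real language model's negative log-likelihood, purely so the example runs:

```python
def pick_label(text, score_nll, labels=("positiv", "negativ")):
    # Fill the evaluation prompt with each candidate label and keep the
    # label whose completed prompt receives the lowest total negative
    # log-likelihood (i.e. the lowest perplexity) under the model.
    prompt = "Tekst: {text}\nSentiment:{label}"
    return min(labels, key=lambda lab: score_nll(prompt.format(text=text, label=lab)))


# Hypothetical scorer, NOT a real model: it simply prefers "positiv"
# whenever the input contains the word "fantastisk".
def dummy_nll(s):
    return 1.0 if "fantastisk" in s and s.endswith("positiv") else 2.0


print(pick_label("Filmen var fantastisk!", dummy_nll))  # → positiv
```

In practice `score_nll` would sum the model's token-level log-probabilities over the filled-in prompt; the linked evaluation script is the authoritative implementation.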

### Reading comprehension

[NorQuAD](https://huggingface.co/datasets/ltg/norquad) ([Ivanova et al., 2023](https://aclanthology.org/2023.nodalida-1.17/)) is a dataset for extractive question answering in Norwegian, designed similarly to [SQuAD (Rajpurkar et al., 2016)](https://aclanthology.org/D16-1264/).

<details>
<summary>Method</summary>

* Evaluation setting: zero-shot and few-shot settings via natural language generation using the greedy decoding strategy.
* Prompt: ```"Tittel: {title}\n\nTekst: {text}\n\nSpørsmål: {question}\n\nSvar:{answer}"```, based on [Brown et al. (2020)](https://arxiv.org/abs/2005.14165).
* Few-shot results show the average scores across 5 repetitions.
* Evaluation script: https://github.com/ltgoslo/norallm/blob/main/initial_evaluation/norquad.py
* Performance metrics: macro-averaged F1-score and exact match (EM).
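For extractive QA, exact match and token-level F1 are conventionally computed over normalized answer strings, in the style of the original SQuAD evaluation. The sketch below shows that common formulation — it is an illustration, not the repository's exact script; the linked `norquad.py` is authoritative:

```python
import re
import string
from collections import Counter

def normalize(s: str) -> str:
    # Lowercase, strip punctuation, and collapse whitespace before comparing.
    s = "".join(ch for ch in s.lower() if ch not in set(string.punctuation))
    return re.sub(r"\s+", " ", s).strip()

def exact_match(prediction: str, gold: str) -> float:
    # EM: 1.0 only if the normalized strings are identical.
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction: str, gold: str) -> float:
    # F1 over the multiset of tokens shared by prediction and gold answer.
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Oslo!", "oslo"))                       # → 1.0
print(round(token_f1("i Oslo sentrum", "Oslo sentrum"), 3))  # → 0.8
```

Note that the English SQuAD script additionally strips articles ("a", "an", "the") during normalization, a step that does not carry over to Norwegian as-is.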