Update README.md
README.md
@@ -32,7 +32,7 @@ The model has a context length of 8192.
 We evaluated Gemma2 9B CPT SEA-LIONv3 Instruct on both general language capabilities and instruction-following capabilities.
 
 #### General Language Capabilities
 
-For the evaluation of general language capabilities, we employed the [SEA HELM evaluation benchmark](https://arxiv.org/abs/2309.06085v2) across a variety of tasks.
+For the evaluation of general language capabilities, we employed the [SEA HELM (also known as BHASA) evaluation benchmark](https://arxiv.org/abs/2309.06085v2) across a variety of tasks.
 These tasks include Question Answering (QA), Sentiment Analysis (Sentiment), Toxicity Detection (Toxicity), Translation in both directions (Eng>Lang & Lang>Eng), Abstractive Summarization (Summ), Causal Reasoning (Causal) and Natural Language Inference (NLI).
 
 Note: SEA HELM is implemented using prompts which expect answers in a strict format. For all tasks, the model is expected to provide an answer tag from which the answer would be extracted. For tasks where options are provided, the answer should only include one of the pre-defined options. The weighted accuracy of the answers is calculated and normalisation is performed to account for baseline performance due to random chance.
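The scoring described in the README note (extract the answer from a tag, reject answers outside the pre-defined options, compute a weighted accuracy, then normalise against the random-chance baseline) can be sketched as follows. This is a minimal illustration under stated assumptions, not the SEA HELM implementation: the `Answer:` tag format, the uniform default weights, and the linear normalisation `(score - baseline) / (1 - baseline)` are all assumptions, and the function names are hypothetical.

```python
import re
from typing import Optional, Sequence

# ASSUMPTION: hypothetical answer-tag format; the real SEA HELM prompts may use another tag.
ANSWER_TAG = re.compile(r"Answer:\s*(.+)", re.IGNORECASE)


def extract_answer(output: str, options: Optional[Sequence[str]] = None) -> Optional[str]:
    """Return the text following the answer tag, or None if the tag is missing
    or the answer is not one of the pre-defined options (when options are given)."""
    match = ANSWER_TAG.search(output)
    if match is None:
        return None  # no answer tag: nothing can be extracted
    answer = match.group(1).strip()
    if options is not None and answer not in options:
        return None  # answers outside the allowed options are treated as invalid
    return answer


def weighted_accuracy(preds: Sequence[Optional[str]], golds: Sequence[str],
                      weights: Optional[Sequence[float]] = None) -> float:
    """Weighted fraction of predictions matching the gold labels.
    ASSUMPTION: uniform weights unless explicit per-example weights are supplied."""
    if weights is None:
        weights = [1.0] * len(golds)
    correct = sum(w for p, g, w in zip(preds, golds, weights) if p == g)
    return correct / sum(weights)


def normalise(score: float, baseline: float) -> float:
    """ASSUMPTION: linear chance normalisation, mapping the random-chance
    baseline to 0 and a perfect score to 1."""
    return (score - baseline) / (1.0 - baseline)


if __name__ == "__main__":
    options = ["positive", "negative", "neutral"]
    outputs = ["Answer: positive", "I think... Answer: negative", "no tag here"]
    golds = ["positive", "positive", "neutral"]
    preds = [extract_answer(o, options) for o in outputs]
    acc = weighted_accuracy(preds, golds)
    # With three options, chance is 1/3, so an accuracy of 1/3 normalises to 0.
    print(preds, acc, normalise(acc, 1 / len(options)))
```

Rescaling against the chance baseline makes scores comparable across tasks with different numbers of options: a model guessing at random scores about 0 whether a task has two options or five.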