Update README.md

README.md CHANGED
```diff
@@ -35,7 +35,7 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-The model is a
+The model is a multi-class text classifier based on [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) and fine-tuned on text sourced from national climate policy documents.
 
 ## Intended uses & limitations
 
@@ -61,8 +61,10 @@ The pre-processing operations used to produce the final training dataset were as
 3. If 'context_translated' is available and the 'language' is not English, 'context' is replaced with 'context_translated'.
 4. The dataset is "exploded" - i.e., the text samples in the 'context' column, which are lists, are converted into separate rows - and labels are merged to align with the associated samples.
 5. The 'match_onanswer' and 'answerWordcount' are used conditionally to select high-quality samples (prefers a high % of word matches in 'match_onanswer', but accepts lower values when 'answerWordcount' is high).
-6. Data is then augmented using sentence shuffle from the `albumentations` library and NLP-based insertions using `nlpaug`.
-
+6. Data is then augmented using sentence shuffle from the `albumentations` library and NLP-based insertions using `nlpaug`. This is done to increase the number of training samples available for the Net-Zero class from 62 to 124. The end result is an almost equal sample-per-class breakdown of:
+> - 'NET-ZERO': 124
+> - 'NEGATIVE': 126
+> - 'TARGET_FREE': 125
 
 ## Training procedure
 
```
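The pre-processing steps in the diff above can be sketched roughly as follows. This is an illustrative sketch, not the model card's actual pipeline: the record values, the `match_threshold` and `wordcount_floor` cutoffs, the `"en"` language code, and the simple word-shuffle augmentation are all assumptions made for the example (the real pipeline uses `albumentations` sentence shuffle and `nlpaug` insertions); only the column names come from the card.

```python
import random

# Illustrative records; field names follow the model card, values are made up.
dataset_records = [
    {
        "language": "fr",
        "context": ["Phrase un.", "Phrase deux."],
        "context_translated": ["Sentence one.", "Sentence two."],
        "labels": ["NET-ZERO", "NEGATIVE"],
        "match_onanswer": 0.9,
        "answerWordcount": 12,
    },
    {
        "language": "en",
        "context": ["Net zero by 2050.", "No target stated."],
        "context_translated": None,
        "labels": ["NET-ZERO", "TARGET_FREE"],
        "match_onanswer": 0.4,
        "answerWordcount": 40,
    },
]

def preprocess(records, match_threshold=0.8, wordcount_floor=30):
    rows = []
    for rec in records:
        # Step 3: prefer the translated context for non-English samples.
        texts = rec["context"]
        if rec["language"] != "en" and rec["context_translated"]:
            texts = rec["context_translated"]
        # Step 5 (simplified): keep samples with a high word-match score,
        # or accept a lower score when the answer is long enough.
        if rec["match_onanswer"] < match_threshold and rec["answerWordcount"] < wordcount_floor:
            continue
        # Step 4: "explode" the list of texts into one row per sample,
        # aligning each text with its label.
        for text, label in zip(texts, rec["labels"]):
            rows.append({"text": text, "label": label})
    return rows

def augment_class(rows, label, seed=0):
    # Step 6 (stand-in): duplicate samples of one class with their words
    # shuffled, doubling that class's sample count. The actual pipeline
    # uses albumentations sentence shuffle and nlpaug insertions instead.
    rng = random.Random(seed)
    extra = []
    for row in rows:
        if row["label"] == label:
            words = row["text"].split()
            rng.shuffle(words)
            extra.append({"text": " ".join(words), "label": label})
    return rows + extra

rows = preprocess(dataset_records)
rows = augment_class(rows, "NET-ZERO")
```

Doubling only the under-represented class, as in the card's 62-to-124 Net-Zero augmentation, is what brings the classes to near-equal counts.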