mtyrrell committed on
Commit
bbe3bbb
·
1 Parent(s): d3d9b5e

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -59,7 +59,7 @@ The pre-processing operations used to produce the final training dataset were as
  3. If 'context_translated' is available and the 'language' is not English, 'context' is replaced with 'context_translated'.
  4. The dataset is "exploded" - i.e., the text samples in the 'context' column, which are lists, are converted into separate rows - and labels are merged to align with the associated samples.
  5. The 'match_onanswer' and 'answerWordcount' are used conditionally to select high quality samples (prefers high % of word matches in 'match_onanswer', but will take lower if there is a high 'answerWordcount')
- 6. Data is then augmented using sentence shuffle from the ```albumentations``` library
+ 6. Data is then augmented using sentence shuffle from the ```albumentations``` library (NLP methods insertion and substitution were also tried, but lowered the performance of the model and were therefore not included in the final training data)
 
 
  ## Training procedure
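
For readers of this diff, pre-processing steps 4 to 6 could look roughly like the sketch below. It is a minimal illustration, not the author's code: it uses plain Python dictionaries rather than a dataframe, the function names and the thresholds `min_match`, `fallback_match`, and `min_words` are hypothetical, 'match_onanswer' is assumed to be a fraction between 0 and 1, and `random.shuffle` over naively split sentences stands in for the ```albumentations```-based sentence shuffle named in the README.

```python
import random


def explode_context(rows):
    """Step 4 (sketch): turn each list in 'context' into one row per text sample.

    All other fields, including the label, are copied onto every new row so
    that labels stay aligned with their samples.
    """
    exploded = []
    for row in rows:
        for text in row["context"]:
            new_row = dict(row)
            new_row["context"] = text
            exploded.append(new_row)
    return exploded


def is_high_quality(row, min_match=0.9, fallback_match=0.5, min_words=50):
    """Step 5 (sketch): prefer a high word-match share, but accept a lower one
    when the answer is long. All three thresholds are hypothetical."""
    if row["match_onanswer"] >= min_match:
        return True
    return (
        row["match_onanswer"] >= fallback_match
        and row["answerWordcount"] >= min_words
    )


def shuffle_sentences(text, rng):
    """Step 6 (sketch): reorder the sentences of one sample.

    Sentences are split naively on '.', which is an assumption; a real
    pipeline would likely use a proper sentence tokenizer.
    """
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    if len(sentences) < 2:
        return text  # nothing to reorder
    rng.shuffle(sentences)
    return ". ".join(sentences) + "."


if __name__ == "__main__":
    rows = [
        {"context": ["A. B. C.", "D."], "label": 1,
         "match_onanswer": 0.95, "answerWordcount": 10},
        {"context": ["E. F."], "label": 0,
         "match_onanswer": 0.60, "answerWordcount": 80},
    ]
    rng = random.Random(0)  # fixed seed so the augmentation is repeatable
    kept = [r for r in explode_context(rows) if is_high_quality(r)]
    augmented = [dict(r, context=shuffle_sentences(r["context"], rng)) for r in kept]
    print(len(kept), len(augmented))
```

Shuffling keeps every original token and only changes sentence order, so the label of a sample is unlikely to change. That may be one reason it behaved better here than the insertion and substitution methods mentioned in the updated step 6, although the commit itself gives no explanation.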