DeDeckerThomas committed on
Commit 63ded27 · 1 Parent(s): 7c899d9

Update README.md

Files changed (1)
  1. README.md +5 -0
README.md CHANGED
@@ -135,6 +135,11 @@ For more in detail information, you can take a look at the training notebook (li
  ### Preprocessing
  The documents in the dataset are already preprocessed into list of words with the corresponding labels. The only thing that must be done is tokenization and the realignment of the labels so that they correspond with the right subword tokens.
  ```python
+ # Labels
+ label_list = ["B", "I", "O"]
+ lbl2idx = {"B": 0, "I": 1, "O": 2}
+ idx2label = {0: "B", 1: "I", 2: "O"}
+
  def preprocess_fuction(all_samples_per_split):
      tokenized_samples = tokenizer.batch_encode_plus(
          all_samples_per_split[dataset_document_column],