ml6team/keyphrase-extraction-kbir-inspec

Token Classification

keyphrase-extraction

DeDeckerThomas committed on May 3, 2022

Commit 63ded27 · 1 Parent(s): 7c899d9

Update README.md

Files changed (1)

README.md +5 -0


@@ -135,6 +135,11 @@ For more in detail information, you can take a look at the training notebook (li
 ### Preprocessing
 The documents in the dataset are already preprocessed into a list of words with the corresponding labels. The only remaining steps are tokenization and realignment of the labels so that they correspond to the right subword tokens.
 ```python
+# Labels
+label_list = ["B", "I", "O"]
+lbl2idx = {"B": 0, "I": 1, "O": 2}
+idx2label = {0: "B", 1: "I", 2: "O"}
 def preprocess_fuction(all_samples_per_split):
     tokenized_samples = tokenizer.batch_encode_plus(
         all_samples_per_split[dataset_document_column],