DeDeckerThomas committed on
Commit 1f32179 · 1 Parent(s): 3288081

Update README.md

Files changed (1)
  1. README.md +12 -14
README.md CHANGED
@@ -32,7 +32,7 @@ Keyphrase extraction is a technique in text analysis where you extract the impor
 
 
  ## 📓 Model Description
- This model is a fine-tuned distilbert model on the kptimes dataset. More information can be found here: https://huggingface.co/distilbert-base-uncased.
 
  The model is fine-tuned as a token classification problem where the text is labeled using the BIO scheme.
 
@@ -80,18 +80,19 @@ class KeyphraseExtractionPipeline(TokenClassificationPipeline):
 
  ```python
  # Load pipeline
- model_name = "DeDeckerThomas/keyphrase-extraction-distilbert-kptimes"
  extractor = KeyphraseExtractionPipeline(model=model_name)
  ```
  ```python
  # Inference
  text = """
  Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
- Since this is a time-consuming process, Artificial Intelligence is used to automate it.
- Currently, classical machine learning methods, that use statistics and linguistics, are widely used for the extraction process.
- The fact that these methods have been widely used in the community has the advantage that there are many easy-to-use libraries.
- Now with the recent innovations in deep learning methods (such as recurrent neural networks and transformers, GANS, …),
- keyphrase extraction can be improved. These new methods also focus on the semantics and context of a document, which is quite an improvement.
  """.replace(
  "\n", ""
  )
@@ -103,10 +104,7 @@ print(keyphrases)
 
  ```
  # Output
- ['Artificial Intelligence' 'GANS' 'Keyphrase extraction'
- 'classical machine learning' 'deep learning methods'
- 'keyphrase extraction' 'linguistics' 'recurrent neural networks'
- 'semantics' 'statistics' 'text analysis' 'transformers']
  ```
 
  ## 📚 Training Dataset
@@ -164,7 +162,7 @@ def preprocess_fuction(all_samples_per_split):
  ```
 
  ### Postprocessing
- For the post-processing, you will need to filter out the B and I labeled tokens and concat the consecutive B and Is. As last you strip the keyphrase to ensure all spaces are removed.
  ```python
  # Define post_process functions
  def concat_tokens_by_tag(keyphrases):
@@ -198,7 +196,7 @@ def extract_keyphrases(example, predictions, tokenizer, index=0):
  ```
  ## 📝 Evaluation results
 
- One of the traditional evaluation methods is the precision, recall and F1-score @k,m where k is the number that stands for the first k predicted keyphrases and m for the average amount of predicted keyphrases.
  The model achieves the following results on the KPTimes test set:
 
  | Dataset | P@5 | R@5 | F1@5 | P@10 | R@10 | F1@10 | P@M | R@M | F1@M |
@@ -208,4 +206,4 @@ The model achieves the following results on the KPTimes test set:
  For more information on the evaluation process, you can take a look at the keyphrase extraction evaluation notebook.
 
  ## 🚨 Issues
- Please feel free to contact Thomas De Decker for any problems with this model.
 
 
 
  ## 📓 Model Description
+ This model is a DistilBERT model fine-tuned on the KPTimes dataset. More information about the base model can be found here: https://huggingface.co/distilbert-base-uncased.
 
  The model is fine-tuned as a token classification problem where the text is labeled using the BIO scheme.
 
 
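The Model Description frames keyphrase extraction as token classification with the BIO scheme. The first token of a keyphrase is labeled B, the remaining tokens of that keyphrase are labeled I, and all other tokens are labeled O. Post-processing reverses this by joining each B token with the I tokens that follow it. The sketch below illustrates both directions under simplifying assumptions: it uses whitespace tokens rather than the DistilBERT subword tokens the model actually uses, matches gold keyphrases exactly and case-insensitively, and follows the standard BIO decoding convention. `bio_labels` and `bio_decode` are hypothetical helpers, not the README's `concat_tokens_by_tag`, which may treat edge cases differently.

```python
from typing import List


def bio_labels(tokens: List[str], keyphrases: List[str]) -> List[str]:
    """Hypothetical helper: label tokens B/I/O from gold keyphrases (case-insensitive, whitespace tokens)."""
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for phrase in keyphrases:
        words = phrase.lower().split()
        n = len(words)
        if n == 0:
            continue
        # Mark every occurrence of the keyphrase: B on its first token, I on the rest.
        for start in range(len(tokens) - n + 1):
            if lowered[start:start + n] == words:
                labels[start] = "B"
                for i in range(start + 1, start + n):
                    labels[i] = "I"
    return labels


def bio_decode(tokens: List[str], labels: List[str]) -> List[str]:
    """Hypothetical helper: a B starts a new keyphrase and each following I extends it."""
    phrases: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "B":
            if current:
                phrases.append(" ".join(current).strip())
            current = [token]
        elif label == "I" and current:
            current.append(token)
        else:
            # An O closes any open keyphrase; an I without a preceding B is ignored.
            if current:
                phrases.append(" ".join(current).strip())
            current = []
    if current:
        phrases.append(" ".join(current).strip())
    return phrases


tokens = "Keyphrase extraction is a technique in text analysis".split()
labels = bio_labels(tokens, ["keyphrase extraction", "text analysis"])
print(labels)  # ['B', 'I', 'O', 'O', 'O', 'O', 'B', 'I']
print(bio_decode(tokens, labels))  # ['Keyphrase extraction', 'text analysis']
```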
 
  ```python
  # Load pipeline
+ model_name = "ml6team/keyphrase-extraction-distilbert-kptimes"
  extractor = KeyphraseExtractionPipeline(model=model_name)
  ```
  ```python
  # Inference
  text = """
  Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
+ Since this is a time-consuming process, Artificial Intelligence is used to automate it.
+ Currently, classical machine learning methods, that use statistics and linguistics,
+ are widely used for the extraction process. The fact that these methods have been widely used in the community
+ has the advantage that there are many easy-to-use libraries. Now with the recent innovations in NLP,
+ transformers can be used to improve keyphrase extraction. Transformers also focus on the semantics
+ and context of a document, which is quite an improvement.
  """.replace(
  "\n", ""
  )
 
 
  ```
  # Output
+ ['artificial intelligence']
  ```
 
  ## 📚 Training Dataset
 
  ```
 
  ### Postprocessing
+ For post-processing, filter out the B- and I-labeled tokens and concatenate consecutive B and I tokens into keyphrases. Finally, strip each keyphrase to remove leading and trailing whitespace.
  ```python
  # Define post_process functions
  def concat_tokens_by_tag(keyphrases):
 
  ```
  ## 📝 Evaluation results
 
+ One of the traditional evaluation methods is precision, recall, and F1-score @K,M, where K stands for the first K predicted keyphrases and M for the average number of predicted keyphrases.
  The model achieves the following results on the KPTimes test set:
 
  | Dataset | P@5 | R@5 | F1@5 | P@10 | R@10 | F1@10 | P@M | R@M | F1@M |
 
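The Evaluation results paragraph scores predictions with precision, recall, and F1 over the first K predicted keyphrases. Below is a minimal single-document sketch with several assumptions. It uses exact matching after lowercasing and whitespace normalization; many keyphrase evaluations also apply stemming, which is omitted here. It divides precision by the number of predictions actually considered rather than by K itself. Passing `k=None` scores all predictions. `prf_at_k` is a hypothetical helper, and it is not confirmed to match the keyphrase extraction evaluation notebook.

```python
from typing import List, Optional, Tuple


def prf_at_k(
    predicted: List[str], gold: List[str], k: Optional[int] = None
) -> Tuple[float, float, float]:
    """Hypothetical helper: precision, recall and F1 over the first k predictions (all if k is None)."""

    def norm(phrase: str) -> str:
        # Lowercase and collapse whitespace; no stemming is applied.
        return " ".join(phrase.lower().split())

    top = predicted if k is None else predicted[:k]
    # Drop duplicate predictions (keeping order) so a repeated phrase is counted once.
    unique = list(dict.fromkeys(norm(p) for p in top))
    gold_set = {norm(g) for g in gold}

    correct = sum(1 for p in unique if p in gold_set)
    precision = correct / len(unique) if unique else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1


# Toy example with made-up keyphrases (not model output).
p, r, f = prf_at_k(
    ["artificial intelligence", "transformers", "nlp"],
    ["Artificial Intelligence", "keyphrase extraction"],
    k=2,
)
print(p, r, f)  # 0.5 0.5 0.5
```

Corpus-level scores are commonly obtained by averaging these per-document values over the test set.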
  For more information on the evaluation process, you can take a look at the keyphrase extraction evaluation notebook.
 
  ## 🚨 Issues
+ Please feel free to start discussions in the Community Tab.
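The updated inference text now breaks lines mid-sentence, for example after "linguistics," and "community". The snippet still joins the lines with `.replace("\n", "")`. If those lines carry no trailing spaces, the join glues neighboring words together ("linguistics,are", "communityhas"). One alternative, assuming the goal is a single-line string with single spaces between words, is to split on any whitespace and re-join. The short excerpt below is taken from the updated text.

```python
text = """
Currently, classical machine learning methods, that use statistics and linguistics,
are widely used for the extraction process.
"""

# Replacing "\n" with "" glues the last word of one line to the first word of the next.
glued = text.replace("\n", "")

# Splitting on any whitespace and re-joining with single spaces keeps word boundaries
# and also drops the leading and trailing newlines of the triple-quoted string.
clean = " ".join(text.split())

print(glued)
print(clean)
```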