DeDeckerThomas
committed on
Commit
·
1f32179
1
Parent(s):
3288081
Update README.md
README.md
CHANGED
@@ -32,7 +32,7 @@ Keyphrase extraction is a technique in text analysis where you extract the impor
|
|
32 |
|
33 |
|
34 |
## 📓 Model Description
|
35 |
-
This model is a fine-tuned distilbert model on the
|
36 |
|
37 |
The model is fine-tuned as a token classification problem where the text is labeled using the BIO scheme.
|
38 |
|
@@ -80,18 +80,19 @@ class KeyphraseExtractionPipeline(TokenClassificationPipeline):
|
|
80 |
|
81 |
```python
|
82 |
# Load pipeline
|
83 |
-
model_name = "
|
84 |
extractor = KeyphraseExtractionPipeline(model=model_name)
|
85 |
```
|
86 |
```python
|
87 |
# Inference
|
88 |
text = """
|
89 |
Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
|
90 |
-
Since this is a time-consuming process, Artificial Intelligence is used to automate it.
|
91 |
-
Currently, classical machine learning methods, that use statistics and linguistics,
|
92 |
-
The fact that these methods have been widely used in the community
|
93 |
-
|
94 |
-
|
|
|
95 |
""".replace(
|
96 |
"\n", ""
|
97 |
)
|
@@ -103,10 +104,7 @@ print(keyphrases)
|
|
103 |
|
104 |
```
|
105 |
# Output
|
106 |
-
['
|
107 |
-
'classical machine learning' 'deep learning methods'
|
108 |
-
'keyphrase extraction' 'linguistics' 'recurrent neural networks'
|
109 |
-
'semantics' 'statistics' 'text analysis' 'transformers']
|
110 |
```
|
111 |
|
112 |
## 📚 Training Dataset
|
@@ -164,7 +162,7 @@ def preprocess_fuction(all_samples_per_split):
|
|
164 |
```
|
165 |
|
166 |
### Postprocessing
|
167 |
-
For the post-processing, you will need to filter out the B and I labeled tokens and concat the consecutive
|
168 |
```python
|
169 |
# Define post_process functions
|
170 |
def concat_tokens_by_tag(keyphrases):
|
@@ -198,7 +196,7 @@ def extract_keyphrases(example, predictions, tokenizer, index=0):
|
|
198 |
```
|
199 |
## 📝 Evaluation results
|
200 |
|
201 |
-
One of the traditional evaluation methods is the precision, recall and F1-score @
|
202 |
The model achieves the following results on the KPTimes test set:
|
203 |
|
204 |
| Dataset | P@5 | R@5 | F1@5 | P@10 | R@10 | F1@10 | P@M | R@M | F1@M |
|
@@ -208,4 +206,4 @@ The model achieves the following results on the KPTimes test set:
|
|
208 |
For more information on the evaluation process, you can take a look at the keyphrase extraction evaluation notebook.
|
209 |
|
210 |
## 🚨 Issues
|
211 |
-
Please feel free to
|
|
|
32 |
|
33 |
|
34 |
## 📓 Model Description
|
35 |
+
This model is a distilbert model fine-tuned on the KPTimes dataset. More information on the base model can be found here: https://huggingface.co/distilbert-base-uncased.
|
36 |
|
37 |
The model is fine-tuned as a token classification problem where the text is labeled using the BIO scheme.
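To make the BIO scheme concrete, here is a minimal sketch of how tokens could be tagged when the gold keyphrases are known. The function name `bio_labels`, the whitespace tokenization, and the bare `B`/`I`/`O` tag names are assumptions made for illustration; the model's actual label names and tokenizer may differ.

```python
def bio_labels(tokens, keyphrases):
    """Tag each token as B (begins a keyphrase), I (inside one) or O (outside).

    `keyphrases` is a list of keyphrases, each given as a list of tokens.
    Matching is case-insensitive (an assumption of this sketch).
    """
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for phrase in keyphrases:
        phrase = [p.lower() for p in phrase]
        n = len(phrase)
        if n == 0:
            continue
        # Slide a window over the tokens and mark every exact match.
        for start in range(len(tokens) - n + 1):
            if lowered[start:start + n] == phrase:
                labels[start] = "B"
                for i in range(start + 1, start + n):
                    labels[i] = "I"
    return labels


tokens = "Keyphrase extraction is a technique in text analysis".split()
labels = bio_labels(tokens, [["keyphrase", "extraction"], ["text", "analysis"]])
print(list(zip(tokens, labels)))
```

Each keyphrase starts with a single `B` tag and continues with `I` tags, so two adjacent keyphrases can still be told apart.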
|
38 |
|
|
|
80 |
|
81 |
```python
|
82 |
# Load pipeline
|
83 |
+
model_name = "ml6team/keyphrase-extraction-distilbert-kptimes"
|
84 |
extractor = KeyphraseExtractionPipeline(model=model_name)
|
85 |
```
|
86 |
```python
|
87 |
# Inference
|
88 |
text = """
|
89 |
Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
|
90 |
+
Since this is a time-consuming process, Artificial Intelligence is used to automate it.
|
91 |
+
Currently, classical machine learning methods, that use statistics and linguistics,
|
92 |
+
are widely used for the extraction process. The fact that these methods have been widely used in the community
|
93 |
+
has the advantage that there are many easy-to-use libraries. Now with the recent innovations in NLP,
|
94 |
+
transformers can be used to improve keyphrase extraction. Transformers also focus on the semantics
|
95 |
+
and context of a document, which is quite an improvement.
|
96 |
""".replace(
|
97 |
"\n", ""
|
98 |
)
|
|
|
104 |
|
105 |
```
|
106 |
# Output
|
107 |
+
['artificial intelligence']
|
|
|
|
|
|
|
108 |
```
|
109 |
|
110 |
## 📚 Training Dataset
|
|
|
162 |
```
|
163 |
|
164 |
### Postprocessing
|
165 |
+
For post-processing, select the tokens labeled B and I and concatenate each run of consecutive B and I tokens into a keyphrase. Finally, strip each keyphrase to remove any leading or trailing whitespace.
|
166 |
```python
|
167 |
# Define post_process functions
|
168 |
def concat_tokens_by_tag(keyphrases):
|
|
|
196 |
```
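As a simplified illustration of the grouping step described in this section, the sketch below joins consecutive B and I tokens into keyphrases. It is not the README's own implementation: `group_bio` is a hypothetical helper, it works on plain word tokens rather than tokenizer ids, and it assumes bare `B`/`I`/`O` tags.

```python
def group_bio(tokens, labels):
    """Join each run of B followed by I tokens into one keyphrase string."""
    phrases, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B":
            # A new keyphrase starts; close the previous one if any.
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif label == "I" and current:
            # Continue the keyphrase that is currently open.
            current.append(token)
        else:
            # O tag, or an I with no preceding B: close any open keyphrase.
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    # Strip surrounding whitespace from every keyphrase.
    return [p.strip() for p in phrases]


tokens = ["text", "analysis", "is", "useful", "deep", "learning"]
labels = ["B", "I", "O", "O", "B", "I"]
print(group_bio(tokens, labels))
```

Dropping an `I` token that has no preceding `B` is a design choice of this sketch; another reasonable option would be to treat it as the start of a new keyphrase.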
|
197 |
## 📝 Evaluation results
|
198 |
|
199 |
+
One of the traditional evaluation methods is precision, recall and F1-score @k and @M, where k is the number of top predicted keyphrases that are considered and M is the average number of predicted keyphrases.
|
200 |
The model achieves the following results on the KPTimes test set:
|
201 |
|
202 |
| Dataset | P@5 | R@5 | F1@5 | P@10 | R@10 | F1@10 | P@M | R@M | F1@M |
|
|
|
206 |
For more information on the evaluation process, you can take a look at the keyphrase extraction evaluation notebook.
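For intuition, precision, recall and F1 at a cutoff k can be sketched as follows. This is an illustrative version under stated assumptions, not the evaluation notebook's code: `precision_recall_f1_at_k` is a hypothetical helper, matching is exact after lowercasing and stripping, and precision is divided by the number of predictions actually kept rather than by k.

```python
def precision_recall_f1_at_k(predicted, reference, k=None):
    """Score the first k predicted keyphrases against the reference set.

    With k=None every prediction is scored (an assumption of this sketch).
    """
    # Normalize and de-duplicate predictions while keeping their order.
    top = list(dict.fromkeys(p.lower().strip() for p in predicted))
    if k is not None:
        top = top[:k]
    gold = {r.lower().strip() for r in reference}

    correct = sum(1 for p in top if p in gold)
    precision = correct / len(top) if top else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


predicted = ["text analysis", "deep learning", "statistics", "transformers"]
reference = ["text analysis", "transformers", "keyphrase extraction"]
p, r, f1 = precision_recall_f1_at_k(predicted, reference, k=2)
print(f"P@2={p:.2f} R@2={r:.2f} F1@2={f1:.2f}")
```

With k=2, only "text analysis" and "deep learning" are scored, so one of two predictions is correct and one of three reference keyphrases is found.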
|
207 |
|
208 |
## 🚨 Issues
|
209 |
+
Please feel free to start discussions in the Community Tab.
|