Update README.md
README.md
CHANGED
@@ -20,6 +20,28 @@ model-index:
|
|
20 |
- name: NER F Score
|
21 |
type: f_score
|
22 |
value: 0.7862429256
23 |
---
|
24 |
| Feature | Description |
|
25 |
| --- | --- |
|
@@ -30,7 +52,7 @@ model-index:
|
|
30 |
| **Components** | `transformer`, `ner` |
|
31 |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
|
32 |
| **Sources** | n/a |
|
33 |
-
| **License** |
|
34 |
| **Author** | [n/a]() |
|
35 |
|
36 |
### Label Scheme
|
@@ -52,5 +74,17 @@ model-index:
|
|
52 |
| `ENTS_F` | 78.62 |
|
53 |
| `ENTS_P` | 77.58 |
|
54 |
| `ENTS_R` | 79.70 |
|
55 |
-
|
56 |
-
20 |
- name: NER F Score
|
21 |
type: f_score
|
22 |
value: 0.7862429256
|
23 |
+
|
24 |
+
widget:
|
25 |
+
- text: "Le 2 décembre, c'est un vendredi, on avait un concert. On se retrouve avec des amis chez moi."
|
26 |
+
example_title: "présent historique"
|
27 |
+
- text: "On danse toute la nuit et là vous vous dites que c'est la meilleure manière de vivre."
|
28 |
+
example_title: "présent générique"
|
29 |
+
- text: "Je me souviens d'avoir vu un enfant danser sur le toit du monde !"
|
30 |
+
example_title: "présent d'énonciation"
|
31 |
+
|
32 |
+
license: agpl-3.0
|
33 |
+
---
|
34 |
+
|
35 |
+
## Description
|
36 |
+
|
37 |
+
This model was built to detect the different values of the *present tense* in French. Its main purpose was to automate annotation on a specific dataset.
|
38 |
+
There is no warranty that it will work on any other dataset.
|
39 |
+
We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER.
|
40 |
+
The present tense can carry different meanings depending on the context. It can have a historical value, referring to the past while making the speech more vivid.
|
41 |
+
Another meaning is generic, to express general truths like definitions or properties. Finally, it can have an enunciation value by referring to the present moment, to describe an ongoing action.
|
42 |
+
These different values of the present tense can only be differentiated by the context.
|
43 |
+
This is why models based on contextual embeddings (BERT-like) should be well suited to differentiating them.
|
44 |
+
|
45 |
---
|
46 |
| Feature | Description |
|
47 |
| --- | --- |
|
|
|
52 |
| **Components** | `transformer`, `ner` |
|
53 |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
|
54 |
| **Sources** | n/a |
|
55 |
+
| **License** | agpl-3.0 |
|
56 |
| **Author** | [n/a]() |
|
57 |
|
58 |
### Label Scheme
|
|
|
74 |
| `ENTS_F` | 78.62 |
|
75 |
| `ENTS_P` | 77.58 |
|
76 |
| `ENTS_R` | 79.70 |
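As a sanity check on the scores above, the F score can be recomputed from precision and recall as their harmonic mean. This is a minimal sketch, not part of the original training code; the helper name `f_score` is hypothetical. Because the published precision and recall are rounded to two decimals, the recomputed value only agrees with the reported `ENTS_F` to within rounding.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values taken from the accuracy table above (in percent).
ents_p = 77.58
ents_r = 79.70

recomputed = f_score(ents_p, ents_r)
# Difference from the reported ENTS_F of 78.62 (rounding effects only).
print(abs(recomputed - 78.62))
```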
|
77 |
+
|
78 |
+
|
79 |
+
### Training
|
80 |
+
|
81 |
+
We constructed our dataset by manually labeling the documents using Doccano, an open-source tool for collaborative human annotation.
|
82 |
+
The models were trained on 200-word sequences: 70% of the data was used for training, 20% for testing and tuning hyperparameters, and 10% for evaluating the model's performance.
|
83 |
+
To ensure a sound performance evaluation, the evaluation sequences were taken from documents that were not used during training.
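The split described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code from the linked repository: the function names `chunk_words` and `split_by_document` are hypothetical, documents are assumed to be plain strings keyed by an id, and the 70/20/10 ratios are applied to whole documents so that no document contributes sequences to more than one split.

```python
import random


def chunk_words(text: str, size: int = 200) -> list[str]:
    """Cut a text into consecutive sequences of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def split_by_document(docs: dict[str, str], ratios=(0.7, 0.2, 0.1), seed: int = 0):
    """Assign whole documents to train/test/valid, then chunk each document.

    Splitting at the document level keeps every sequence of a given
    document in a single split, so evaluation never sees training documents.
    """
    doc_ids = sorted(docs)
    random.Random(seed).shuffle(doc_ids)
    n_train = round(len(doc_ids) * ratios[0])
    n_test = round(len(doc_ids) * ratios[1])
    groups = {
        "train": doc_ids[:n_train],
        "test": doc_ids[n_train:n_train + n_test],
        "valid": doc_ids[n_train + n_test:],  # remaining documents
    }
    return {
        name: [(doc_id, seq) for doc_id in ids for seq in chunk_words(docs[doc_id])]
        for name, ids in groups.items()
    }


# Toy corpus (assumption): 10 documents of 450 words each.
toy_docs = {f"doc{i}": " ".join(["mot"] * 450) for i in range(10)}
toy_splits = split_by_document(toy_docs)
for name, sequences in toy_splits.items():
    print(name, len(sequences))
```

Splitting by document rather than by sequence means the realized proportions of sequences only approximate 70/20/10 when documents differ in length.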
|
84 |
+
|
85 |
+
| label | train | test | valid |
|
86 |
+
| --- | --- | --- | --- |
|
87 |
+
| `PRESENT_ENNONCIATION`| 2069 | 673 | 438 |
|
88 |
+
| `PRESENT_GENERIQUE`| 704 | 177 | 147 |
|
89 |
+
| `PRESENT_HISTORIQUE` | 1005 | 289 | 285 |
|
90 |
+
|