binbin83/fr_present_tense_value

Token Classification

binbin83 committed on Oct 5, 2023

Commit 69a68fc · 1 Parent(s): 8c32445

Update README.md

Files changed (1)

README.md +37 -3

@@ -20,6 +20,28 @@ model-index:
     - name: NER F Score
       type: f_score
       value: 0.7862429256
 ---
 | Feature | Description |
 | --- | --- |
@@ -30,7 +52,7 @@ model-index:
 | **Components** | `transformer`, `ner` |
 | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
 | **Sources** | n/a |
-| **License** | n/a |
 | **Author** | [n/a]() |
 ### Label Scheme
@@ -52,5 +74,17 @@ model-index:
 | `ENTS_F` | 78.62 |
 | `ENTS_P` | 77.58 |
 | `ENTS_R` | 79.70 |
-| `TRANSFORMER_LOSS` | 82001.90 |
-| `NER_LOSS` | 52384.87 |

     - name: NER F Score
       type: f_score
       value: 0.7862429256
+widget:
+- text: "Le 2 décembre, c'est un vendredi, on avait un concert. On se retrouve avec des amis chez moi."
+  example_title: "présent historique"
+- text: "On danse toute la nuit et la vous vous dites qu c'est la meilleure manière de vivre."
+  example_title: "présent générique"
+- text: "Je me souviens d'avoir vu un enfant danser sur le toît du monde !"
+  example_title: "présent d'énonciation"
+license: agpl-3.0
+---
+## Description
+This model was built to detect the different values of the *present tense* in French. Its main purpose was to automate annotation on a specific dataset.
+There is no warranty that it will work on any other dataset.
+We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER.
+The present tense can have different meanings depending on the context. It can have a historical value, referring to the past, which also makes the speech more lively.
+Another meaning is generic, to express general truths like definitions or properties. Finally, it can have an enunciation value by referring to the present moment, to describe an ongoing action.
+These different values of the present tense can only be differentiated by the context.
+This is why models based on contextual embeddings (BERT-like) should be well suited to differentiating them.
 ---
 | Feature | Description |
 | --- | --- |
 | **Components** | `transformer`, `ner` |
 | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
 | **Sources** | n/a |
+| **License** | agpl-3.0 |
 | **Author** | [n/a]() |
 ### Label Scheme
 | `ENTS_F` | 78.62 |
 | `ENTS_P` | 77.58 |
 | `ENTS_R` | 79.70 |
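
As a quick sanity check on the table above, the F-score can be recomputed from the reported precision and recall. This is a minimal sketch that assumes `ENTS_F` is the usual harmonic mean of `ENTS_P` and `ENTS_R`; the helper name `f_score` is hypothetical and not part of the model card.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the table above, in percent (already rounded to 2 decimals).
ents_p, ents_r = 77.58, 79.70
ents_f = f_score(ents_p, ents_r)

# Because the inputs are rounded, the result can differ from the
# reported 78.62 in the last displayed digit.
print(f"recomputed ENTS_F: {ents_f:.2f}")
```

The recomputed value agrees with the reported `ENTS_F` to within rounding of the inputs.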
+### Training
+We constructed our dataset by manually labeling the documents using Doccano, an open-source tool for collaborative human annotation.
+The models were trained on 200-word sequences. 70% of the data was used for training, 20% to test and fine-tune hyperparameters, and 10% to evaluate the performance of the model.
+In order to ensure correct performance evaluation, the evaluation sequences were taken from documents that were not used during the training.
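
The document-level separation described above can be sketched as follows. This is an illustration only, not the code used for this model: the function name `split_by_document`, the `(doc_id, text)` input shape, and the seed are all assumptions, and the ratios are applied to documents here rather than to sequences.

```python
import random
from collections import defaultdict


def split_by_document(sequences, ratios=(0.7, 0.2, 0.1), seed=0):
    """Assign whole documents to train/test/valid so no document spans two splits.

    `sequences` is a list of (doc_id, text) pairs (an assumed input shape).
    """
    by_doc = defaultdict(list)
    for doc_id, text in sequences:
        by_doc[doc_id].append(text)

    # Shuffle document ids deterministically, then cut the list into three parts.
    doc_ids = sorted(by_doc)
    random.Random(seed).shuffle(doc_ids)

    n_train = round(len(doc_ids) * ratios[0])
    n_test = round(len(doc_ids) * ratios[1])
    groups = {
        "train": doc_ids[:n_train],
        "test": doc_ids[n_train:n_train + n_test],
        "valid": doc_ids[n_train + n_test:],
    }
    # Expand each group of documents back into its sequences.
    return {
        name: [(d, t) for d in ids for t in by_doc[d]]
        for name, ids in groups.items()
    }


# Toy data: 10 documents with 3 sequences each.
toy = [(f"doc{i}", f"sequence {j} of doc{i}") for i in range(10) for j in range(3)]
splits = split_by_document(toy)
for name, items in splits.items():
    print(name, len(items))
```

Splitting on document ids rather than on individual sequences is what keeps text from the same document out of both training and evaluation.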
+| label | train | test | valid |
+| --- | --- | --- | --- |
+| `PRESENT_ENNONCIATION` | 2069 | 673 | 438 |
+| `PRESENT_GENERIQUE` | 704 | 177 | 147 |
+| `PRESENT_HISTORIQUE` | 1005 | 289 | 285 |
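
To read the table at a glance, the per-split totals and each split's share of all labels can be tallied as below. The numbers are copied from the table; the variable names are illustrative. Note that the table presumably counts annotated labels rather than sequences, so these shares need not match the 70/20/10 split of the data exactly.

```python
# Label counts copied from the table above.
counts = {
    "PRESENT_ENNONCIATION": {"train": 2069, "test": 673, "valid": 438},
    "PRESENT_GENERIQUE": {"train": 704, "test": 177, "valid": 147},
    "PRESENT_HISTORIQUE": {"train": 1005, "test": 289, "valid": 285},
}

# Sum each column, then express it as a share of all labels.
split_totals = {
    split: sum(row[split] for row in counts.values())
    for split in ("train", "test", "valid")
}
grand_total = sum(split_totals.values())

for split, total in split_totals.items():
    print(f"{split}: {total} labels ({total / grand_total:.1%} of all labels)")
```

The totals come to 3778 training, 1139 test, and 870 validation labels.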