binbin83 committed on
Commit
155093b
·
verified ·
1 Parent(s): 8eabe33

Update README.md

  1. README.md +49 -2
README.md CHANGED
@@ -5,10 +5,16 @@ tags:
5
  - sentence-transformers
6
  - text-classification
7
  pipeline_tag: text-classification
 
 
 
 
8
  ---
9
 
10
  # binbin83/setfit-MiniLM-dialog-themes-13-nov
11
 
 
 
12
  This is a [SetFit model](https://github.com/huggingface/setfit) that can be used for text classification. The model has been trained using an efficient few-shot learning technique that involves:
13
 
14
  1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
@@ -29,12 +35,53 @@ from setfit import SetFitModel
29
 
30
  # Download from Hub and run inference
31
  model = SetFitModel.from_pretrained("binbin83/setfit-MiniLM-dialog-themes-13-nov")
 
32
  # Run inference
33
- preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
 
 
34
  ```
35
 
36
  ## BibTeX entry and citation info
37
 
38
  ```bibtex
39
  @article{https://doi.org/10.48550/arxiv.2209.11055,
40
  doi = {10.48550/ARXIV.2209.11055},
@@ -46,4 +93,4 @@ publisher = {arXiv},
46
  year = {2022},
47
  copyright = {Creative Commons Attribution 4.0 International}
48
  }
49
- ```
 
5
  - sentence-transformers
6
  - text-classification
7
  pipeline_tag: text-classification
8
+ language:
9
+ - fr
10
+ metrics:
11
+ - f1
12
  ---
13
 
14
  # binbin83/setfit-MiniLM-dialog-themes-13-nov
15
 
16
+ This model is a multi-class, multi-label text classifier that distinguishes the different dialog acts in semi-structured interviews. The data used for fine-tuning were in French.
17
+
18
  This is a [SetFit model](https://github.com/huggingface/setfit) that can be used for text classification. The model has been trained using an efficient few-shot learning technique that involves:
19
 
20
  1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
 
35
 
36
  # Download from Hub and run inference
37
  model = SetFitModel.from_pretrained("binbin83/setfit-MiniLM-dialog-themes-13-nov")
38
+ label_dict = {'CauseConsequences': 0, 'PersonalExperience': 1, 'Connaissance': 2, 'Other': 3, 'Reconstitution': 4, 'Temps': 5, 'Reaction': 6, 'Nouvelle': 7, 'Media': 8, 'Lieux': 9}
39
  # Run inference
40
+ preds = model(["Vous pouvez continuer", "Pouvez-vous me dire précisément quel a été l'ordre chronologique des événements ?"])
41
+ labels = [[name for name, flag in zip(label_dict, pred) if flag] for pred in preds]
42
+
43
  ```
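The decoding step in the snippet above can be checked without downloading the model. The following is a minimal sketch, assuming the model returns one row of ten 0/1 flags per input, ordered by the indices in `label_dict`. The helper `decode_predictions` and the `fake_preds` rows are hypothetical and are not real model output.

```python
from typing import Dict, List, Sequence

label_dict: Dict[str, int] = {
    'CauseConsequences': 0, 'PersonalExperience': 1, 'Connaissance': 2,
    'Other': 3, 'Reconstitution': 4, 'Temps': 5, 'Reaction': 6,
    'Nouvelle': 7, 'Media': 8, 'Lieux': 9,
}


def decode_predictions(preds: Sequence[Sequence[int]],
                       label_dict: Dict[str, int]) -> List[List[str]]:
    """Map each row of 0/1 flags to the list of active label names."""
    # Sort names by index so that position i always matches label index i,
    # regardless of the order in which the dict was written.
    names = [name for name, _ in sorted(label_dict.items(), key=lambda kv: kv[1])]
    return [[name for name, flag in zip(names, row) if int(flag) == 1]
            for row in preds]


# Hypothetical predictions, made up for illustration only.
fake_preds = [
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
]
print(decode_predictions(fake_preds, label_dict))
```

If the real predictions come back as a tensor or array, converting them with `.tolist()` before decoding should be enough.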
44
 
45
+ ## Labels and training data
46
+ Based on the interview guide, the themes raised in the interviews were:
47
+
48
+ ['CauseConsequences', 'PersonalExperience', 'Connaissance', 'Other', 'Reconstitution', 'Temps', 'Reaction', 'Nouvelle', 'Media', 'Lieux']
49
+
50
+ We labeled a small amount of data (label, number of examples):
51
+ ('Other', 50), ('Reaction', 46), ('PersonalExperience', 41), ('CauseConsequences', 41), ('Media', 27), ('Lieux', 13), ('Nouvelle', 10), ('Temps', 9), ('Reconstitution', 7), ('Connaissance', 3)
52
+
53
+ and fine-tuned a SetFit model on it.
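The counts above are quite imbalanced. A short sketch that computes each label's share makes this visible. Since the task is multi-label, the total here presumably counts label assignments rather than distinct utterances; that reading is an assumption, as the README does not say.

```python
# Label counts copied from the list above.
label_counts = [
    ('Other', 50), ('Reaction', 46), ('PersonalExperience', 41),
    ('CauseConsequences', 41), ('Media', 27), ('Lieux', 13),
    ('Nouvelle', 10), ('Temps', 9), ('Reconstitution', 7), ('Connaissance', 3),
]

# Total number of label assignments across all annotated examples.
total = sum(count for _, count in label_counts)

# Fraction of all assignments that belongs to each label.
shares = {label: count / total for label, count in label_counts}

for label, count in label_counts:
    print(f"{label:<20}{count:>4}{shares[label]:>8.1%}")
print("total label assignments:", total)
```

The four most frequent labels account for most of the annotations, while 'Connaissance' has only three examples, which may be worth keeping in mind when reading the per-label results.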
54
+
55
+
56
+ ## Training and Performance
57
+
58
+ We fine-tuned "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
59
+ using SetFit with `CosineSimilarityLoss` and these parameters: epochs = 10, batch_size = 32, num_iterations = 20.
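For readers who want to reproduce a similar setup, here is a rough sketch of how these settings might be passed to SetFit. It assumes the pre-1.0 `SetFitTrainer` interface. The `multi_target_strategy="one-vs-rest"` choice is an assumption, since the README does not say which multi-label head was used, and `build_trainer` and its `train_dataset` argument (a `datasets.Dataset` with `text` and `label` columns) are hypothetical.

```python
BASE_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"

# Hyperparameters as stated in the README.
HYPERPARAMS = {"num_epochs": 10, "batch_size": 32, "num_iterations": 20}


def build_trainer(train_dataset, eval_dataset=None):
    """Return a SetFitTrainer configured with the settings above.

    Imports are kept local so the configuration can be inspected
    without setfit or sentence-transformers installed.
    """
    from sentence_transformers.losses import CosineSimilarityLoss
    from setfit import SetFitModel, SetFitTrainer

    # Assumption: a one-vs-rest head for the multi-label setting.
    model = SetFitModel.from_pretrained(
        BASE_MODEL, multi_target_strategy="one-vs-rest"
    )
    return SetFitTrainer(
        model=model,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        loss_class=CosineSimilarityLoss,
        **HYPERPARAMS,
    )
```

Calling `build_trainer(train_dataset).train()` would then start fine-tuning. Newer SetFit releases move these settings into a separate arguments object, so the exact call may differ by version.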
60
+
61
+
62
+
63
+ On our test dataset, we get these results:
64
+ {'f1': 0.639, 'f1_micro': 0.6808510638297872, 'f1_sample': 0.6666666666666666, 'accuracy': 0.6086956521739131}
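The README does not say how these scores were computed. The key names suggest the usual multi-label definitions (micro-averaged F1, sample-averaged F1, and exact-match accuracy), but that is an assumption, and the averaging behind the plain `f1` value is not specified. The sketch below implements the three assumed definitions in plain Python on made-up toy data, not on the actual test set.

```python
from typing import Sequence

Rows = Sequence[Sequence[int]]


def micro_f1(y_true: Rows, y_pred: Rows) -> float:
    """F1 computed from true/false positives pooled over all labels and rows."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def samples_f1(y_true: Rows, y_pred: Rows) -> float:
    """F1 computed per row, then averaged over rows."""
    scores = []
    for t_row, p_row in zip(y_true, y_pred):
        tp = sum(1 for t, p in zip(t_row, p_row) if t == 1 and p == 1)
        denom = sum(t_row) + sum(p_row)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def subset_accuracy(y_true: Rows, y_pred: Rows) -> float:
    """Fraction of rows whose full label set is predicted exactly."""
    matches = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Toy data for illustration only.
y_true = [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
print(micro_f1(y_true, y_pred), samples_f1(y_true, y_pred), subset_accuracy(y_true, y_pred))
```

Exact-match accuracy is the strictest of the three, since a row only counts when every label is right, which is consistent with it being the lowest figure reported above.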
65
+
66
  ## BibTeX entry and citation info
67
 
68
+
69
+ To cite the current study:
70
+ ```bibtex
71
+ @article{
72
+ note = {conference paper},
73
+ url = {https://arxiv.org/abs/2209.11055},
74
+ author = {Quillivic, Robin and Payet, Charles},
75
+ keywords = {NLP, JADT},
76
+ title = {Semi-Structured Interview Analysis: A French NLP Toolbox for Social Sciences},
77
+ publisher = {JADT},
78
+ year = {2024},
79
+ copyright = {Creative Commons Attribution 4.0 International}
80
+ }
81
+ ```
82
+
83
+
84
+ To cite the SetFit paper:
85
  ```bibtex
86
  @article{https://doi.org/10.48550/arxiv.2209.11055,
87
  doi = {10.48550/ARXIV.2209.11055},
 
93
  year = {2022},
94
  copyright = {Creative Commons Attribution 4.0 International}
95
  }
96
+ ```