PeppoCola committed on
Commit
6398a5a
1 Parent(s): 3fd672e

Update README.md

Files changed (1)
  1. README.md +48 -65
README.md CHANGED
@@ -2,87 +2,70 @@
  pipeline_tag: sentence-similarity
  tags:
  - sentence-transformers
- - feature-extraction
- - sentence-similarity
-
  ---

- # {MODEL_NAME}

- This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.

  <!--- Describe your model here -->

- ## Usage (Sentence-Transformers)
-
- Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
-
- ```
- pip install -U sentence-transformers
- ```

- Then you can use the model like this:

  ```python
- from sentence_transformers import SentenceTransformer
- sentences = ["This is an example sentence", "Each sentence is converted"]
-
- model = SentenceTransformer('{MODEL_NAME}')
- embeddings = model.encode(sentences)
- print(embeddings)
- ```
-
-
-
- ## Evaluation Results
-
- <!--- Describe how your model was evaluated -->
-
- For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
-
-
- ## Training
- The model was trained with the parameters:
-
- **DataLoader**:

- `torch.utils.data.dataloader.DataLoader` of length 460 with parameters:
- ```
- {'batch_size': 16, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
  ```

- **Loss**:

- `sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss`

- Parameters of the fit()-Method:
  ```
- {
-     "epochs": 1,
-     "evaluation_steps": 0,
-     "evaluator": "NoneType",
-     "max_grad_norm": 1,
-     "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
-     "optimizer_params": {
-         "lr": 2e-05
-     },
-     "scheduler": "WarmupLinear",
-     "steps_per_epoch": 460,
-     "warmup_steps": 46,
-     "weight_decay": 0.01
  }
  ```

-
- ## Full Model Architecture
- ```
- SentenceTransformer(
-   (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
-   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
-   (2): Normalize()
- )
  ```
-
- ## Citing & Authors
-
- <!--- Describe where people can find more information -->
  pipeline_tag: sentence-similarity
  tags:
  - sentence-transformers
+ - Text Classification
+ license: gpl-3.0
+ language:
+ - en
  ---

+ # FewShotIssueClassifier-NLBSE23

+ This is a SetFit model that uses Sentence Transformers to map sentences & paragraphs to a 768-dimensional dense vector space. It can be used for tasks like clustering or semantic search.

  <!--- Describe your model here -->
+ This specific model is fine-tuned for issue report classification into 4 classes: bug, documentation, feature, and question.

+ ## Usage

+ You can use the model like this:

  ```python
+ from sentence_transformers.losses import CosineSimilarityLoss
+ from setfit import SetFitModel
+ from setfit import SetFitTrainer
+ sentences = ["error in line 20", "add method list_features"]
+
+ label_mapping = {
+     0: "bug",
+     1: "documentation",
+     2: "feature",
+     3: "question",
+ }

+ model = SetFitModel.from_pretrained('PeppoCola/FewShotIssueClassifier-NLBSE23')
+ predictions = model.predict(sentences)
+ print([label_mapping[int(i)] for i in predictions])  # int() so tensor or NumPy ids work as dict keys
  ```
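The label lookup in the usage snippet above can be sketched without downloading the model. `to_label_names` is a hypothetical helper name, not part of SetFit. The sketch assumes predictions arrive as integer-like class ids (plain ints, NumPy integers, or 0-d tensors) and converts each with `int()` before the dictionary lookup.

```python
# Class ids and names as listed in the README's label_mapping.
LABEL_MAPPING = {0: "bug", 1: "documentation", 2: "feature", 3: "question"}


def to_label_names(predictions):
    """Map integer-like class ids to issue-type names.

    Hypothetical helper (not part of SetFit). int() is applied so that
    plain ints, NumPy integers, and 0-d tensors can all be used as keys.
    """
    return [LABEL_MAPPING[int(p)] for p in predictions]


if __name__ == "__main__":
    # Stand-in for the output of model.predict(sentences).
    print(to_label_names([0, 2]))  # ['bug', 'feature']
```

An id outside 0 to 3 raises `KeyError`, which seems preferable here to silently returning a wrong label.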

+ ## Dataset
+ This model is trained on a subset of the [NLBSE23](https://nlbse2023.github.io/tools/) dataset. The sample was hand-labeled and is available on [Zenodo](https://zenodo.org/record/7628150#.ZBnM3XbMJD8).

+ ## Citing & Authors

  ```
+ @software{Colavito_Few-Shot_Learning_for_2023,
+   author = {Colavito, Giuseppe and Lanubile, Filippo and Novielli, Nicole},
+   month = {2},
+   title = {{Few-Shot Learning for Issue Report Classification}},
+   url = {https://github.com/collab-uniba/Issue-Report-Classification-NLBSE2023},
+   version = {1.0.0},
+   year = {2023}
  }
  ```

  ```
+ @dataset{colavito_giuseppe_2023_7628150,
+   author = {Colavito Giuseppe and
+             Lanubile Filippo and
+             Novielli Nicole},
+   title = {Few-Shot Learning for Issue Report Classification},
+   month = feb,
+   year = 2023,
+   note = {{To use this, merge the CSV with the original
+            dataset (after removing duplicates on the 'id'
+            column)}},
+   publisher = {Zenodo},
+   doi = {10.5281/zenodo.7628150},
+   url = {https://doi.org/10.5281/zenodo.7628150}
+ }
+ ```
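
The note in the Zenodo citation says to merge the CSV with the original dataset after removing duplicates on the 'id' column. A minimal standard-library sketch of that step follows. Only the `id` column name comes from the note. The other column names, the sample rows, and the helper names are hypothetical, and the real files may be laid out differently. Treating "merge" as an inner join on `id` is also an assumption.

```python
def drop_duplicate_ids(rows):
    """Keep the first row seen for each 'id' value."""
    seen = set()
    unique = []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(row)
    return unique


def merge_on_id(original_rows, labeled_rows):
    """Inner-join two lists of dict rows on 'id' after de-duplicating both."""
    labeled_by_id = {row["id"]: row for row in drop_duplicate_ids(labeled_rows)}
    merged = []
    for row in drop_duplicate_ids(original_rows):
        match = labeled_by_id.get(row["id"])
        if match is not None:
            # Columns from the labeled CSV override same-named originals.
            merged.append({**row, **match})
    return merged


if __name__ == "__main__":
    # Hypothetical rows; real data could be read with csv.DictReader.
    original = [
        {"id": "1", "title": "error in line 20"},
        {"id": "1", "title": "error in line 20"},
        {"id": "2", "title": "add method list_features"},
    ]
    labeled = [{"id": "2", "label": "feature"}]
    print(merge_on_id(original, labeled))
```

If the intended merge should keep every original row rather than only the matched ones, the `if match is not None` filter would need to become a fallback to the unmodified row.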