---
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
widget: []
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
library_name: setfit
inference: true
license: mit
datasets:
- NLBSE/nlbse25-code-comment-classification
language:
- en
base_model:
- sentence-transformers/all-MiniLM-L6-v2
---

# Python comment classifier

This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Python code comment classification.

The model has been trained using few-shot learning that involves:

1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned model.
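
As a rough illustration of step 1, contrastive fine-tuning is usually driven by sentence pairs built from the few labeled examples: pairs with the same label act as positives and pairs with different labels act as negatives. The sketch below shows only that pair-building idea in plain Python. The function name `make_pairs`, the sample comments, and the label names are hypothetical, and this is not the sampling code used by the SetFit library.

```python
from itertools import combinations


def make_pairs(texts, labels):
    """Build (text_a, text_b, similarity) triples from labeled examples."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(texts), 2):
        # 1.0 marks a positive pair (same label), 0.0 a negative pair.
        similarity = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((a, b, similarity))
    return pairs


# Hypothetical few-shot examples; the label names are illustrative only.
texts = [
    "Returns the sorted list.",
    "Sorts the input in place.",
    "TODO: handle empty input.",
]
labels = ["summary", "summary", "todo"]

pairs = make_pairs(texts, labels)
for a, b, sim in pairs:
    print(sim, "|", a, "|", b)
```

Even a handful of labeled comments yields many such pairs, which is what makes the few-shot setting workable.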

## Model Description

- **Model Type:** SetFit
- **Classification head:** [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)

## Sources

- **Repository:** [GitHub](https://github.com/fabiancpl/sbert-comment-classification/)
- **Paper:** [Evaluating the Performance and Efficiency of Sentence-BERT for Code Comment Classification](https://ieeexplore.ieee.org/document/11029440)
- **Dataset:** [HF Dataset](https://huggingface.co/datasets/NLBSE/nlbse25-code-comment-classification)

## How to use it

First, install the dependencies:

```bash
pip install setfit scikit-learn
```

Then, load the model and run inference:

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("fabiancpl/nlbse25_python")
# Run inference
preds = model("This function sorts a list of numbers.")
```
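
If the predictions come back as one 0/1 indicator per category, a small helper can turn them into readable names. This card does not state the output format or the label order, so the following is only a sketch under that assumption. The helper `decode` and the list `ASSUMED_LABELS` are hypothetical, and the example row is hand-written rather than real model output.

```python
# Assumed category names and order; verify them against the dataset card
# before relying on this mapping.
ASSUMED_LABELS = ["Usage", "Parameters", "DevelopmentNotes", "Expand", "Summary"]


def decode(indicators, label_names=ASSUMED_LABELS):
    """Map one row of 0/1 indicators to the names of the active categories."""
    return [name for name, flag in zip(label_names, indicators) if int(flag) == 1]


# Hand-written stand-in for one row of model output (not a real prediction).
example_row = [0, 0, 0, 0, 1]
print(decode(example_row))
```

Inspect `preds` first to confirm its shape before applying a mapping like this.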

## Cite as

```bibtex
@inproceedings{11029440,
  author={Peña, Fabian C. and Herbold, Steffen},
  booktitle={2025 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE)}, 
  title={Evaluating the Performance and Efficiency of Sentence-BERT for Code Comment Classification}, 
  year={2025},
  pages={21-24},
  doi={10.1109/NLBSE66842.2025.00010}}
```