# Design Pattern Detection Model
This model detects software design patterns in Java source code. It is a CodeBERT model fine-tuned for single-label classification: each input snippet is assigned exactly one of the labels listed below.
## Supported Labels
| Label ID | Design Pattern |
|----------|--------------------|
| 0 | Observer |
| 1 | Decorator |
| 2 | Adapter |
| 3 | Proxy |
| 4 | Singleton |
| 5 | Facade |
| 6 | AbstractFactory |
| 7 | Memento |
| 8 | FactoryMethod |
| 9 | Prototype |
| 10 | Visitor |
| 11 | Builder |
| 12 | Unknown |
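For code that needs the label names without loading the model (for example, when post-processing stored predictions), the table can be mirrored as a plain dictionary. This is a minimal sketch: the names `ID2LABEL`, `LABEL2ID`, and `label_for` are hypothetical helpers, and `model.config.id2label` remains the authoritative mapping if the two ever disagree.

```python
# Mirror of the label table in this README (an assumption: the strings match
# model.config.id2label exactly; prefer the model config when it is available).
ID2LABEL = {
    0: "Observer",
    1: "Decorator",
    2: "Adapter",
    3: "Proxy",
    4: "Singleton",
    5: "Facade",
    6: "AbstractFactory",
    7: "Memento",
    8: "FactoryMethod",
    9: "Prototype",
    10: "Visitor",
    11: "Builder",
    12: "Unknown",
}

# Reverse lookup: pattern name -> label ID.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def label_for(class_id: int) -> str:
    """Return the pattern name for a label ID, raising on unknown IDs."""
    if class_id not in ID2LABEL:
        raise ValueError(f"Unknown label ID: {class_id}")
    return ID2LABEL[class_id]


print(label_for(4))
```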
## How to Use
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("ichsanbudiman/design-pattern-detection-codebert")
model = AutoModelForSequenceClassification.from_pretrained("ichsanbudiman/design-pattern-detection-codebert")

# Example input: a lazily initialized Singleton
input_code = """
public class Singleton {
    private static Singleton instance;

    private Singleton() {}

    public static Singleton getInstance() {
        if (instance == null) {
            instance = new Singleton();
        }
        return instance;
    }
}
"""

# Tokenize the input (truncated or padded to exactly 512 tokens)
inputs = tokenizer(input_code, return_tensors="pt", padding="max_length", truncation=True, max_length=512)

# Make predictions without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Get the predicted class ID and map it to its label name
predicted_class = torch.argmax(outputs.logits, dim=1).item()
predicted_label = model.config.id2label[predicted_class]
print(f"Predicted label: {predicted_label}")
```
## Input Requirements
- **Input Format**: Java code snippets as strings.
- **Max Length**: Input code longer than 512 tokens will be truncated.
- **Padding**: The usage example pads every input to 512 tokens (`padding="max_length"`), so all inputs in a batch share the same shape.
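To make the truncation and padding rules concrete, the sketch below applies them to a plain list of token IDs. It is an illustration only, not the tokenizer's implementation: `pad_or_truncate` is a hypothetical helper, the `pad_id` value is a placeholder, and the real tokenizer also preserves its special start and end tokens when truncating.

```python
from typing import List, Tuple


def pad_or_truncate(
    token_ids: List[int], max_length: int = 512, pad_id: int = 0
) -> Tuple[List[int], List[int]]:
    """Cut or pad a token ID list to max_length and build its attention mask.

    pad_id is a placeholder; the real value comes from tokenizer.pad_token_id.
    """
    kept = token_ids[:max_length]          # drop anything past the limit
    n_pad = max_length - len(kept)         # 0 when the input was truncated
    input_ids = kept + [pad_id] * n_pad
    attention_mask = [1] * len(kept) + [0] * n_pad  # 0 marks padding
    return input_ids, attention_mask


# A short input is padded; a long one is cut.
short_ids, short_mask = pad_or_truncate([5, 6, 7], max_length=5)
long_ids, long_mask = pad_or_truncate(list(range(10)), max_length=4)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Because everything past the limit is dropped, a large class whose pattern-defining members appear late in the file may lose them before the model sees the input.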
## Task
This model performs single-label classification for the detection of design patterns in Java source code. The supported design patterns are listed above.
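Single-label classification means the model emits one logit per label and the prediction is the label with the highest score. The sketch below shows that step on a plain list of logits, using a softmax to rank the top candidates. The helper names `softmax` and `top_k_labels` and the example logits are made up for illustration, and the resulting probabilities are not guaranteed to be calibrated confidences.

```python
import math
from typing import Dict, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(
    logits: List[float], id2label: Dict[int, str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Made-up logits for three labels, purely for illustration.
example_labels = {0: "Observer", 1: "Decorator", 2: "Adapter"}
for name, prob in top_k_labels([0.0, 2.0, 1.0], example_labels, k=2):
    print(f"{name}: {prob:.3f}")
```

With real model output, `outputs.logits[0].tolist()` and `model.config.id2label` could be passed in place of the example values.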
## Fine-Tuning Details
- **Base Model**: [CodeBERT](https://huggingface.co/microsoft/codebert-base)
- **Dataset**: Fine-tuned on a curated dataset of labeled Java code examples. The dataset was sourced from the following research article:
> Najam Nazar, Aldeida Aleti, Yaokun Zheng, Feature-based software design pattern detection, Journal of Systems and Software, Volume 185, 2022, 111179, ISSN 0164-1212, [https://doi.org/10.1016/j.jss.2021.111179](https://doi.org/10.1016/j.jss.2021.111179).
- **Metrics**: The model achieves high accuracy in detecting design patterns, making it suitable for software engineering tasks.
## Contact
For inquiries or feedback, please reach out to [Ichsan Budiman](mailto:[email protected]).
## License
This model is licensed under the Apache 2.0 License.