
Improving the detection of technical debt in Java source code with an enriched dataset

Model Details

Model Description

This model is part of the Tesoro project and is used to detect technical debt in source code. More information can be found on the Tesoro home page.

  • Developed by: Nam Hai Le
  • Model type: Decoder-based PLMs
  • Language(s): Java
  • Finetuned from model: CodeLlama

Model Sources

  • Repository: Tesoro
  • Paper: [To be updated]

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hugging Face Hub ID of this fine-tuned checkpoint
model_id = "NamCyan/CodeLlama-7b-technical-debt-code-tesoro"

# Load the tokenizer and the sequence-classification model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
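Once the tokenizer and model are loaded, a prediction could be obtained roughly as sketched below. This card does not state the label set or whether the classification head is single-label or multi-label, so the sketch reads both from `model.config` (`id2label` and `problem_type`) instead of assuming them. The helper names `scores_from_logits`, `decode`, and `classify`, as well as the 0.5 threshold, are illustrative assumptions and not part of the Tesoro code.

```python
import math


def scores_from_logits(logits, multi_label=False):
    """Turn raw logits into probabilities (sigmoid per class, or softmax)."""
    if multi_label:
        return [1.0 / (1.0 + math.exp(-z)) for z in logits]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, id2label, multi_label=False, threshold=0.5):
    """Return the predicted label name(s) for one example as a list."""
    probs = scores_from_logits(logits, multi_label)
    if multi_label:
        # Assumed threshold; keep every class whose probability reaches it
        return [id2label[i] for i, p in enumerate(probs) if p >= threshold]
    best = max(range(len(probs)), key=lambda i: probs[i])
    return [id2label[best]]


def classify(java_code, tokenizer, model):
    """Run one Java snippet through the classifier and decode the output."""
    import torch  # imported lazily so the helpers above work without it

    inputs = tokenizer(java_code, return_tensors="pt", truncation=True)
    inputs = inputs.to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    # problem_type may be unset in the config; then single-label is assumed
    multi_label = model.config.problem_type == "multi_label_classification"
    return decode(logits, model.config.id2label, multi_label)
```

Calling `classify(snippet, tokenizer, model)` with a Java method as `snippet` would return a list of label names taken from the checkpoint's own configuration. Because the checkpoint is stored in F32, loading it may need a large amount of memory; passing `torch_dtype=torch.float16` to `from_pretrained` is one common way to reduce that.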

Training Details

  • Training Data: The model is fine-tuned on the tesoro-code dataset.

  • Infrastructure: Training was conducted on two NVIDIA A100 GPUs with 80GB of VRAM. LoRA was used to train this model.
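For readers unfamiliar with LoRA, the idea can be sketched in plain Python: the pretrained weight matrix W stays frozen, and only two small matrices A and B are trained, so the effective weight becomes W + (alpha / r) · B · A. The rank, scaling factor, and matrix sizes below are illustrative assumptions; the hyperparameters actually used for this model are not given in this card.

```python
def lora_extra_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # A has shape (rank, d_in) and B has shape (d_out, rank)
    return rank * (d_in + d_out)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute y = W x + (alpha / rank) * B (A x) with W kept frozen."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Illustrative sizes only: one 4096 x 4096 projection adapted with rank 8
full_params = 4096 * 4096
extra_params = lora_extra_params(4096, 4096, 8)
print(f"frozen: {full_params}, trainable: {extra_params}")
```

With these example sizes the adapter trains 65,536 parameters against 16,777,216 frozen ones for that matrix, which is why LoRA makes fine-tuning a model of this size practical on a small number of GPUs.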

Leaderboard

| Model | Model size | EM | F1 |
|---|---|---|---|
| Encoder-based PLMs | | | |
| CodeBERT | 125M | 38.28 | 43.47 |
| UniXCoder | 125M | 38.12 | 42.58 |
| GraphCodeBERT | 125M | 39.38 | 44.21 |
| RoBERTa | 125M | 35.37 | 38.22 |
| ALBERT | 11.8M | 39.32 | 41.99 |
| Encoder-Decoder-based PLMs | | | |
| PLBART | 140M | 36.85 | 39.90 |
| CodeT5 | 220M | 32.66 | 35.41 |
| CodeT5+ | 220M | 37.91 | 41.96 |
| Decoder-based PLMs (LLMs) | | | |
| TinyLlama | 1.03B | 37.05 | 40.05 |
| DeepSeek-Coder | 1.28B | 42.52 | 46.19 |
| OpenCodeInterpreter | 1.35B | 38.16 | 41.76 |
| phi-2 | 2.78B | 37.92 | 41.57 |
| starcoder2 | 3.03B | 35.37 | 41.77 |
| CodeLlama | 6.74B | 34.14 | 38.16 |
| Magicoder | 6.74B | 39.14 | 42.49 |

Citing us

@article{nam2024tesoro,
  title={Improving the detection of technical debt in Java source code with an enriched dataset},
  author={Hai, Nam Le and Bui, Anh M. T. and Nguyen, Phuong T. and Ruscio, Davide Di and Kazman, Rick},
  journal={},
  year={2024}
}
Model size: 6.61B params (Safetensors), tensor type: F32