metadata

language:
  - en
metrics:
  - accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
  - code
  - linguistic antipatterns
  - python

alBERTo

alBERTo is a model for classifying linguistic antipatterns. It was trained on a dataset containing instances of the following bad practices: "Get" more than accessor, Not implemented condition, Method signature and comment are opposite, and Attribute signature and comment are opposite.

Model Description

alBERTo is a model for recognizing linguistic antipatterns in Python code. It was created by fine-tuning the Microsoft CodeBERT model to classify code as "clean" or as containing a linguistic antipattern. The model distinguishes the following antipattern classes:

"Get" more than accessor: A getter that performs actions other than returning the corresponding attribute.
Not implemented condition: The comments of a method suggest a conditional behavior that is not implemented in the code. When the implementation is default this should be documented.
Method signature and comment are opposite: The documentation of a method is in contradiction with its declaration.
Attribute signature and comment are opposite: The declaration of an attribute is in contradiction with its documentation.

The model was trained on a dataset containing instances of these linguistic antipatterns as well as clean code, so that it can classify code as precisely as possible.
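The four antipattern types listed above can be illustrated with a small sketch. The `Account` class and all of its members are hypothetical and were written only for this illustration; they are not taken from the training dataset.

```python
class Account:
    """Hypothetical class; each member illustrates one antipattern."""

    def __init__(self):
        self._balance = 0
        self.access_count = 0
        # Attribute signature and comment are opposite:
        # the comment below contradicts the attribute's name.
        # True when the account is disabled.
        self.is_enabled = True

    def get_balance(self):
        # "Get" more than accessor: the getter also mutates state.
        self.access_count += 1
        return self._balance

    def withdraw(self, amount):
        """Withdraw amount; raise ValueError if the balance is too low."""
        # Not implemented condition: the documented check is missing.
        self._balance -= amount

    def close_account(self):
        """Open a new account."""
        # Method signature and comment are opposite:
        # the docstring says "open" while the name says "close".
        self._balance = 0
```

A clean version of each member would keep the name, the documentation, and the behavior in agreement, for example by having `get_balance` only return `_balance`.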

Intended uses & limitations

This model can be used to classify the linguistic antipatterns described above. It still has limitations: it makes classification errors because little training data was available, so its predictions should not be treated as definitive.

Usage

Here is how to use this model to classify a given code snippet in PyTorch:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('alBERTo')
model = AutoModelForSequenceClassification.from_pretrained("alBERTo")

# prepare input
# the outer string uses ''' so the docstring's """ delimiters can appear inside it
text = '''
  """
    create a new object
  """
  def destroy_object():
'''
encoded_input = tokenizer(text, return_tensors='pt')

# forward pass
output = model(**encoded_input)
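The forward pass returns raw logits, which still need to be turned into a predicted class. The following is a minimal sketch of that step using only the standard library. The helper `top_prediction`, the example logits, and the label names are assumptions made for illustration; the actual label names depend on the model's configuration.

```python
import math


def top_prediction(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class.
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical logits and label names, for illustration only.
example_labels = {0: "clean", 1: "antipattern"}
label, prob = top_prediction([0.2, 1.7], example_labels)
print(label, round(prob, 3))
```

With the real model, the inputs would likely be `output.logits[0].tolist()` and `model.config.id2label`. Given the limitations noted earlier, a low probability is a reasonable signal to review the snippet manually rather than trust the label.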