metadata
license: gpl-3.0
language:
  - en
tags:
  - feature extraction
  - mobile apps
  - reviews
  - token classification
  - named entity recognition
pipeline_tag: token-classification
widget:
  - text: The share note file feature is completely useless.
    example_title: Example 1
  - text: >-
      Great app I've tested a lot of free habit tracking apps and this is by far
      my favorite.
    example_title: Example 2
  - text: >-
      The only negative feedback I can give about this app is the difficulty
      level to set a sleep timer on it.
    example_title: Example 3
  - text: Does what you want with a small pocket size checklist reminder app
    example_title: Example 4
  - text: Very bad because call recording notification send other person
    example_title: Example 5
  - text: >-
      I originally downloaded the app for pomodoro timing, but I stayed for the
      project management features, with syncing.
    example_title: Example 6
  - text: >-
      It works accurate and I bought a portable one lap gps tracker it have a
      great battery Life
    example_title: Example 7
  - text: >-
      I'm my phone the notifications of group message are not at a time please
      check what was the reason behind it because due to this default I loose
      some opportunity
    example_title: Example 8
  - text: There is no setting for recurring alarms
    example_title: Example 9

T-FREX XLNet large model


Please cite this research as:

Q. Motger, A. Miaschi, F. Dell’Orletta, X. Franch, and J. Marco, ‘T-FREX: A Transformer-based Feature Extraction Method from Mobile App Reviews’, in Proceedings of The IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2024. Pre-print available at: https://arxiv.org/abs/2401.03833


T-FREX is a transformer-based feature extraction method for mobile app reviews based on fine-tuning Large Language Models (LLMs) for a named entity recognition task. We collect a dataset of ground truth features from users in a real crowdsourced software recommendation platform, and we use this dataset to fine-tune multiple LLMs under different data configurations. We assess the performance of T-FREX with respect to this ground truth, and we complement our analysis by comparing T-FREX with a baseline method from the field. Finally, we assess the quality of new features predicted by T-FREX through an external human evaluation. Results show that, on average, T-FREX outperforms the traditional syntactic-based method, especially when discovering new features from a domain for which the model has been fine-tuned.

Source code for data generation, fine-tuning and model inference is available in the original GitHub repository.

Model description

This version of T-FREX has been fine-tuned for token classification from the XLNet large model.

Model variations

T-FREX includes a set of released, fine-tuned models which are compared in the original study (pre-print available at https://arxiv.org/abs/2401.03833).

How to use

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the pre-trained model and tokenizer
model_name = "quim-motger/t-frex-xlnet-large-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Create a pipeline for named entity recognition
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

# Example text
text = "The share note file feature is completely useless."

# Perform named entity recognition
entities = ner_pipeline(text)

# Print the recognized entities
for entity in entities:
    print(f"Entity: {entity['word']}, Label: {entity['entity']}, Score: {entity['score']:.4f}")

# Example with multiple texts
texts = [
    "Great app I've tested a lot of free habit tracking apps and this is by far my favorite.",
    "The only negative feedback I can give about this app is the difficulty level to set a sleep timer on it."
]

# Perform named entity recognition on multiple texts
for text in texts:
    entities = ner_pipeline(text)
    print(f"Text: {text}")
    for entity in entities:
        print(f"  Entity: {entity['word']}, Label: {entity['entity']}, Score: {entity['score']:.4f}")
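The pipeline above returns one prediction per token, so a multi-word feature such as "share note file" comes back as several separate entries. The following is a minimal sketch of how those entries could be merged into whole feature spans. It assumes the model emits BIO-style labels (`B-feature`, `I-feature`, `O`) and that each entry carries the `index`, `start` and `end` fields the `transformers` pipeline normally provides with a fast tokenizer. The helper `group_features` and the sample scores are hypothetical and only illustrate the idea; they are not part of T-FREX.

```python
from typing import Dict, List, Optional


def group_features(text: str, entities: List[Dict]) -> List[Dict]:
    """Merge consecutive B-/I- token predictions into feature spans.

    Assumes BIO-style labels and character offsets ('start', 'end')
    in each entry, as returned by a token-classification pipeline.
    """
    spans: List[Dict] = []
    current: Optional[Dict] = None

    for ent in entities:
        prefix = ent["entity"].partition("-")[0]

        # Tokens labelled "O" never belong to a feature: close any open span.
        if prefix == "O":
            if current is not None:
                spans.append(current)
                current = None
            continue

        # An I- token extends the open span only if it directly follows it.
        extends = (
            current is not None
            and prefix == "I"
            and ent["index"] == current["last_index"] + 1
        )
        if extends:
            current["end"] = ent["end"]
            current["scores"].append(ent["score"])
            current["last_index"] = ent["index"]
        else:
            if current is not None:
                spans.append(current)
            current = {
                "start": ent["start"],
                "end": ent["end"],
                "scores": [ent["score"]],
                "last_index": ent["index"],
            }

    if current is not None:
        spans.append(current)

    # Recover the surface text from the offsets and average the token scores.
    return [
        {
            "feature": text[s["start"]:s["end"]],
            "score": sum(s["scores"]) / len(s["scores"]),
        }
        for s in spans
    ]


# Hand-written sample in the shape of the pipeline output (scores are made up).
text = "The share note file feature is completely useless."
entities = [
    {"entity": "B-feature", "score": 0.98, "index": 1, "word": "share", "start": 4, "end": 9},
    {"entity": "I-feature", "score": 0.95, "index": 2, "word": "note", "start": 10, "end": 14},
    {"entity": "I-feature", "score": 0.93, "index": 3, "word": "file", "start": 15, "end": 19},
]

features = group_features(text, entities)
for feature in features:
    print(f"Feature: {feature['feature']}, Score: {feature['score']:.4f}")
```

The `transformers` pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that performs a similar grouping internally; the explicit helper is shown here only to make the merging logic visible.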