|
--- |
|
license: gpl-3.0 |
|
language: |
|
- en |
|
tags: |
|
- feature extraction |
|
- mobile apps |
|
- reviews |
|
- token classification |
|
- named entity recognition |
|
pipeline_tag: token-classification |
|
widget: |
|
- text: "The share note file feature is completely useless." |
|
example_title: "Example 1" |
|
- text: "Great app I've tested a lot of free habit tracking apps and this is by far my favorite." |
|
example_title: "Example 2" |
|
- text: "The only negative feedback I can give about this app is the difficulty level to set a sleep timer on it." |
|
example_title: "Example 3" |
|
- text: "Does what you want with a small pocket size checklist reminder app" |
|
example_title: "Example 4" |
|
- text: "Very bad because call recording notification send other person" |
|
example_title: "Example 5" |
|
- text: "I originally downloaded the app for pomodoro timing, but I stayed for the project management features, with syncing." |
|
example_title: "Example 6" |
|
- text: "It works accurate and I bought a portable one lap gps tracker it have a great battery Life" |
|
example_title: "Example 7" |
|
- text: "I'm my phone the notifications of group message are not at a time please check what was the reason behind it because due to this default I loose some opportunity" |
|
example_title: "Example 8" |
|
- text: "There is no setting for recurring alarms" |
|
example_title: "Example 9" |
|
--- |
|
|
|
# T-FREX RoBERTa base model |
|
|
|
--- |
|
Please cite this research as: |
|
|
|
_Q. Motger, A. Miaschi, F. Dell’Orletta, X. Franch, and J. Marco, ‘T-FREX: A Transformer-based Feature Extraction Method from Mobile App Reviews’, in Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2024. Pre-print available at: https://arxiv.org/abs/2401.03833_
|
|
|
--- |
|
|
|
T-FREX is a transformer-based feature extraction method for mobile app reviews based on fine-tuning Large Language Models (LLMs) for a named entity recognition task. We collect a dataset of ground-truth features from users of a real crowdsourced software recommendation platform, and we use this dataset to fine-tune multiple LLMs under different data configurations. We assess the performance of T-FREX with respect to this ground truth, and we complement our analysis by comparing T-FREX with a baseline method from the field. Finally, we assess the quality of new features predicted by T-FREX through an external human evaluation. Results show that, on average, T-FREX outperforms the traditional syntactic-based method, especially when discovering new features from a domain for which the model has been fine-tuned.
|
|
|
Source code for data generation, fine-tuning and model inference is available in the original [GitHub repository](https://github.com/gessi-chatbots/t-frex/).
|
|
|
## Model description |
|
|
|
This version of T-FREX has been fine-tuned for [token classification](https://huggingface.co/docs/transformers/tasks/token_classification#train) from [RoBERTa base model](https://huggingface.co/roberta-base). |
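

The model predicts token-level labels marking feature mentions in review text. If needed, the exact label set can be inspected from the model configuration; a minimal sketch (the output shown in the comment is illustrative, not guaranteed):

```python
from transformers import AutoModelForTokenClassification

# Inspect the token-level label map of the fine-tuned model
model = AutoModelForTokenClassification.from_pretrained("quim-motger/t-frex-roberta-base")
print(model.config.id2label)  # illustrative output: {0: 'O', 1: 'B-feature', 2: 'I-feature'}
```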
|
|
|
## Model variations |
|
|
|
T-FREX includes a set of released fine-tuned models, which are compared in the original study (pre-print available at https://arxiv.org/abs/2401.03833).
|
|
|
- [**t-frex-bert-base-uncased**](https://huggingface.co/quim-motger/t-frex-bert-base-uncased) |
|
- [**t-frex-bert-large-uncased**](https://huggingface.co/quim-motger/t-frex-bert-large-uncased) |
|
- [**t-frex-roberta-base**](https://huggingface.co/quim-motger/t-frex-roberta-base) |
|
- [**t-frex-roberta-large**](https://huggingface.co/quim-motger/t-frex-roberta-large) |
|
- [**t-frex-xlnet-base-cased**](https://huggingface.co/quim-motger/t-frex-xlnet-base-cased) |
|
- [**t-frex-xlnet-large-cased**](https://huggingface.co/quim-motger/t-frex-xlnet-large-cased) |
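

All variants above expose the same token classification interface, so they are interchangeable at inference time by swapping the model identifier. A minimal sketch, assuming each checkpoint loads with the standard `transformers` auto classes, as the base model does:

```python
from transformers import pipeline

# Swap the identifier to try a different T-FREX variant
for model_name in ["quim-motger/t-frex-bert-base-uncased", "quim-motger/t-frex-roberta-large"]:
    ner = pipeline("ner", model=model_name)
    print(model_name, ner("There is no setting for recurring alarms"))
```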
|
|
|
## How to use |
|
|
|
The following snippet demonstrates how to use the T-FREX RoBERTa base model for named entity recognition on app reviews:
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the pre-trained model and tokenizer
model_name = "quim-motger/t-frex-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Create a pipeline for named entity recognition
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

# Example text
text = "The share note file feature is completely useless."

# Perform named entity recognition
entities = ner_pipeline(text)

# Print the recognized entities
for entity in entities:
    print(f"Entity: {entity['word']}, Label: {entity['entity']}, Score: {entity['score']:.4f}")

# Example with multiple texts
texts = [
    "Great app I've tested a lot of free habit tracking apps and this is by far my favorite.",
    "The only negative feedback I can give about this app is the difficulty level to set a sleep timer on it.",
]

# Perform named entity recognition on each text
for text in texts:
    entities = ner_pipeline(text)
    print(f"Text: {text}")
    for entity in entities:
        print(f"  Entity: {entity['word']}, Label: {entity['entity']}, Score: {entity['score']:.4f}")
```
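

Note that the `ner` pipeline above returns one prediction per subword token, so a multi-word feature such as `share note file` may be split across several entries. The pipeline's `aggregation_strategy` argument (available in recent `transformers` releases) can merge contiguous predictions into full spans; a short sketch:

```python
from transformers import pipeline

# "simple" groups consecutive tokens with the same predicted label into one span
ner_grouped = pipeline(
    "ner",
    model="quim-motger/t-frex-roberta-base",
    aggregation_strategy="simple",
)

for span in ner_grouped("The share note file feature is completely useless."):
    print(f"Feature: {span['word']}, Label: {span['entity_group']}, Score: {span['score']:.4f}")
```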