
	---
	license: llama3.2
	datasets:
	- stanfordnlp/imdb
	language:
	- en
	metrics:
	- accuracy
	base_model:
	- meta-llama/Llama-3.2-1B
	new_version: yash3056/Llama-3.2-1B-imdb
	pipeline_tag: text-classification
	library_name: transformers
	tags:
	- transformers
	- pytorch
	- llama
	- llama-3
	- 1b
	---

	## Model Details

	### Model Description

	<!-- Provide a longer summary of what this model is. -->

	- Funded by: [Intel](https://console.cloud.intel.com/)
	- Shared by [optional]: [More Information Needed]
	- Model type: Text Classification
	- Language(s) (NLP): English
	- License: Llama 3.2 Community License Agreement
	- Finetuned from model: [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)

	## Uses

	This model is designed for text classification, specifically binary sentiment analysis on datasets like IMDb, where the goal is to classify text as positive or negative. Data scientists, researchers, developers, and businesses can use it to build applications for sentiment analysis, content moderation, or customer feedback analysis at scale. It can also be fine-tuned for other binary or multi-class classification tasks in domains such as social media monitoring, product reviews, and support ticket triage.

	### Direct Use

	This model can be used directly to classify the sentiment of text-based reviews, such as deciding whether a movie or product review is positive or negative. Without further fine-tuning it reaches 96.04% accuracy on the IMDb test set (see Evaluation), so it can be used out of the box for applications like analyzing customer feedback, monitoring social media opinions, or automating sentiment tagging. It suits scenarios where sentiment must be assessed quickly from textual input without deeper customization.

	### Downstream Use

	Fine-tuning for Binary Classification

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
	from datasets import load_dataset

	# Load IMDb dataset for binary classification
	dataset = load_dataset("imdb")
	tokenizer = AutoTokenizer.from_pretrained("yash3056/Llama-3.2-1B-imdb")

	# Llama tokenizers often ship without a padding token; fall back to EOS so padding works
	if tokenizer.pad_token is None:
	    tokenizer.pad_token = tokenizer.eos_token

	# Tokenize the dataset
	def preprocess(example):
	    return tokenizer(example['text'], truncation=True, padding='max_length', max_length=128)

	tokenized_datasets = dataset.map(preprocess, batched=True)

	# Load model for binary classification (num_labels=2)
	model = AutoModelForSequenceClassification.from_pretrained("yash3056/Llama-3.2-1B-imdb", num_labels=2)
	model.config.pad_token_id = tokenizer.pad_token_id  # the classifier needs this for batched inputs

	# Training arguments
	training_args = TrainingArguments(
	    output_dir="./results",
	    eval_strategy="epoch",  # named `evaluation_strategy` in older transformers releases
	    learning_rate=2e-5,
	    per_device_train_batch_size=16,
	    per_device_eval_batch_size=16,
	    num_train_epochs=3,
	    weight_decay=0.01,
	)

	# Trainer
	trainer = Trainer(
	    model=model,
	    args=training_args,
	    train_dataset=tokenized_datasets["train"],
	    eval_dataset=tokenized_datasets["test"],
	)

	# Fine-tune the model
	trainer.train()
	```

	Fine-tuning for Multi-Class Classification

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
	from datasets import load_dataset

	# Load AG News dataset for multi-class classification (4 labels)
	dataset = load_dataset("ag_news")
	tokenizer = AutoTokenizer.from_pretrained("yash3056/Llama-3.2-1B-imdb")

	# Llama tokenizers often ship without a padding token; fall back to EOS so padding works
	if tokenizer.pad_token is None:
	    tokenizer.pad_token = tokenizer.eos_token

	# Tokenize the dataset
	def preprocess(example):
	    return tokenizer(example['text'], truncation=True, padding='max_length', max_length=128)

	tokenized_datasets = dataset.map(preprocess, batched=True)

	# Load model for multi-class classification (num_labels=4).
	# The checkpoint's head has 2 outputs, so let a fresh 4-way head be initialized instead.
	model = AutoModelForSequenceClassification.from_pretrained(
	    "yash3056/Llama-3.2-1B-imdb", num_labels=4, ignore_mismatched_sizes=True
	)
	model.config.pad_token_id = tokenizer.pad_token_id  # the classifier needs this for batched inputs

	# Training arguments
	training_args = TrainingArguments(
	    output_dir="./results",
	    eval_strategy="epoch",  # named `evaluation_strategy` in older transformers releases
	    learning_rate=2e-5,
	    per_device_train_batch_size=16,
	    per_device_eval_batch_size=16,
	    num_train_epochs=3,
	    weight_decay=0.01,
	)

	# Trainer
	trainer = Trainer(
	    model=model,
	    args=training_args,
	    train_dataset=tokenized_datasets["train"],
	    eval_dataset=tokenized_datasets["test"],
	)

	# Fine-tune the model
	trainer.train()
	```
	<!--
	### Out-of-Scope Use

	This section addresses misuse, malicious use, and uses that the model will not work well for.

	[More Information Needed]
	-->

	## Bias, Risks, and Limitations

	While this model is effective for text classification and sentiment analysis, it has certain limitations and potential biases. The training data, such as the IMDb dataset, may contain inherent biases related to language use, cultural context, or demographics of reviewers, which could influence the model’s predictions. For example, the model might struggle with nuanced sentiment, sarcasm, or slang, leading to misclassifications. Additionally, it could exhibit biases toward particular opinions or groups if those were overrepresented or underrepresented in the training data.

	The model is also limited to binary sentiment classification, meaning it may oversimplify more complex emotional states expressed in text. Users should be cautious when applying the model in sensitive domains such as legal, medical, or psychological settings, where misclassification could have serious consequences. Proper review and adjustment of predictions are recommended, especially in high-stakes applications.

	### Recommendations

	Users (both direct and downstream) should be aware of the potential risks, biases, and limitations inherent in this model. Given that the model may reflect biases present in the training data, it is recommended that users critically evaluate the model’s performance on specific datasets or contexts where fairness and accuracy are essential.

	For applications in sensitive areas like legal, healthcare, or hiring decisions, additional care should be taken to review the model's predictions, possibly combining them with human oversight. Fine-tuning the model on domain-specific data or implementing bias mitigation techniques can help reduce unintended bias. Additionally, regular re-evaluation and monitoring of the model in production environments are encouraged to ensure it continues to meet desired ethical and performance standards.
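
One way to combine predictions with human oversight is to act automatically only on confident predictions and queue the rest for review. The sketch below is illustrative and not part of this model's code: `route_prediction`, the `0.9` threshold, and the `decision` values are hypothetical choices that would need tuning on your own data.

```python
def route_prediction(label, confidence, threshold=0.9):
    """Accept confident predictions; flag the rest for a human reviewer.

    `threshold` is a hypothetical cut-off, not a value taken from this card.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be a probability in [0, 1]")
    decision = "auto" if confidence >= threshold else "human_review"
    return {"label": label, "confidence": confidence, "decision": decision}


# Made-up predictions: (label, probability of that label)
for label, confidence in [("positive", 0.97), ("negative", 0.62)]:
    print(route_prediction(label, confidence))
```

A lower threshold sends fewer items to reviewers but lets more uncertain predictions through, so the value is a trade-off rather than a fixed recommendation.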

	## How to Get Started with the Model

	Use the code below to get started with the model.
	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	# Load the model and tokenizer
	tokenizer = AutoTokenizer.from_pretrained("yash3056/Llama-3.2-1B-imdb")
	# num_labels is the number of classes; use 2 for the positive/negative sentiment head
	model = AutoModelForSequenceClassification.from_pretrained("yash3056/Llama-3.2-1B-imdb", num_labels=2)
	```
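
The loading snippet above stops before prediction. As a minimal sketch, the helper below shows how one row of classifier logits could be turned into a label and a probability. It assumes the label ids described in this card's preprocessing notes (0 = negative, 1 = positive); `decode_prediction` and the example logits are made up for illustration, and in practice the logits would come from the model's output.

```python
import math

# Assumed mapping, taken from this card's preprocessing notes: 0 = negative, 1 = positive.
ID2LABEL = {0: "negative", 1: "positive"}


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits):
    """Return (label, probability) for one row of classifier logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for one review, standing in for `model(**inputs).logits[0].tolist()`.
label, confidence = decode_prediction([-1.2, 2.3])
print(label, round(confidence, 3))
```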


	## Training Details

	### Training Data

	The model was trained on the IMDb dataset, a widely used benchmark for binary sentiment classification tasks. The dataset consists of movie reviews labeled as positive or negative, making it suitable for training models to understand sentiment in text. The dataset contains 50,000 reviews in total, evenly split between positive and negative labels, providing a balanced dataset for training and evaluation. Preprocessing involved tokenizing the text using the AutoTokenizer from Hugging Face's Transformers library, truncating and padding the sequences to a maximum length of 512 tokens. The training data was further split into training and validation sets with an 80-20 ratio.
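
To make the 80-20 split concrete, here is a small sketch. It assumes the split was applied to the dataset's 25,000-review training portion, which would give 20,000 training and 5,000 validation reviews. The helper `train_val_split`, the seed, and the plain random shuffle are illustrative assumptions; the card does not say how the original split was drawn.

```python
import random


def train_val_split(examples, val_fraction=0.2, seed=42):
    """Shuffle a copy of `examples` and split off `val_fraction` of it for validation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Stand-in for the IMDb training portion: (review_id, label) pairs with balanced labels.
toy_train = [(i, i % 2) for i in range(25_000)]
train, val = train_val_split(toy_train)
print(len(train), len(val))
```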

	More information about the IMDb dataset can be found [here](https://huggingface.co/datasets/stanfordnlp/imdb).

	### Training Procedure

	The training procedure used the Llama-3.2-1B model with modifications to suit the binary sentiment classification task. Training was performed for 10 epochs using a batch size of 8 and the AdamW optimizer with a learning rate of 3e-5. The learning rate was adjusted with a linear schedule, including a warmup of 40% of the total steps. The model was fine-tuned using the IMDb training dataset and evaluated on a separate test set.
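
The learning-rate schedule described above can be sketched in a few lines. The shape matches what a linear schedule with warmup produces (for example `get_linear_schedule_with_warmup` in Transformers); whether that helper was used is not stated in this card. The step count is also an assumption: it takes 20,000 training reviews, a single device, and no gradient accumulation.

```python
def linear_warmup_lr(step, total_steps, peak_lr=3e-5, warmup_fraction=0.4):
    """Learning rate at `step`: linear ramp up to `peak_lr`, then linear decay to 0."""
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Assumed step count: 20,000 training reviews / batch size 8, for 10 epochs.
total_steps = (20_000 // 8) * 10
for step in (0, 5_000, 10_000, 17_500, 25_000):
    print(step, linear_warmup_lr(step, total_steps))
```

Under these assumptions the rate climbs for the first 10,000 of 25,000 steps, peaks at 3e-5, and then falls linearly to zero.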

	Validation and evaluation metrics were calculated after each epoch, including accuracy, precision, recall, F1-score, and ROC-AUC. The final model was saved after the last epoch, along with the tokenizer. Several plots, including loss curves, accuracy curves, a confusion matrix, and a ROC curve, were generated to visually assess the model's performance.

	#### Preprocessing

	Text data was preprocessed by tokenizing with the Llama-3.2-1B model tokenizer. Sequences were truncated and padded to a maximum length of 512 tokens to ensure consistent input sizes for the model. Labels were encoded as integers (0 for negative and 1 for positive) for compatibility with the model.
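
The truncate-and-pad step can be illustrated without a tokenizer. The sketch below works on plain lists of token ids; `pad_or_truncate`, the pad id of `0`, and right-side padding are assumptions for illustration, since the card does not state which pad token or padding side was used.

```python
def pad_or_truncate(token_ids, max_length=512, pad_id=0):
    """Return (input_ids, attention_mask), both exactly `max_length` long.

    `pad_id=0` and right-side padding are placeholders, not values from this card.
    """
    ids = list(token_ids[:max_length])
    mask = [1] * len(ids)
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad


# A short sequence is padded; a long one is cut off at max_length.
short_ids, short_mask = pad_or_truncate([5, 6, 7], max_length=5)
long_ids, long_mask = pad_or_truncate(list(range(10)), max_length=4)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

The attention mask marks real tokens with 1 and padding with 0 so that padded positions can be ignored by the model.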

	<!--
	#### Training Hyperparameters

	- Training regime: [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
	<!--
	#### Speeds, Sizes, Times [optional]

	<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
	<!--
	[More Information Needed]
	-->
	## Evaluation

	- Training Loss: 0.0030, Accuracy: 0.9999
	- Validation Loss: 0.1196, Accuracy: 0.9628

	### Testing Data, Factors & Metrics

	#### Testing Data

	- Test Loss: 0.1315
	- Test Accuracy: 0.9604
	- Precision: 0.9604
	- Recall: 0.9604
	- F1-score: 0.9604
	- AUC: 0.9604
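
For readers who want to recompute metrics like these on their own predictions, the sketch below derives accuracy, precision, recall, and F1 from true and predicted labels. It uses the positive-class (label 1) definitions; the card does not say which averaging produced the numbers above, so this is an assumption, and the labels in the example are made up.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1, treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for eight reviews (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```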
	<!--
	#### Factors

	<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
	<!--

	[More Information Needed]
	#### Metrics

	<!-- These are the evaluation metrics being used, ideally with a description of why. -->
	<!--
	[More Information Needed]

	### Results

	[More Information Needed]-->

	<!--
	## Model Examination [optional]

	<!-- Relevant interpretability work for the model goes here -->
	<!--
	[More Information Needed]
	<!--
	## Environmental Impact

	<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
	<!--
	Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

	- Hardware Type: [More Information Needed]
	- Hours used: [More Information Needed]
	- Cloud Provider: [More Information Needed]
	- Compute Region: [More Information Needed]
	- Carbon Emitted: [More Information Needed]
	-->

	## Technical Specifications
	<!--
	### Model Architecture and Objective

	[More Information Needed]

	### Compute Infrastructure

	[More Information Needed]
	-->
	#### Hardware

	[Intel® Data Center GPU Max 1550](https://www.intel.com/content/www/us/en/products/sku/232873/intel-data-center-gpu-max-1550/specifications.html)
	<!--

	## Citation [optional]

	<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
	<!--

	BibTeX:

	[More Information Needed]

	APA:

	[More Information Needed]
	-->
	## Model Card Authors

	- Yash Prakash Narayan ([GitHub](https://github.com/yash3056))
	<!--

	## Model Card Contact

	[More Information Needed]-->