---
license: llama3.2
datasets:
- stanfordnlp/imdb
language:
- en
metrics:
- accuracy
base_model:
- meta-llama/Llama-3.2-1B
new_version: yash3056/Llama-3.2-1B-imdb
pipeline_tag: text-classification
library_name: transformers
tags:
- transformers
- pytorch
- llama
- llama-3
- 1b
---

## Model Details

### Model Description

- **Funded by:** Intel (https://console.cloud.intel.com/)
- **Shared by:** [More Information Needed]
- **Model type:** Text Classification
- **Language(s) (NLP):** English
- **License:** Llama 3.2 Community License Agreement
- **Finetuned from model:** [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)

## Uses

This model is designed for text classification, specifically binary sentiment analysis on datasets such as IMDb, where the goal is to classify text as positive or negative. Data scientists, researchers, and developers can use it to build applications for sentiment analysis, content moderation, or customer feedback analysis. It can also be fine-tuned for other binary or multi-class classification tasks in domains such as social media monitoring, product reviews, and support ticket triage. Foreseeable users include AI researchers, developers, and businesses looking to automate text analysis at scale.

### Direct Use

This model can be used directly to identify sentiment in text-based reviews, for example to classify a movie or product review as positive or negative. Without further fine-tuning it reaches 96.04% accuracy on the IMDb test set (see Evaluation), so it can be applied out of the box to tasks such as analyzing customer feedback, monitoring opinions on social media, or automating sentiment tagging. It suits scenarios where sentiment must be assessed quickly from textual input without deeper customization.

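A sequence-classification head returns one logit per class, so turning raw model output into a sentiment label takes a softmax followed by an argmax. The sketch below shows only that decoding step in plain Python. The names `ID2LABEL`, `softmax`, and `decode_prediction` are illustrative, the label order follows the 0 = negative, 1 = positive encoding described under Preprocessing, and the logits are made-up values rather than real model output.

```python
import math

# Label encoding described in the Preprocessing section: 0 = negative, 1 = positive.
ID2LABEL = {0: "negative", 1: "positive"}


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits):
    """Return (label, confidence) for one review's two-class logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for a single review, ordered [negative, positive].
label, confidence = decode_prediction([-1.2, 2.3])
print(label, round(confidence, 3))
```

The returned confidence can be used to flag low-certainty predictions for manual review.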

### Downstream Use

*Fine-tuning for Binary Classification*

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset

# Load IMDb dataset for binary classification
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Tokenize the dataset
def preprocess(example):
    return tokenizer(example['text'], truncation=True, padding='max_length', max_length=128)

tokenized_datasets = dataset.map(preprocess, batched=True)

# Load model for binary classification (num_labels=2)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",  # named `evaluation_strategy` in transformers < 4.41
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
)

# Fine-tune the model
trainer.train()
```

*Fine-tuning for Multi-Class Classification*

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset

# Load AG News dataset for multi-class classification (4 labels)
dataset = load_dataset("ag_news")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Tokenize the dataset
def preprocess(example):
    return tokenizer(example['text'], truncation=True, padding='max_length', max_length=128)

tokenized_datasets = dataset.map(preprocess, batched=True)

# Load model for multi-class classification (num_labels=4)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=4)

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",  # named `evaluation_strategy` in transformers < 4.41
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
)

# Fine-tune the model
trainer.train()
```
|
|
## Bias, Risks, and Limitations

While this model is effective for text classification and sentiment analysis, it has limitations and potential biases. The training data, the IMDb dataset, may contain inherent biases related to language use, cultural context, or the demographics of reviewers, and these can influence the model’s predictions. For example, the model might struggle with nuanced sentiment, sarcasm, or slang, leading to misclassifications. It could also exhibit biases toward particular opinions or groups if those were overrepresented or underrepresented in the training data.

The model is also limited to binary sentiment classification, so it may oversimplify more complex emotional states expressed in text. Users should be cautious when applying the model in sensitive domains such as legal, medical, or psychological settings, where misclassification could have serious consequences. Predictions should be reviewed and adjusted where necessary, especially in high-stakes applications.

### Recommendations

Users (both direct and downstream) should be aware of the risks, biases, and limitations of this model. Because the model may reflect biases present in the training data, users should critically evaluate its performance on the specific datasets or contexts where fairness and accuracy are essential.

For applications in sensitive areas such as legal, healthcare, or hiring decisions, the model's predictions should be reviewed with additional care, ideally in combination with human oversight. Fine-tuning the model on domain-specific data or applying bias mitigation techniques can help reduce unintended bias. Regular re-evaluation and monitoring of the model in production are also encouraged so that it continues to meet the desired ethical and performance standards.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Checkpoint to load; replace with the Hub ID of the model you want to use.
checkpoint = "bert-base-uncased"
num_labels = 2  # number of target classes (2 for binary sentiment)

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=num_labels)
```

## Training Details

### Training Data

The model was trained on the IMDb dataset, a widely used benchmark for binary sentiment classification. The dataset consists of 50,000 movie reviews labeled as positive or negative, evenly split between the two labels, which gives a balanced dataset for training and evaluation. Preprocessing involved tokenizing the text with the AutoTokenizer from Hugging Face's Transformers library, then truncating and padding the sequences to a maximum length of 512 tokens. The training data was further split into training and validation sets with an 80-20 ratio.

More information about the IMDb dataset can be found [here](https://huggingface.co/datasets/stanfordnlp/imdb).

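The 80-20 training/validation split mentioned above can be sketched as a seeded shuffle followed by a slice. The card does not say how the split was actually drawn, so the function name `train_validation_split`, the seed, and the toy examples are all assumptions made for illustration.

```python
import random


def train_validation_split(examples, validation_fraction=0.2, seed=42):
    """Shuffle a copy of `examples` and split it into (train, validation)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Toy stand-in for (text, label) pairs; not real IMDb data.
toy = [(f"review {i}", i % 2) for i in range(10)]
train, validation = train_validation_split(toy)
print(len(train), len(validation))
```

Fixing the seed keeps the validation set stable across runs, which makes per-epoch metrics comparable.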

### Training Procedure

The training procedure adapted the Llama-3.2-1B model to the binary sentiment classification task. Training ran for 10 epochs with a batch size of 8, using the AdamW optimizer with a learning rate of 3e-5. The learning rate followed a linear schedule with a warmup over the first 40% of the total steps. The model was fine-tuned on the IMDb training set and evaluated on a separate test set.

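To make the schedule concrete, the sketch below computes the learning rate at a given step for linear warmup followed by linear decay, the same shape that `get_linear_schedule_with_warmup` in Transformers produces. It is an illustration, not the training script: the function name `linear_warmup_lr` is made up, and `num_examples` is a hypothetical training-set size because the card does not state the exact count.

```python
import math


def linear_warmup_lr(step, total_steps, base_lr=3e-5, warmup_ratio=0.4):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr back to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical training-set size; the card does not state the exact count.
num_examples = 20_000
steps_per_epoch = math.ceil(num_examples / 8)  # batch size 8
total_steps = steps_per_epoch * 10             # 10 epochs

for step in (0, total_steps // 5, int(total_steps * 0.4), total_steps):
    print(step, linear_warmup_lr(step, total_steps))
```

With a 40% warmup, the peak learning rate of 3e-5 is only reached at the start of the fifth epoch of ten.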

Validation metrics were calculated after each epoch: accuracy, precision, recall, F1-score, and ROC-AUC. The final model was saved after the last epoch, along with the tokenizer. Loss curves, accuracy curves, a confusion matrix, and a ROC curve were plotted to assess the model's performance visually.

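As a reference for how these metrics relate to one another, the following sketch derives accuracy, precision, recall, and F1 from the confusion-matrix counts of a binary classifier. The function name and the toy labels are illustrative. It treats label 1 (positive) as the target class, which is an assumption, since the card does not say whether its reported precision and recall are per-class or averaged. ROC-AUC is left out because it needs predicted scores rather than hard labels.

```python
def binary_classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1} (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; not results from this model.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_classification_metrics(y_true, y_pred))
```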
#### Preprocessing

Text was tokenized with the Llama-3.2-1B tokenizer. Sequences were truncated and padded to a maximum length of 512 tokens so that every input has the same size. Labels were encoded as integers (0 for negative, 1 for positive) for compatibility with the model.

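The truncate-and-pad step can be pictured as follows. This is a plain-Python sketch of what a tokenizer does when called with truncation and `max_length` padding, not the tokenizer's own code. The helper name `pad_or_truncate`, the placeholder `pad_id=0`, and right-side padding are assumptions; a real tokenizer supplies its own padding token and padding side.

```python
def pad_or_truncate(token_ids, max_length=512, pad_id=0):
    """Return (input_ids, attention_mask), both exactly `max_length` long."""
    ids = list(token_ids[:max_length])  # drop tokens beyond max_length
    mask = [1] * len(ids)               # 1 marks a real token
    padding = max_length - len(ids)
    # Right-pad with pad_id; padded positions get attention mask 0.
    return ids + [pad_id] * padding, mask + [0] * padding


# Hypothetical token IDs, shortened to max_length=5 for readability.
input_ids, attention_mask = pad_or_truncate([101, 2023, 3185], max_length=5)
print(input_ids, attention_mask)
```

The attention mask lets the model ignore padded positions, so padding does not change the prediction for a short review.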
|
|
|
## Evaluation

- **Training:** loss 0.0030, accuracy 0.9999
- **Validation:** loss 0.1196, accuracy 0.9628

### Testing Data, Factors & Metrics

#### Testing Data

- **Test loss:** 0.1315
- **Test accuracy:** 0.9604
- **Precision:** 0.9604
- **Recall:** 0.9604
- **F1-score:** 0.9604
- **AUC:** 0.9604


## Technical Specifications

#### Hardware

[Intel® Data Center GPU Max 1550](https://www.intel.com/content/www/us/en/products/sku/232873/intel-data-center-gpu-max-1550/specifications.html)


## Model Card Authors

- Yash Prakash Narayan ([GitHub](https://github.com/yash3056))
