|
# Movie Review Sentiment Analysis - Fine-Tuned BERT Model
|
|
|
This repository hosts a fine-tuned **BERT-based** model optimized for **sentiment analysis** on movie reviews using the **IMDb dataset**. The model classifies movie reviews as either **Positive** or **Negative**, with a reported accuracy of 92.5% on IMDb.
|
|
|
## Model Details
|
- **Model Architecture**: BERT |
|
- **Task**: Sentiment Analysis |
|
- **Dataset**: IMDb Movie Reviews
|
- **Fine-tuning Framework**: Hugging Face Transformers |
|
- **Quantization**: Float16 (half precision)
|
|
|
## Usage
|
|
|
### Installation |
|
```bash
pip install transformers torch
```
|
|
|
### Loading the Model |
|
```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "AventIQ-AI/bert-movie-review-sentiment-analysis"
model = BertForSequenceClassification.from_pretrained(model_name).to(device)
tokenizer = BertTokenizer.from_pretrained(model_name)
```
|
|
|
### Sentiment Prediction |
|
```python
import torch
import torch.nn.functional as F

def predict_sentiment(review_text):
    model.eval()  # Set model to evaluation mode
    inputs = tokenizer(review_text, padding=True, truncation=True, max_length=512, return_tensors="pt")
    inputs = inputs.to(device)  # Keep inputs on the same device as the model

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = F.softmax(logits, dim=1)  # Convert logits to probabilities
        confidence, prediction = torch.max(probs, dim=1)  # Get class with highest probability

    sentiment = "Positive" if prediction.item() == 1 else "Negative"

    # Print probabilities for debugging
    print(f"Softmax Probabilities: {probs.tolist()}")

    # Keyword override: force "Negative" for low-confidence reviews containing "not good"
    if confidence.item() < 0.7 and "not good" in review_text.lower():
        sentiment = "Negative"

    return sentiment

# Test with your review
review = "The movie was filled with boring dialogues and unrealistic action."
result = predict_sentiment(review)

print(f"Review: {review}")
print(f"Predicted Sentiment: {result}")
```
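To make the decision rule in `predict_sentiment` easier to inspect without loading the model, here is a minimal pure-Python sketch of the same steps: softmax over two logits, argmax, and the low-confidence keyword override. The function names `softmax` and `decide` and the example logits are illustrative assumptions, not part of the repository; the label order (index 1 = Positive) and the 0.7 threshold are taken from the code above.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decide(logits, review_text, threshold=0.7):
    """Mirror the rule above: argmax, then a keyword override at low confidence."""
    probs = softmax(logits)
    prediction = probs.index(max(probs))  # index 1 = Positive, 0 = Negative
    confidence = probs[prediction]
    sentiment = "Positive" if prediction == 1 else "Negative"
    if confidence < threshold and "not good" in review_text.lower():
        sentiment = "Negative"
    return sentiment, confidence

# Hypothetical logits, chosen only to exercise both branches
print(decide([0.1, 0.6], "Honestly, it was not good."))  # low confidence: override applies
print(decide([-2.0, 2.5], "A wonderful film."))          # high confidence: argmax stands
```

Note that the override only fires when the exact phrase "not good" appears, so it is a narrow patch and not a general fix for negation.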
|
|
|
## Evaluation Results
|
After fine-tuning, the model was evaluated on the IMDb dataset, achieving the following performance: |
|
|
|
| Metric | Score | Meaning |
|----------|--------|------------------------------------------------|
| **Accuracy** | 92.5% | Percentage of correctly classified reviews |
| **F1 Score** | 91.8% | Balance between precision and recall |
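For readers who want to compute metrics of this kind on their own predictions, here is a small sketch of how accuracy and F1 are derived for binary labels. The helper name `accuracy_and_f1` and the toy label lists are hypothetical; this is not the evaluation script behind the scores above, which the repository description does not include.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1

# Toy data (not IMDb results): 1 = Positive, 0 = Negative
labels      = [1, 1, 1, 0, 0, 0, 1, 0]
predictions = [1, 1, 0, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(labels, predictions)
print(f"accuracy={acc:.3f}, f1={f1:.3f}")
```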
|
|
|
## Fine-Tuning Details
|
|
|
### Dataset |
|
The **IMDb Movie Reviews** dataset was used for training and evaluation. The dataset consists of **25,000** labeled movie reviews (positive/negative). |
|
|
|
### Training Configuration |
|
- **Number of epochs**: 10 |
|
- **Batch size**: 32 |
|
- **Optimizer**: AdamW |
|
- **Learning rate**: 3e-5 |
|
- **Evaluation strategy**: Epoch-based |
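As a rough sanity check on the configuration above, the number of optimizer steps can be estimated from the dataset size, batch size, and epoch count. This sketch assumes all 25,000 reviews are used for training on a single device with no gradient accumulation; the train/evaluation split is not stated here, so treat the result as an upper-bound illustration. The `TrainingConfig` dataclass is a hypothetical container, not a Transformers class.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters listed above, collected in one place (hypothetical container)."""
    num_epochs: int = 10
    batch_size: int = 32
    learning_rate: float = 3e-5
    optimizer: str = "AdamW"

def steps_per_epoch(num_examples, batch_size):
    """Batches per epoch when the last partial batch is kept."""
    return math.ceil(num_examples / batch_size)

config = TrainingConfig()
num_train_examples = 25_000  # assumption: the full dataset is used for training

per_epoch = steps_per_epoch(num_train_examples, config.batch_size)
total_steps = per_epoch * config.num_epochs
print(f"{per_epoch} steps per epoch, {total_steps} optimizer steps in total")
```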
|
|
|
### Quantization |
|
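The model weights were converted to **float16** (half precision) for inference. This roughly halves memory usage relative to float32 and can reduce latency on hardware with float16 support, while largely preserving accuracy.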
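To give a feel for what float16 storage trades away, the sketch below round-trips a few values through half precision using Python's standard `struct` module (format code `e`). The sample values are arbitrary stand-ins, not weights from this model, and the helper `to_float16` is a name introduced here for illustration.

```python
import struct

def to_float16(value):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

# Storage cost per number: float32 uses 4 bytes, float16 uses 2
bytes_fp32 = struct.calcsize("<f")
bytes_fp16 = struct.calcsize("<e")
print(f"float32: {bytes_fp32} bytes, float16: {bytes_fp16} bytes")

# Arbitrary sample values standing in for model weights
for weight in [0.1, -0.03125, 1.2345678, 3e-5]:
    half = to_float16(weight)
    print(f"{weight!r:>12} -> {half!r:<24} abs error = {abs(weight - half):.2e}")
```

In PyTorch, a comparable conversion is typically done with `model.half()` or by passing `torch_dtype=torch.float16` to `from_pretrained`; which route was used for this model is not stated.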
|
|
|
## Repository Structure
|
```bash
.
├── model/               # Contains the fine-tuned model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors/   # Quantized Model
└── README.md            # Model documentation
```
|
|
|
## Limitations
|
- The model may struggle with **sarcasm and nuanced sentiments**. |
|
- Performance may vary across **different writing styles** and **review lengths**. |
|
- **Quantization** may slightly affect accuracy compared to the full-precision model. |
|
|
|
## Contributing
|
Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements. |
|
|
|
--- |
|
|