# 🎬 Movie Review Sentiment Analysis - Fine-Tuned BERT Model
This repository hosts a fine-tuned **BERT-based** model optimized for **sentiment analysis** on movie reviews using the **IMDb dataset**. The model classifies each review as either **Positive** or **Negative**, reaching 92.5% accuracy in the evaluation reported below.
## Model Details
- **Model Architecture**: BERT
- **Task**: Sentiment Analysis
- **Dataset**: IMDb Movie Reviews
- **Fine-tuning Framework**: Hugging Face Transformers
- **Quantization**: Float16
## Usage
### Installation
```bash
pip install transformers torch
```
### Loading the Model
```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "AventIQ-AI/bert-movie-review-sentiment-analysis"
model = BertForSequenceClassification.from_pretrained(model_name).to(device)
tokenizer = BertTokenizer.from_pretrained(model_name)
```
### Sentiment Prediction
```python
import torch
import torch.nn.functional as F

def predict_sentiment(review_text):
    model.eval()  # Set model to evaluation mode
    inputs = tokenizer(review_text, padding=True, truncation=True, max_length=512, return_tensors="pt")
    # Move the input tensors to the same device as the model (required when running on CUDA)
    inputs = {name: tensor.to(device) for name, tensor in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    probs = F.softmax(logits, dim=1)  # Convert logits to probabilities
    confidence, prediction = torch.max(probs, dim=1)  # Get class with highest probability
    sentiment = "Positive" if prediction.item() == 1 else "Negative"
    # Print probabilities for debugging
    print(f"Softmax Probabilities: {probs.tolist()}")
    # Heuristic override: force "Negative" for low-confidence reviews containing "not good"
    if confidence.item() < 0.7 and "not good" in review_text.lower():
        sentiment = "Negative"
    return sentiment

# Test with your own review
review = "The movie was filled with boring dialogues and unrealistic action."
result = predict_sentiment(review)
print(f"Review: {review}")
print(f"Predicted Sentiment: {result}")
```
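To make the probability step in `predict_sentiment` concrete, here is a minimal, dependency-free sketch of how two logits become class probabilities and a confidence value. The logit values and the helper names `softmax` and `top_class` are illustrative assumptions, not outputs of the actual model; label index 1 is treated as Positive, matching the function above.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def top_class(logits, labels=("Negative", "Positive")):
    """Return (label, confidence) for the highest-probability class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical logits for one review: [negative_score, positive_score]
label, confidence = top_class([1.2, -0.8])
print(label, round(confidence, 3))
```

The confidence returned here is the same quantity that the `0.7` threshold in `predict_sentiment` is compared against.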
## Evaluation Results
After fine-tuning, the model was evaluated on the IMDb dataset, achieving the following performance:
| Metric | Score | Meaning |
|----------|--------|------------------------------------------------|
| **Accuracy** | 92.5% | Percentage of correctly classified reviews |
| **F1 Score** | 91.8% | Balance between precision and recall |
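
For reference, both metrics in the table can be derived from confusion-matrix counts, as sketched below. The function name `accuracy_and_f1` and the counts are made up for illustration; they are not this model's evaluation data.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Compute accuracy and F1 (for the positive class) from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1

# Hypothetical counts for 100 reviews
acc, f1 = accuracy_and_f1(tp=45, fp=5, fn=10, tn=40)
print(f"accuracy={acc:.3f}, f1={f1:.3f}")
```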
## 🔧 Fine-Tuning Details
### Dataset
The **IMDb Movie Reviews** dataset was used for training and evaluation. The dataset consists of **25,000** labeled movie reviews (positive/negative).
### Training Configuration
- **Number of epochs**: 10
- **Batch size**: 32
- **Optimizer**: AdamW
- **Learning rate**: 3e-5
- **Evaluation strategy**: Epoch-based
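
As a rough sanity check on training cost, the number of optimizer steps implied by this configuration can be estimated as follows. The sketch assumes that all 25,000 reviews mentioned above are used for training, with no gradient accumulation and a single device; the actual train/evaluation split is not stated here, so treat the result as an estimate under those assumptions. The helper `training_steps` is hypothetical.

```python
import math

def training_steps(num_examples, batch_size, epochs):
    """Return (steps_per_epoch, total_steps), counting the final partial batch."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs

# Batch size and epochs come from the configuration above; num_examples is an assumption
steps_per_epoch, total_steps = training_steps(num_examples=25_000, batch_size=32, epochs=10)
print(steps_per_epoch, total_steps)
```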
### Quantization
The model was converted to **float16** (half precision) for inference, which reduces latency and memory usage while largely preserving accuracy.
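
The memory saving and precision cost of float16 can be illustrated without any ML libraries: Python's `struct` module can pack a number as a 16-bit or 32-bit float, showing the halved storage and the small rounding error that half precision introduces. The sample weight value and the helper `roundtrip` are illustrative assumptions, not values taken from this model.

```python
import struct

def roundtrip(value, fmt):
    """Pack a float in the given struct format, then unpack it again."""
    packed = struct.pack(fmt, value)
    return len(packed), struct.unpack(fmt, packed)[0]

weight = 0.1234567  # hypothetical model weight

size32, value32 = roundtrip(weight, "<f")  # IEEE 754 single precision
size16, value16 = roundtrip(weight, "<e")  # IEEE 754 half precision

print(f"float32: {size32} bytes, error {abs(value32 - weight):.2e}")
print(f"float16: {size16} bytes, error {abs(value16 - weight):.2e}")
```

Each stored value takes half the space in float16, at the price of a larger (but still small) rounding error per weight.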
## Repository Structure
```bash
.
├── model/               # Contains the fine-tuned model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors/   # Quantized model
└── README.md            # Model documentation
```
## ⚠️ Limitations
- The model may struggle with **sarcasm and nuanced sentiments**.
- Performance may vary across **different writing styles** and **review lengths**.
- **Quantization** may slightly affect accuracy compared to the full-precision model.
## Contributing
Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.
---