varshamishra committed on
Commit
1e2bedb
·
verified ·
1 Parent(s): 2b8b0fe

Create README.md

Files changed (1)
  1. README.md +105 -0
README.md ADDED
@@ -0,0 +1,105 @@
# 🎬 Movie Review Sentiment Analysis - Fine-Tuned BERT Model

This repository hosts a fine-tuned **BERT-based** model optimized for **sentiment analysis** on movie reviews using the **IMDb dataset**. The model classifies movie reviews as either **Positive** or **Negative** with high accuracy.

## 📌 Model Details
- **Model Architecture**: BERT
- **Task**: Sentiment Analysis
- **Dataset**: [IMDb Movie Reviews]
- **Fine-tuning Framework**: Hugging Face Transformers
- **Quantization**: Float16

## 🚀 Usage

### Installation
```bash
pip install transformers torch
```

### Loading the Model
```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "AventIQ-AI/bert-movie-review-sentiment-analysis"
model = BertForSequenceClassification.from_pretrained(model_name).to(device)
tokenizer = BertTokenizer.from_pretrained(model_name)
```

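As an alternative to loading the classes by hand, the Transformers `pipeline` API should also work with this checkpoint. This is a minimal sketch, assuming the repository ships a tokenizer and an `id2label` mapping in its config (otherwise the labels print as `LABEL_0`/`LABEL_1`):

```python
import torch
from transformers import pipeline

# Convenience wrapper around the same checkpoint.
classifier = pipeline(
    "text-classification",
    model="AventIQ-AI/bert-movie-review-sentiment-analysis",
    device=0 if torch.cuda.is_available() else -1,
)

print(classifier("A surprisingly heartfelt and well-acted film."))
```
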
### Sentiment Prediction
```python
import torch
import torch.nn.functional as F

def predict_sentiment(review_text):
    model.eval()  # Set model to evaluation mode
    # Move the encoded inputs to the same device as the model
    inputs = tokenizer(review_text, padding=True, truncation=True, max_length=512, return_tensors="pt").to(device)

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = F.softmax(logits, dim=1)  # Convert logits to probabilities
        confidence, prediction = torch.max(probs, dim=1)  # Get class with highest probability

    sentiment = "Positive 😊" if prediction.item() == 1 else "Negative 😞"

    # Print probabilities for debugging
    print(f"Softmax Probabilities: {probs.tolist()}")

    # Force correction for low-confidence negative reviews
    if confidence.item() < 0.7 and "not good" in review_text.lower():
        sentiment = "Negative 😞"

    return sentiment

# 🔹 Test with your review
review = "The movie was filled with boring dialogues and unrealistic action."
result = predict_sentiment(review)

print(f"Review: {review}")
print(f"Predicted Sentiment: {result}")
```

## 📊 Evaluation Results
After fine-tuning, the model was evaluated on the IMDb dataset, achieving the following performance:

| Metric | Score | Meaning |
|----------|--------|------------------------------------------------|
| **Accuracy** | 92.5% | Percentage of correctly classified reviews |
| **F1 Score** | 91.8% | Balance between precision and recall |

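The exact evaluation script is not included here, but metrics like the ones above can be approximated with a loop along these lines. This is a sketch that assumes the `datasets` and `scikit-learn` packages are installed and reuses `model`, `tokenizer`, and `device` from the Usage section:

```python
import torch
from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score

test_data = load_dataset("imdb", split="test")

preds, labels = [], []
model.eval()
for example in test_data:
    # Single-example loop for clarity; batching would be much faster.
    inputs = tokenizer(example["text"], truncation=True, max_length=512, return_tensors="pt").to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    preds.append(int(logits.argmax(dim=1)))
    labels.append(example["label"])

print(f"Accuracy: {accuracy_score(labels, preds):.3f}")
print(f"F1 Score: {f1_score(labels, preds):.3f}")
```
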
## 🔧 Fine-Tuning Details

### Dataset
The **IMDb Movie Reviews** dataset was used for training and evaluation. It provides **25,000** labeled movie reviews (positive/negative) for training and an equally sized held-out test split.

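The split is available through the Hugging Face `datasets` library (`pip install datasets`). A preprocessing sketch along these lines, reusing the tokenizer loaded above, is one plausible setup rather than the exact pipeline used for this model:

```python
from datasets import load_dataset

# IMDb ships 25,000 labeled reviews each for train and test.
dataset = load_dataset("imdb")

def tokenize(batch):
    # Same truncation settings as the inference example above.
    return tokenizer(batch["text"], padding="max_length", truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)
print(tokenized["train"][0].keys())
```
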
### Training Configuration
- **Number of epochs**: 10
- **Batch size**: 32
- **Optimizer**: AdamW
- **Learning rate**: 3e-5
- **Evaluation strategy**: Epoch-based

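Those hyperparameters map roughly onto a `Trainer` configuration like the one below. Treat it as a sketch of the setup rather than the original training script; AdamW is the `Trainer` default optimizer, and `tokenized` refers to the dataset prepared in the sketch above:

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=10,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=3e-5,
    evaluation_strategy="epoch",  # renamed to `eval_strategy` in recent transformers releases
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)

trainer.train()
```
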
### Quantization
The model was quantized using **float16** for inference, reducing latency and memory usage while maintaining accuracy.

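With PyTorch, float16 conversion for inference can be as simple as the following. This is a sketch of the general approach (a CUDA GPU is assumed, since half-precision inference is slow or unsupported on many CPUs), not necessarily how this checkpoint was produced:

```python
import torch

# half() converts the weights to float16 in place and returns the same module.
fp16_model = model.half().to("cuda")

inputs = tokenizer("A wonderful, moving film.", return_tensors="pt").to("cuda")
with torch.no_grad():
    logits = fp16_model(**inputs).logits
print(logits.dtype)  # torch.float16
```
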
## 📂 Repository Structure
```bash
.
├── model/               # Contains the fine-tuned model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Quantized model weights
├── README.md            # Model documentation
```

## ⚠️ Limitations
- The model may struggle with **sarcasm and nuanced sentiments**.
- Performance may vary across **different writing styles** and **review lengths**.
- **Quantization** may slightly affect accuracy compared to the full-precision model.

## 🤝 Contributing
Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.

---