# πŸ“„ Contract Sentiment Classifier (BERT)

A fine-tuned BERT model for contract sentiment analysis, classifying legal or contractual text into positive, negative, or neutral sentiment.

## 🧠 Model Details

- πŸ“Œ **Base Model**: `bert-base-uncased`
- πŸ”§ **Task**: Sentiment Classification (Contractual Text)
- πŸ” **Labels**: `Negative (0)`, `Neutral (1)`, `Positive (2)`
- πŸ’Ύ **Quantized version**: available for faster inference
- 🧠 **Framework**: PyTorch, Transformers (πŸ€— Hugging Face)

## 🧠 Intended Uses

- βœ… Classifying sentences or clauses from contracts and legal documents by sentiment
- βœ… Flagging potentially unfavorable contract language during review
- βœ… Supporting contract analytics and legal document processing pipelines

---

## 🚫 Limitations

- ❌ Designed for English text only
- ❌ Needs further tuning and evaluation on larger, more diverse contract datasets
- ❌ Not suitable for production use without robustness checks

---

## πŸ‹οΈβ€β™‚οΈ Training Details

- **Base Model**: `bert-base-uncased`
- **Dataset**: Custom labeled Contract Sentiment dataset
- **Epochs**: 3
- **Batch Size**: 5
- **Optimizer**: AdamW
- **Hardware**: Trained on an NVIDIA GPU (CUDA-enabled)

A minimal training sketch is provided in the appendix at the end of this card.

---

## πŸ“Š Evaluation Metrics

| Metric    | Score |
|-----------|-------|
| Accuracy  | 0.98  |
| F1        | 0.99  |
| Precision | 0.99  |
| Recall    | 0.97  |

---

## πŸ”Ž Label Mapping

| Label ID | Sentiment |
|----------|-----------|
| 0        | Negative  |
| 1        | Neutral   |
| 2        | Positive  |

---

## πŸš€ Usage Example

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load the fine-tuned model; the tokenizer is the unchanged bert-base-uncased tokenizer
model_name = "AventIQ-AI/Sentiment-Analysis-for-Contract-Sentiment"
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=3)
model.eval()

# Map class IDs back to sentiment labels (see Label Mapping above)
id2label = {0: "Negative", 1: "Neutral", 2: "Positive"}

# Inference
def predict_sentiment(user_text):
    # Ensure input is a list for batch processing
    if isinstance(user_text, str):
        user_text = [user_text]

    # Tokenize input text
    inputs = tokenizer(user_text, return_tensors="pt", padding=True, truncation=True)

    # Predict using the model
    with torch.no_grad():
        outputs = model(**inputs)
        preds = torch.argmax(outputs.logits, dim=1)

    # Decode predictions back to sentiment labels and print them
    for text, pred in zip(user_text, preds.tolist()):
        print(f"Text: '{text}' => Sentiment: {id2label[pred]}")

# Example
predict_sentiment("The delivery was completed on schedule.")
```

---

## πŸ§ͺ Quantization

- Applied **post-training dynamic quantization** using PyTorch to reduce model size and speed up inference.
- The quantized model supports CPU-based deployments.
- A minimal quantization sketch is provided in the appendix at the end of this card.

---

## πŸ“ Repository Structure

```
.
β”œβ”€β”€ model/               # Quantized model files
β”œβ”€β”€ tokenizer/           # Tokenizer config and vocabulary
β”œβ”€β”€ model.safetensors    # Fine-tuned full-precision model weights
β”œβ”€β”€ README.md            # Model documentation
```

---

## 🀝 Contributing

We welcome contributions! Please feel free to raise an issue or submit a pull request if you find a bug or have a suggestion.
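
---

## πŸ‹οΈβ€β™‚οΈ Appendix: Training Sketch

The original training script is not included in this repository, so the snippet below is only a minimal, hedged sketch of how the fine-tuning described in the Training Details section could be reproduced with the πŸ€— `Trainer` API. The in-memory example dataset, its `text`/`label` column names, and the `./results` output directory are illustrative assumptions, not the actual Contract Sentiment dataset; the hyperparameters (3 epochs, batch size 5, AdamW via the `Trainer` default optimizer) mirror the values reported above.

```python
from datasets import Dataset
from transformers import (
    BertTokenizer,
    BertForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Hypothetical in-memory examples; the real model was trained on a custom
# labeled Contract Sentiment dataset (0=Negative, 1=Neutral, 2=Positive).
train_data = Dataset.from_dict({
    "text": [
        "The supplier failed to deliver within the agreed period.",
        "Payment terms are net thirty days from the invoice date.",
        "The parties reached a mutually beneficial renewal agreement.",
    ],
    "label": [0, 1, 2],
})

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

def tokenize_function(examples):
    # Pad/truncate to the model's maximum sequence length
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_train = train_data.map(tokenize_function, batched=True)

# Hyperparameters mirror the card: 3 epochs, batch size 5; Trainer uses AdamW by default.
training_args = TrainingArguments(
    output_dir="./results",          # assumed output path
    num_train_epochs=3,
    per_device_train_batch_size=5,
    logging_steps=10,
)

trainer = Trainer(model=model, args=training_args, train_dataset=tokenized_train)
trainer.train()
```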
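
---

## πŸ§ͺ Appendix: Quantization Sketch

The snippet below is a minimal sketch of the post-training dynamic quantization step described above, assuming the full-precision model can be loaded from this repository ID and that `model/quantized_model.pt` is an acceptable output path; both of those are assumptions for illustration, not confirmed repository details.

```python
import torch
from transformers import BertForSequenceClassification

# Assumption: the full-precision fine-tuned checkpoint is loadable from this repo ID.
model_name = "AventIQ-AI/Sentiment-Analysis-for-Contract-Sentiment"
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=3)
model.eval()

# Post-training dynamic quantization: Linear layer weights are stored as int8
# and dequantized on the fly, while activations remain in float.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Save the quantized weights for CPU deployment (path is illustrative).
torch.save(quantized_model.state_dict(), "model/quantized_model.pt")
```

Because dynamic quantization only converts `torch.nn.Linear` weights and quantizes activations at inference time, the resulting model is intended for CPU-based deployments, as noted in the Quantization section above.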