# 🧠 BERT-Spam-Job-Posting-Detection-Model

A BERT-based binary classifier fine-tuned to detect whether a job posting is **fake** or **real**. It is intended for job portals, recruitment platforms, and fraud detection in job advertisements.

---

## ✨ Model Highlights

- 📌 Based on [`bert-base-uncased`](https://huggingface.co/bert-base-uncased)
- 🔍 Fine-tuned on a custom dataset of job postings labeled as fake or real
- ⚡ Binary classification: Fake Job Posting vs. Real Job Posting
- 💾 Supports both CPU and GPU inference (the usage example below picks CUDA automatically when available)

---

## 🧠 Intended Uses

- Automated detection of fraudulent job postings
- Job board moderation and quality control
- Enhancing recruitment platform security
- Improving user trust in job marketplaces
- Regulatory compliance monitoring for job ads

---

## 🚫 Limitations

- Trained primarily on English-language job postings
- May underperform on postings from under-represented industries or regions
- Inputs are truncated to 128 tokens, so any text beyond that limit in longer job descriptions is ignored
- Not suitable for multilingual or multimedia job posting content

---

## 🏋️‍♂️ Training Details

| Field          | Value                       |
| -------------- | --------------------------- |
| **Base Model** | `bert-base-uncased`         |
| **Dataset**    | Custom labeled job postings |
| **Framework**  | PyTorch with Transformers   |
| **Epochs**     | 3                           |
| **Batch Size** | 16                          |
| **Max Length** | 128 tokens                  |
| **Optimizer**  | AdamW                       |
| **Loss**       | CrossEntropyLoss            |
| **Device**     | CUDA-enabled GPU            |

---

## 📊 Evaluation Metrics

| Metric    | Score |
| --------- | ----- |
| Accuracy  | 0.97  |
| Precision | 0.81  |

---

## 🚀 Usage

```python
from transformers import BertTokenizerFast, BertForSequenceClassification
import torch

model_name = "AventIQ-AI/BERT-Spam-Job-Posting-Detection-Model"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout for inference

# Use a GPU if one is available, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)


def predict_with_bert(text):
    # Tokenize one posting, truncating to the 128-token length used in training
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)

    # Move the input tensors to whatever device the model lives on (CPU or CUDA)
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    with torch.no_grad():
        logits = model(**inputs).logits

    # Pick the highest-scoring class per input: 1 = Fake Job, 0 = Real Job
    predicted_class_id = logits.argmax(dim=-1).item()
    return "Fake Job" if predicted_class_id == 1 else "Real Job"


# Example
print(predict_with_bert("Hiring remote data entry clerk for a large online project. Apply now."))
print(predict_with_bert("Looking for a Software Engineer with 5+ years of experience in Python."))
```

## 🗂 Repository Structure

```
.
├── model/              # Quantized model files
├── tokenizer_config/   # Tokenizer and vocab files
├── model.safetensors   # Fine-tuned model in safetensors format
└── README.md           # Model card
```

---

## 🤝 Contributing

Contributions, issues, and feature requests are welcome! Feel free to open a pull request or raise an issue.
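
---

## 🎚️ Optional: Thresholded Predictions

The `predict_with_bert` helper in the Usage section returns only the argmax label. For a moderation queue it can be useful to expose the probability that a posting is fake and flag it only above a chosen cutoff. The sketch below is plain Python and assumes the label order used in that helper (index 0 = Real Job, index 1 = Fake Job). `softmax`, `classify_with_threshold`, and the `fake_threshold` parameter are hypothetical names for illustration, not part of the model or the Transformers API.

```python
import math
from typing import List, Tuple

# Assumed label order, matching predict_with_bert: 0 = Real Job, 1 = Fake Job
LABELS = ("Real Job", "Fake Job")


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_threshold(logits: List[float], fake_threshold: float = 0.5) -> Tuple[str, float]:
    """Return (label, fake_probability).

    A posting is flagged as fake only if P(fake) is strictly greater than
    fake_threshold (a hypothetical tuning knob). With the default of 0.5 this
    agrees with argmax, including ties, where argmax picks index 0 (Real Job).
    """
    p_fake = softmax(logits)[1]
    label = LABELS[1] if p_fake > fake_threshold else LABELS[0]
    return label, p_fake
```

To use it with the model, take one row of logits as a Python list, for example `row = model(**inputs).logits[0].tolist()`, then call `classify_with_threshold(row, fake_threshold=0.8)`. Raising the threshold flags fewer postings, which generally favors precision at the cost of missing some fake ones; the right value depends on your own validation data.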