sebastiansarasti/fakeJobs

sebastiansarasti committed on Dec 24, 2024

Commit 097b4b3 (verified) · 1 Parent(s): f123fd9

updating readme file

Files changed (1)

README.md +21 -1

@@ -9,4 +9,24 @@ base_model:
 pipeline_tag: text-classification
 tags:
 - pytorch
----
+---
+# Fake Job Predictor
+## Data
+1. The training data comes from this Kaggle dataset: https://www.kaggle.com/datasets/shivamb/real-or-fake-fake-jobposting-prediction
+2. The original dataset has around 18k samples. To mitigate class imbalance, the majority class (real jobs) was undersampled.
+3. The final dataset used for training has 4k samples.
+## Model
+1. Multi-head neural network. One head is used for each feature (description, requirements, and benefits of the job).
+2. Best metrics achieved:
+    - Precision: 0.83
+    - Recall: 0.65
+    - F1-score: 0.71
+### Components:
+Text Encoder: distilbert-base-uncased is used to encode the textual input into a dense vector.
+## Future work:
+Train on larger datasets and with more compute resources.
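
The undersampling step described in the README's Data section can be sketched as follows. This is a minimal illustration, not the author's code: the `undersample` helper, the `fraudulent` label key, the toy rows, and the 1:1 class ratio are all assumptions, since the README does not say which ratio or tooling was used.

```python
import random
from collections import defaultdict


def undersample(rows, label_key="fraudulent", seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest class (assumed 1:1 ratio)."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    # Size of the smallest (minority) class.
    target = min(len(group) for group in by_label.values())

    rng = random.Random(seed)  # fixed seed for reproducibility
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Toy data (hypothetical): 90 real postings and 10 fake ones.
rows = [
    {"description": f"job {i}", "fraudulent": int(i % 10 == 0)}
    for i in range(100)
]
balanced = undersample(rows)
```

Sampling without replacement keeps every minority example and discards only majority examples, which trades dataset size for balance.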
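
One way to read the "multi-head" description in the README's Model section is sketched below in PyTorch. Everything here is an assumption made for illustration: the `MultiHeadClassifier` class, the head sizes, the concatenation of head outputs, and the single output logit are not taken from the repository. To keep the sketch runnable offline, a small `nn.EmbeddingBag` stands in for the text encoder; in the README's setup that role is played by `distilbert-base-uncased`.

```python
import torch
from torch import nn

FEATURES = ("description", "requirements", "benefits")


class MultiHeadClassifier(nn.Module):
    """Hypothetical layout: one encoder plus projection per text
    feature, concatenated and mapped to a single fake/real logit."""

    def __init__(self, make_encoder, hidden_dim, head_dim=64):
        super().__init__()
        # One independent head per feature.
        self.heads = nn.ModuleDict({
            name: nn.Sequential(
                make_encoder(),
                nn.Linear(hidden_dim, head_dim),
                nn.ReLU(),
            )
            for name in FEATURES
        })
        self.classifier = nn.Linear(head_dim * len(FEATURES), 1)

    def forward(self, inputs):
        # inputs: dict of feature name -> token ids, shape (batch, seq_len)
        parts = [self.heads[name](inputs[name]) for name in FEATURES]
        return self.classifier(torch.cat(parts, dim=1)).squeeze(-1)


# Stand-in encoder (assumption): mean of token embeddings per sequence.
model = MultiHeadClassifier(
    lambda: nn.EmbeddingBag(1000, 32, mode="mean"), hidden_dim=32
)
batch = {name: torch.randint(0, 1000, (4, 16)) for name in FEATURES}
logits = model(batch)
probs = torch.sigmoid(logits)  # probability that each posting is fake
```

Keeping a separate head per feature lets each text field learn its own representation before the final classifier combines them.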