---
license: apache-2.0
language:
- en
metrics:
- precision
base_model:
- distilbert/distilbert-base-uncased
pipeline_tag: text-classification
tags:
- pytorch
---
# Fake Job Predictor
## Data
1. The training data comes from this Kaggle dataset: https://www.kaggle.com/datasets/shivamb/real-or-fake-fake-jobposting-prediction
2. The original dataset has around 18k samples. To mitigate class imbalance, the majority class (real jobs) was undersampled.
3. The final dataset used for training has 4k samples.
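
The undersampling step described above can be sketched as follows. This is a minimal illustration, not the code used for training: the label column name `fraudulent`, the `ratio` parameter, and the toy rows are assumptions, and the model card does not state the exact class ratio that was kept.

```python
import random


def undersample(rows, label_key="fraudulent", ratio=1.0, seed=0):
    """Keep every minority-class row and a random subset of the majority class.

    `ratio` is the number of majority rows kept per minority row (hypothetical
    parameter; the real ratio used for this model is not documented).
    """
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    keep = min(len(majority), int(len(minority) * ratio))
    balanced = minority + rng.sample(majority, keep)
    rng.shuffle(balanced)
    return balanced


# Toy data: 100 postings, 10 of them labelled as fake.
rows = [{"description": f"job {i}", "fraudulent": int(i % 10 == 0)} for i in range(100)]
balanced = undersample(rows, ratio=3.0)
```

Fixing the seed keeps the sampled subset reproducible between runs.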
## Model
1. The model is a multi-head neural network, with one head per input feature (description, requirements, and benefits of the job).
2. Best metrics achieved on the validation split: Precision: 0.83, Recall: 0.65, F1-score: 0.71
3. Code used for training comes from this GitHub repo: https://github.com/sebassaras02/AdvancedDLCourse/blob/master/02_transformers_nlp/bert.ipynb
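
For reference, precision, recall, and F1 for the positive (fake) class can be computed from binary predictions as in the sketch below. The labels are toy values chosen for illustration, not the model's validation results, and the averaging scheme used for the reported numbers is not stated in this card.

```python
def binary_metrics(y_true, y_pred):
    """Return (precision, recall, f1) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels: 1 = fake posting, 0 = real posting.
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]
precision, recall, f1 = binary_metrics(y_true, y_pred)
```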
### Components:
Text Encoder: distilbert-base-uncased is used to encode the textual input into a dense vector.
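
One way such an architecture could look in PyTorch is sketched below. This is an assumption-laden illustration, not the trained model's actual code: the class name `MultiHeadJobClassifier`, the shared encoder, the head size, and the single-logit output are all hypothetical choices; the linked notebook is the authoritative reference.

```python
import torch
from torch import nn
from transformers import AutoModel

FIELDS = ("description", "requirements", "benefits")


class MultiHeadJobClassifier(nn.Module):
    """Hypothetical sketch: one projection head per text field, one shared encoder."""

    def __init__(self, encoder, fields=FIELDS, head_dim=64):
        super().__init__()
        self.encoder = encoder
        hidden = encoder.config.hidden_size
        # One small head per input feature.
        self.heads = nn.ModuleDict(
            {name: nn.Sequential(nn.Linear(hidden, head_dim), nn.ReLU()) for name in fields}
        )
        # The concatenated head outputs feed a single binary logit (fake vs. real).
        self.classifier = nn.Linear(head_dim * len(fields), 1)

    def forward(self, batch):
        # `batch` maps each field name to a dict with `input_ids` and `attention_mask`.
        features = []
        for name, head in self.heads.items():
            hidden_states = self.encoder(**batch[name]).last_hidden_state
            features.append(head(hidden_states[:, 0]))  # vector at the first token position
        return self.classifier(torch.cat(features, dim=-1)).squeeze(-1)


def load_pretrained_model():
    """Build the sketch on top of the DistilBERT checkpoint named in this card."""
    encoder = AutoModel.from_pretrained("distilbert/distilbert-base-uncased")
    return MultiHeadJobClassifier(encoder)
```

Sharing one encoder across the three fields keeps the parameter count low; using a separate encoder per field would be an equally plausible reading of the description above.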
## Future work:
Train on larger datasets and with more compute resources.