---
license: apache-2.0
language:
- en
metrics:
- precision
base_model:
- distilbert/distilbert-base-uncased
pipeline_tag: text-classification
tags:
- pytorch
---
# Fake Job Predictor
## Data
1. The training data comes from this Kaggle dataset: https://www.kaggle.com/datasets/shivamb/real-or-fake-fake-jobposting-prediction
2. The original dataset has around 18k samples. To mitigate the class imbalance problem, the majority class (real jobs) was undersampled (see the sketch after this list).
3. The final training dataset has around 4k samples.
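As a rough illustration of the balancing step in item 2, here is a minimal undersampling sketch with pandas. The CSV file name and the `fraudulent` label column (0 = real, 1 = fake) come from the Kaggle dataset's published schema, not from this card, and the exact undersampling ratio used for training is not stated here, so the sketch simply undersamples the majority class down to roughly the ~4k total mentioned above.

```python
import pandas as pd

TARGET_TOTAL = 4_000  # approximate final dataset size stated above

# File name and column names assumed from the Kaggle dataset page.
df = pd.read_csv("fake_job_postings.csv")

fake = df[df["fraudulent"] == 1]  # minority class, kept whole
real = df[df["fraudulent"] == 0]  # majority class, undersampled

n_real = max(TARGET_TOTAL - len(fake), 0)
real_sampled = real.sample(n=n_real, random_state=42)

# Combine and shuffle so the two classes are interleaved.
balanced = (
    pd.concat([fake, real_sampled])
    .sample(frac=1, random_state=42)
    .reset_index(drop=True)
)
print(balanced["fraudulent"].value_counts())
```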
## Model
1. Multi-head neural network with one head per text feature of the job posting (description, requirements, and benefits); a sketch follows this list.
2. Best metrics achieved on the validation split: Precision: 0.83, Recall: 0.65, F1-score: 0.71
3. The training code comes from this GitHub repo: https://github.com/sebassaras02/AdvancedDLCourse/blob/master/02_transformers_nlp/bert.ipynb
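Below is a minimal PyTorch sketch of the multi-head layout described in item 1. Only the three text inputs and the DistilBERT backbone come from this card; the class name, the head dimension, the [CLS] pooling, and the choice of a single shared encoder (rather than three separate DistilBERT copies) are illustrative assumptions.

```python
import torch
import torch.nn as nn
from transformers import AutoModel


class FakeJobClassifier(nn.Module):
    """Sketch: one shared DistilBERT encoder with one projection head
    per text field (description, requirements, benefits)."""

    def __init__(self, backbone="distilbert/distilbert-base-uncased", head_dim=128):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)
        hidden = self.encoder.config.hidden_size  # 768 for distilbert-base
        # One head per input feature (names are illustrative).
        self.desc_head = nn.Linear(hidden, head_dim)
        self.req_head = nn.Linear(hidden, head_dim)
        self.ben_head = nn.Linear(hidden, head_dim)
        self.classifier = nn.Linear(3 * head_dim, 2)  # real vs. fake

    def encode(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        return out.last_hidden_state[:, 0]  # [CLS] token embedding

    def forward(self, desc, req, ben):
        # Each argument is a dict with "input_ids" and "attention_mask".
        d = torch.relu(self.desc_head(self.encode(**desc)))
        r = torch.relu(self.req_head(self.encode(**req)))
        b = torch.relu(self.ben_head(self.encode(**ben)))
        return self.classifier(torch.cat([d, r, b], dim=-1))
```

Concatenating the per-head outputs before a single classifier lets each text field contribute its own learned representation while the final decision still sees all three fields at once.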
### Components:
Text Encoder: distilbert-base-uncased encodes each textual input into a dense vector.
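For concreteness, this is how the encoder turns one text field into a dense vector with the Hugging Face `transformers` library. Using the [CLS] position as the pooled vector is an assumption, since the card does not say how the hidden states are pooled.

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "distilbert/distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)

text = "Work from home, no experience needed, earn $5000 a week!"
batch = tokenizer(text, truncation=True, padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state  # (1, seq_len, 768)

vector = hidden[:, 0]  # [CLS] embedding used as the dense text vector
print(vector.shape)    # torch.Size([1, 768])
```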
## Future work:
Train on larger datasets and with more compute resources.