Nesta, the UK's innovation agency, has been scraping online job adverts since 2021 and building algorithms to extract and structure information as part of the Open Jobs Observatory project.
Although we are unable to share the raw data openly, we aim to open source our models, algorithms and tools so that anyone can use them for their own research and analysis.
## About
This model was pre-trained from a distilbert-base-uncased checkpoint on 100k sentences from online job adverts scraped as part of the Open Jobs Observatory.
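The exact training setup is not documented here, but continued masked-language-model pre-training from a distilbert-base-uncased checkpoint typically looks like the sketch below. The tiny in-line corpus, sequence length and 15% masking probability are illustrative assumptions, not the settings used for this model; only the 3 epochs match the metrics reported further down.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Illustrative job-advert sentences; the real 100k-sentence corpus is not public
sentences = ["Would you like to join a major pharmaceutical company?"]

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased")

dataset = Dataset.from_dict({"text": sentences}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

# Randomly mask tokens so the model learns the masked-language-modelling objective
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ojobert", num_train_epochs=3),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```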
## Use
To use the model:

```python
from transformers import pipeline

model = pipeline('fill-mask', model='ihk/ojobert', tokenizer='ihk/ojobert')
```
An example use is as follows:

```python
text = "Would you like to join a major [MASK] company?"
results = model(text, top_k=3)
results
>> [{'score': 0.1886572688817978,
  'token': 13859,
  'token_str': 'pharmaceutical',
  'sequence': 'would you like to join a major pharmaceutical company?'},
 {'score': 0.07436735928058624,
  'token': 5427,
  'token_str': 'insurance',
  'sequence': 'would you like to join a major insurance company?'},
 {'score': 0.06400047987699509,
  'token': 2810,
  'token_str': 'construction',
  'sequence': 'would you like to join a major construction company?'}]
```
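If you would rather work with the model and tokenizer directly instead of through a pipeline, for example to inspect the logits at the masked position yourself, a minimal sketch looks like this. Only the `ihk/ojobert` model name comes from this card; the rest is standard transformers usage.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ihk/ojobert")
model = AutoModelForMaskedLM.from_pretrained("ihk/ojobert")

text = "Would you like to join a major [MASK] company?"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take its top 3 predicted tokens
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(3).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```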
## Training results
The fine-tuning metrics are as follows:
- eval_loss: 2.5871026515960693
- eval_runtime: 134.4452
- eval_samples_per_second: 14.281
- eval_steps_per_second: 0.223
- epoch: 3.0
- perplexity: 13.29
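The reported perplexity is just the exponential of the evaluation cross-entropy loss, the usual convention for masked-language-model evaluation, which you can verify directly:

```python
import math

eval_loss = 2.5871026515960693
perplexity = math.exp(eval_loss)
print(round(perplexity, 2))  # 13.29
```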