---
library_name: transformers
license: mit
datasets:
- ahmedheakl/resume-atlas
language:
- en
metrics:
- accuracy
- f1
- recall
- precision
pipeline_tag: text-classification
---

# How to use

In this example, we run inference on a sample from our dataset (_ResumeAtlas_). Longer `max_length` values (up to the model's 512-token limit) generally give more accurate predictions at the cost of speed.

```python
!pip install datasets

import numpy as np
import torch
from transformers import BertForSequenceClassification, BertTokenizer
from datasets import load_dataset
from sklearn import preprocessing

dataset_id = 'ahmedheakl/resume-atlas'
model_id = 'ahmedheakl/bert-resume-classification'
label_column = 'Category'
num_labels = 43
do_lower_case = True
add_special_tokens = True
max_length = 512
return_attention_mask = True
truncation = True

# Load the dataset and fit a label encoder on the category names
ds = load_dataset(dataset_id, trust_remote_code=True)
le = preprocessing.LabelEncoder()
le.fit(ds['train'][label_column])

tokenizer = BertTokenizer.from_pretrained(model_id, do_lower_case=do_lower_case)
model = BertForSequenceClassification.from_pretrained(
    model_id,
    num_labels=num_labels,
    output_attentions=False,
    output_hidden_states=False,
)
model = model.to('cuda').eval()

# Tokenize a single resume; `padding='max_length'` replaces the
# deprecated `pad_to_max_length=True` argument
sent = ds['train'][0]['Text']
encoded_dict = tokenizer.encode_plus(
    sent,
    add_special_tokens=add_special_tokens,
    max_length=max_length,
    padding='max_length',
    return_attention_mask=return_attention_mask,
    return_tensors='pt',
    truncation=truncation,
)

input_ids = encoded_dict['input_ids'].to('cuda')
attention_mask = encoded_dict['attention_mask'].to('cuda')

with torch.no_grad():
    outputs = model(input_ids, token_type_ids=None, attention_mask=attention_mask)

# Map the highest-scoring logit back to its category name
label_id = outputs.logits.argmax(dim=1).cpu().numpy()
print(f'Predicted: {le.inverse_transform(label_id)[0]} | Ground truth: {ds["train"][0][label_column]}')
```

# Model Card

**Please see the paper & code for more information:**

- https://github.com/noran-mohamed/Resume-Classification-Dataset
- https://arxiv.org/abs/2406.18125

## Citation

**BibTeX:**

```
@article{heakl2024resumeatlas,
  title={ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models},
  author={Heakl, Ahmed and Mohamed, Youssef and Mohamed, Noran and Sharkaway, Ali and Zaky, Ahmed},
  journal={arXiv preprint arXiv:2406.18125},
  year={2024}
}
```

**APA:**

```
Heakl, A., Mohamed, Y., Mohamed, N., Sharkaway, A., & Zaky, A. (2024). ResumeAtlas: Revisiting resume classification with large-scale datasets and large language models. arXiv. https://doi.org/10.48550/arxiv.2406.18125
```

## Model Card Authors

Email: ahmed.heakl@ejust.edu.eg

LinkedIn: https://linkedin.com/in/ahmed-heakl
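
# Alternative: inference with the `pipeline` API

For quick experimentation, the same steps as the example above can be collapsed into the high-level `pipeline` API. This is a minimal sketch, not the reference usage: it assumes the model's config exposes generic `LABEL_<k>` ids rather than category names, so the predicted index is mapped back through the same `LabelEncoder` as above.

```python
from datasets import load_dataset
from sklearn import preprocessing
from transformers import pipeline

dataset_id = 'ahmedheakl/resume-atlas'
model_id = 'ahmedheakl/bert-resume-classification'

# Fit the label encoder on the dataset categories, as in the main example
ds = load_dataset(dataset_id, trust_remote_code=True)
le = preprocessing.LabelEncoder()
le.fit(ds['train']['Category'])

# device=0 selects the first GPU; drop it to run on CPU
clf = pipeline('text-classification', model=model_id, device=0)

# Truncation kwargs are forwarded to the tokenizer at call time
pred = clf(ds['train'][0]['Text'], truncation=True, max_length=512)[0]

# If the config only has generic ids (e.g. 'LABEL_12'), recover the category name
label = pred['label']
if label.startswith('LABEL_'):
    label = le.inverse_transform([int(label.split('_')[-1])])[0]
print(f"Predicted: {label} (score={pred['score']:.3f})")
```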