---
title: Large Language Models for Detecting Bias in Job Descriptions
emoji: 🌍
colorFrom: gray
colorTo: pink
sdk: static
pinned: false
license: apache-2.0
short_description: LLMs for Detecting Bias in Job Descriptions
---

Abstract—This study explores the application of large language models (LLMs) for detecting implicit bias in job descriptions, an important concern in human resources that shapes applicant pools and influences employer perception. We compare different LLM architectures (encoder, encoder-decoder, and decoder models), focusing on seven specific bias types. The research questions address the capability of foundation LLMs to detect implicit bias and the effectiveness of domain adaptation via fine-tuning versus prompt tuning. Results indicate that fine-tuned models outperform non-fine-tuned models in detecting biases, with Flan-T5-XL emerging as the top performer, surpassing zero-shot prompting of the GPT-4o model. A labelled dataset comprising gold-, silver-, and bronze-standard data was created for this purpose and open-sourced to advance the field and serve as a valuable resource for future studies.

Introduction—In human resources, bias affects both employers and employees in explicit and implicit forms. Explicit bias is conscious and controllable, but can be illegal in employment contexts. Implicit bias is subtle, unconscious, and harder to address. Implicit bias in job descriptions is a major concern because it shapes the applicant pool and influences applicants' decisions. Biased language in job descriptions can affect how attractive a role appears to different individuals and can shape perceptions of the employer. The challenge is to identify and mitigate these biases efficiently.

The application of large language models (LLMs) for detecting bias in job descriptions is promising but underexplored. This study examines the effectiveness of various LLM architectures (encoder, encoder-decoder, and decoder) with fewer than 10 billion parameters in detecting implicit bias.

We conceptualise the task of identifying implicit bias in job descriptions as a multi-label classification problem, where each job description is assigned a subset of labels from a set of eight categories—age, disability, feminine, masculine, general exclusionary, racial, sexuality, and neutral. This study investigates two primary research questions:

  1. Can foundation LLMs accurately detect implicit bias in job descriptions without task-specific training? We evaluate the performance of three topical decoder-only models under four distinct prompt settings, assessing their ability to extract relevant information from job descriptions and identify implicit bias (a zero-shot prompting sketch follows this list).

  2. Does domain adaptation via fine-tuning foundation LLMs outperform prompt tuning for detecting implicit bias in job descriptions? We fine-tune models with varying architectures as text classifiers on task-specific data and compare their performance to that of prompt-tuned models (a fine-tuning sketch also follows this list).
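To make the first research question concrete, the snippet below sketches how a job description might be scored with zero-shot prompting. It is a minimal illustration only: the checkpoint name, prompt wording, and decoding settings are assumptions for this sketch, not the exact prompt settings or models evaluated in the study.

```python
# Minimal zero-shot prompting sketch (illustrative; not the study's exact prompts).
from transformers import pipeline

LABELS = [
    "age", "disability", "feminine", "masculine",
    "general exclusionary", "racial", "sexuality", "neutral",
]

# Hypothetical instruction-tuned decoder-only model under 10 billion parameters.
generator = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")

def build_prompt(job_description: str) -> str:
    """Ask the model to return every applicable label as a comma-separated list."""
    return (
        "You are reviewing a job description for implicit bias.\n"
        f"Possible labels: {', '.join(LABELS)}.\n"
        "Return every label that applies, as a comma-separated list.\n\n"
        f"Job description:\n{job_description}\n\nLabels:"
    )

result = generator(
    build_prompt("We need a young, energetic salesman to join our dynamic team."),
    max_new_tokens=30,
    do_sample=False,
    return_full_text=False,
)
print(result[0]["generated_text"])  # expected to name labels such as "age, masculine"
```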
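For the second question, the multi-label formulation above maps naturally onto a sequence-classification head with one sigmoid output per category. The sketch below assumes a hypothetical fine-tuned checkpoint name and an illustrative 0.5 decision threshold; it is not the released model or its exact configuration.

```python
# Minimal multi-label classification sketch (hypothetical checkpoint and threshold).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = [
    "age", "disability", "feminine", "masculine",
    "general exclusionary", "racial", "sexuality", "neutral",
]

checkpoint = "your-org/job-description-bias-classifier"  # placeholder name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # independent sigmoid per label
)

def detect_bias(job_description: str, threshold: float = 0.5) -> list[str]:
    """Return the subset of labels whose sigmoid score exceeds the threshold."""
    inputs = tokenizer(job_description, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    scores = torch.sigmoid(logits).squeeze(0)
    return [label for label, score in zip(LABELS, scores) if score >= threshold]

print(detect_bias("We need a young, energetic salesman to join our dynamic team."))
```

Because the head uses independent sigmoids rather than a softmax, a description can receive several bias labels at once (or only "neutral"), which is the behaviour the multi-label framing requires.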

Dataset—

Results—