vladinc's picture
Update README.md
d194105 verified
---
library_name: transformers
tags:
- big-five
- regression
- psychology
- transformer
- text-analysis
license: mit
datasets:
- jingjietan/essays-big5
language:
- en
---
# 🧠 Big Five Personality Regression Model
This model predicts Big Five personality traits — Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism — from English free-text inputs. The output is a set of five continuous values between 0.0 and 1.0, corresponding to each trait.
---
## Model Details
### Model Description
- **Developed by:** [vladinc](https://huggingface.co/vladinc)
- **Model type:** `distilbert-base-uncased`, fine-tuned
- **Language(s):** English
- **License:** MIT
- **Finetuned from model:** `distilbert-base-uncased`
- **Trained on:** ~8,700 essays from the `jingjietan/essays-big5` dataset
### Model Sources
- **Repository:** [https://huggingface.co/vladinc/bigfive-regression-model](https://huggingface.co/vladinc/bigfive-regression-model)
---
## Uses
### Direct Use
This model can be used to estimate personality profiles from user-written text. It may be useful in psychological analysis, conversational profiling, or educational feedback systems.
### Out-of-Scope Use
- Not intended for clinical or diagnostic use.
- Should not be used to make hiring, legal, or psychological decisions.
- Not validated across cultures or demographic groups.
---
## Bias, Risks, and Limitations
- Trained on essay data; generalizability to tweets, messages, or other short-form texts may be limited.
- Traits like Extraversion and Neuroticism had higher validation MSE, suggesting reduced predictive reliability.
- Cultural and linguistic biases in training data may influence predictions.
### Recommendations
Do not use predictions from this model in isolation. Supplement with human judgment and/or other assessment tools.
---
## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("vladinc/bigfive-regression-model")
tokenizer = AutoTokenizer.from_pretrained("vladinc/bigfive-regression-model")
text = "I enjoy reflecting on abstract concepts and trying new things."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits) # 5 float scores between 0.0 and 1.0
Training Details
Training Data
Dataset: jingjietan/essays-big5
Format: Essay text + 5 numeric labels for personality traits
Training Procedure
Epochs: 3
Batch size: 8
Learning rate: 2e-5
Loss Function: Mean Squared Error
Metric for Best Model: MSE on Openness
Evaluation
Metrics
Trait Validation MSE
Openness 0.324
Conscientiousness 0.537
Extraversion 0.680
Agreeableness 0.441
Neuroticism 0.564
Citation
If you use this model, please cite it:
BibTeX:
bibtex
Copy
Edit
@misc{vladinc2025bigfive,
title={Big Five Personality Regression Model},
author={vladinc},
year={2025},
howpublished={\\url{https://huggingface.co/vladinc/bigfive-regression-model}}
}
Contact
If you have questions or suggestions, feel free to reach out via the Hugging Face profile.