|
---
language: en
license: mit
tags:
- text-classification
- text-regression
- readability
- education
- grade-level
- modernbert
library_name: transformers
widget:
- text: >-
    The sun rises in the east and sets in the west. This is a simple fact that
    most people learn as children.
  example_title: Elementary Text
- text: >-
    The quantum mechanical model of atomic structure provides a theoretical
    framework for understanding the behavior of electrons in atoms.
  example_title: High School Text
base_model: answerdotai/ModernBERT-base
pipeline_tag: text-classification
---
|
|
|
# Text Readability Grade Predictor |
|
|
|
This model predicts the reading grade level of English text. It is a ModernBERT model fine-tuned for regression on grade-level-annotated passages and can estimate the educational reading level of a text, from elementary school through college.
|
|
|
## Model Details |
|
|
|
- **Model Type:** ModernBERT fine-tuned for regression |
|
- **Language:** English |
|
- **Task:** Text Readability Assessment (Regression) |
|
- **Framework:** PyTorch |
|
- **Base Model:** `answerdotai/ModernBERT-base` |
|
- **Training Data:** [CLEAR dataset](https://github.com/scrosseye/CLEAR-Corpus) |
|
- **Performance:** |
|
  - RMSE: 1.414

  - R²: 0.813
|
- **Output:** Predicted grade level (0-12) |
|
|
|
## Usage |
|
|
|
```python |
|
from transformers import AutoModelForSequenceClassification, AutoTokenizer |
|
import torch |
|
|
|
# Load model and tokenizer |
|
model = AutoModelForSequenceClassification.from_pretrained("kiddom/modernbert-readability-grade-predictor") |
|
tokenizer = AutoTokenizer.from_pretrained("kiddom/modernbert-readability-grade-predictor") |
|
|
|
# Prepare text |
|
text = "Your text goes here." |
|
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512) |
|
|
|
# Run inference |
|
with torch.no_grad(): |
|
outputs = model(**inputs) |
|
|
|
# Get prediction (ensure it's between 0 and 12) |
|
pred_grade = outputs.logits.item() |
|
pred_grade = max(0.0, min(pred_grade, 12.0))
|
print(f"Predicted grade level: {pred_grade:.1f}") |
|
``` |
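
To score several passages at once, the same model can be run in batches. The sketch below pads the inputs to a common length and clamps each prediction as in the single-text example above; the sample texts are illustrative.

```python
# Batch inference with the model and tokenizer loaded above
texts = [
    "The cat sat on the mat.",
    "Photosynthesis converts light energy into chemical energy.",
]

# Pad to the longest sequence in the batch so the tensors align
batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**batch).logits  # shape: (batch_size, 1)

for text, grade in zip(texts, logits.squeeze(-1).tolist()):
    grade = max(0.0, min(grade, 12.0))  # clamp to the 0-12 range
    print(f"{grade:4.1f}  {text}")
```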
|
|
|
## Reading Level Categories |
|
|
|
The predicted grade levels correspond to these educational categories (a small mapping helper is sketched after the list):
|
|
|
- **< 1.0:** Pre-Kindergarten |
|
- **1.0 - 2.9:** Early Elementary |
|
- **3.0 - 5.9:** Elementary |
|
- **6.0 - 8.9:** Middle School |
|
- **9.0 - 11.9:** High School |
|
- **12.0+:** College Level |
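
The thresholds in this helper simply mirror the list above; the function name is illustrative.

```python
def grade_to_category(grade: float) -> str:
    """Map a predicted grade level (0-12) onto the categories above."""
    if grade < 1.0:
        return "Pre-Kindergarten"
    if grade < 3.0:
        return "Early Elementary"
    if grade < 6.0:
        return "Elementary"
    if grade < 9.0:
        return "Middle School"
    if grade < 12.0:
        return "High School"
    return "College Level"


print(grade_to_category(8.9))  # Middle School
```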
|
|
|
## Example Predictions |
|
|
|
### Example: Early Elementary |
|
``` |
|
The cat sat on the mat. It was happy. The sun was shining. |
|
``` |
|
**Predicted Grade Level:** 1.2 |
|
|
|
### Example: Middle School |
|
``` |
|
The water cycle is a continuous process that includes evaporation, condensation, and precipitation. ... |
|
``` |
|
**Predicted Grade Level:** 8.9 |
|
|
|
### Example: High School |
|
``` |
|
The quantum mechanical model of atomic structure provides a theoretical framework for understanding ... |
|
``` |
|
**Predicted Grade Level:** 11.6 |
|
|
|
## Limitations |
|
|
|
- The model is trained on English text only |
|
- Performance may vary for specialized or technical content |
|
- Very short texts (fewer than 10 words) may not yield accurate predictions; a guard for this case is sketched after this list
|
- The model is calibrated for US educational grade levels |
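
Given the short-text limitation, one simple safeguard is to refuse to score passages below a word-count threshold. The sketch below assumes the model and tokenizer from the Usage section are already loaded; the function name and the default threshold are illustrative.

```python
def predict_grade(text: str, min_words: int = 10) -> float | None:
    """Return a clamped grade prediction, or None if the text is too short to score reliably."""
    if len(text.split()) < min_words:
        return None  # below the reliability threshold noted above
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        grade = model(**inputs).logits.item()
    return max(0.0, min(grade, 12.0))
```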
|
|
|
## Training |
|
|
|
This model was fine-tuned on a custom dataset created by augmenting texts from various grade levels. The training process involved: |
|
|
|
1. Collecting texts with known Lexile measures and Flesch-Kincaid Grade Levels |
|
2. Augmenting the dataset through text chunking |
|
3. Averaging grade level metrics for a more reliable target |
|
4. Fine-tuning ModernBERT with a single-output regression head (sketched below)
|
5. Optimizing for minimum RMSE and maximum R² |
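
The exact training script is not published here, but steps 4 and 5 correspond to the standard single-output regression setup in `transformers`. The sketch below shows the head configuration and the reported metrics; it is an illustration of the setup, not the original training code.

```python
import numpy as np
from transformers import AutoModelForSequenceClassification

# Step 4: a single-output regression head on top of ModernBERT.
# With num_labels=1 and problem_type="regression", the forward pass
# computes an MSE loss against float grade-level labels.
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base", num_labels=1, problem_type="regression"
)

# Step 5: the metrics reported under Model Details.
def rmse_and_r2(preds: np.ndarray, labels: np.ndarray) -> dict:
    rmse = float(np.sqrt(np.mean((preds - labels) ** 2)))
    ss_res = float(np.sum((labels - preds) ** 2))
    ss_tot = float(np.sum((labels - labels.mean()) ** 2))
    return {"rmse": rmse, "r2": 1.0 - ss_res / ss_tot}
```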