mhoyl's picture
Update README.md
9615027 verified
---
language: en
license: mit
tags:
- text-classification
- text-regression
- readability
- education
- grade-level
- modernbert
library: transformers
widget:
- text: >-
The sun rises in the east and sets in the west. This is a simple fact that
most people learn as children.
example_title: Elementary Text
- text: >-
The quantum mechanical model of atomic structure provides a theoretical
framework for understanding the behavior of electrons in atoms.
example_title: High School Text
base_model: answerdotai/ModernBERT-base
pipeline_tag: text-classification
---
# Text Readability Grade Predictor
This model predicts the reading grade level of text using ModernBERT, trained on a dataset of texts with grade-level annotations. It can be used to estimate the educational reading level of various texts, from elementary school to college level.
## Model Details
- **Model Type:** ModernBERT fine-tuned for regression
- **Language:** English
- **Task:** Text Readability Assessment (Regression)
- **Framework:** PyTorch
- **Base Model:** `answerdotai/ModernBERT-base`
- **Training Data:** [CLEAR dataset](https://github.com/scrosseye/CLEAR-Corpus)
- **Performance:**
- RMSE: 1.4143198236928092
- R²: 0.8125544567620288
- **Output:** Predicted grade level (0-12)
## Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("kiddom/modernbert-readability-grade-predictor")
tokenizer = AutoTokenizer.from_pretrained("kiddom/modernbert-readability-grade-predictor")
# Prepare text
text = "Your text goes here."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
# Run inference
with torch.no_grad():
outputs = model(**inputs)
# Get prediction (ensure it's between 0 and 12)
pred_grade = outputs.logits.item()
pred_grade = max(0, min(pred_grade, 12.0))
print(f"Predicted grade level: {pred_grade:.1f}")
```
## Reading Level Categories
The predicted grade levels correspond to these educational categories:
- **< 1.0:** Pre-Kindergarten
- **1.0 - 2.9:** Early Elementary
- **3.0 - 5.9:** Elementary
- **6.0 - 8.9:** Middle School
- **9.0 - 11.9:** High School
- **12.0+:** College Level
## Example Predictions
### Example: Early Elementary
```
The cat sat on the mat. It was happy. The sun was shining.
```
**Predicted Grade Level:** 1.2
### Example: Middle School
```
The water cycle is a continuous process that includes evaporation, condensation, and precipitation. ...
```
**Predicted Grade Level:** 8.9
### Example: High School
```
The quantum mechanical model of atomic structure provides a theoretical framework for understanding ...
```
**Predicted Grade Level:** 11.6
## Limitations
- The model is trained on English text only
- Performance may vary for specialized or technical content
- Very short texts (fewer than 10 words) may not yield accurate predictions
- The model is calibrated for US educational grade levels
## Training
This model was fine-tuned on a custom dataset created by augmenting texts from various grade levels. The training process involved:
1. Collecting texts with known Lexile measures and Flesch-Kincaid Grade Levels
2. Augmenting the dataset through text chunking
3. Averaging grade level metrics for a more reliable target
4. Fine-tuning ModernBERT with a regression head
5. Optimizing for minimum RMSE and maximum R²