|
---
language: en
license: mit
tags:
- text-classification
- text-regression
- readability
- education
- grade-level
- modernbert
library_name: transformers
widget:
- text: >-
    The sun rises in the east and sets in the west. This is a simple fact that
    most people learn as children.
  example_title: Elementary Text
- text: >-
    The quantum mechanical model of atomic structure provides a theoretical
    framework for understanding the behavior of electrons in atoms.
  example_title: High School Text
base_model: answerdotai/ModernBERT-base
pipeline_tag: text-classification
---
|
|
|
# Text Readability Grade Predictor |
|
|
|
This model predicts the reading grade level of English text. It is a ModernBERT model fine-tuned for regression on grade-level-annotated passages and can estimate the educational reading level of a text, from elementary school through college.
|
|
|
## Model Details |
|
|
|
- **Model Type:** ModernBERT fine-tuned for regression |
|
- **Language:** English |
|
- **Task:** Text Readability Assessment (Regression) |
|
- **Framework:** PyTorch |
|
- **Base Model:** `answerdotai/ModernBERT-base` |
|
- **Training Data:** [CLEAR dataset](https://github.com/scrosseye/CLEAR-Corpus) |
|
- **Performance:** |
|
  - RMSE: 1.414

  - R²: 0.813
|
- **Output:** Predicted grade level (0-12) |
|
|
|
## Usage |
|
|
|
```python |
|
from transformers import AutoModelForSequenceClassification, AutoTokenizer |
|
import torch |
|
|
|
# Load model and tokenizer |
|
model = AutoModelForSequenceClassification.from_pretrained("kiddom/modernbert-readability-grade-predictor") |
|
tokenizer = AutoTokenizer.from_pretrained("kiddom/modernbert-readability-grade-predictor") |
|
|
|
# Prepare text |
|
text = "Your text goes here." |
|
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512) |
|
|
|
# Run inference |
|
with torch.no_grad(): |
|
outputs = model(**inputs) |
|
|
|
# Get prediction (ensure it's between 0 and 12) |
|
pred_grade = outputs.logits.item() |
|
pred_grade = max(0.0, min(pred_grade, 12.0))
|
print(f"Predicted grade level: {pred_grade:.1f}") |
|
``` |
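
To score several passages at once, the same model can be run in batches. The sketch below pads the inputs to a common length and clamps each prediction as in the single-text example above; the sample texts are illustrative.

```python
# Batch inference with the model and tokenizer loaded above
texts = [
    "The cat sat on the mat.",
    "Photosynthesis converts light energy into chemical energy.",
]

# Pad to the longest sequence in the batch so the tensors align
batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**batch).logits  # shape: (batch_size, 1)

for text, grade in zip(texts, logits.squeeze(-1).tolist()):
    grade = max(0.0, min(grade, 12.0))  # clamp to the 0-12 range
    print(f"{grade:4.1f}  {text}")
```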
|
|
|
## Reading Level Categories |
|
|
|
The predicted grade levels correspond to these educational categories (a small mapping helper is sketched after the list):
|
|
|
- **< 1.0:** Pre-Kindergarten |
|
- **1.0 - 2.9:** Early Elementary |
|
- **3.0 - 5.9:** Elementary |
|
- **6.0 - 8.9:** Middle School |
|
- **9.0 - 11.9:** High School |
|
- **12.0+:** College Level |
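
The thresholds in this helper simply mirror the list above; the function name is illustrative.

```python
def grade_to_category(grade: float) -> str:
    """Map a predicted grade level (0-12) onto the categories above."""
    if grade < 1.0:
        return "Pre-Kindergarten"
    if grade < 3.0:
        return "Early Elementary"
    if grade < 6.0:
        return "Elementary"
    if grade < 9.0:
        return "Middle School"
    if grade < 12.0:
        return "High School"
    return "College Level"


print(grade_to_category(8.9))  # Middle School
```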
|
|
|
## Example Predictions |
|
|
|
### Example: Early Elementary |
|
``` |
|
The cat sat on the mat. It was happy. The sun was shining. |
|
``` |
|
**Predicted Grade Level:** 1.2 |
|
|
|
### Example: Middle School |
|
``` |
|
The water cycle is a continuous process that includes evaporation, condensation, and precipitation. ... |
|
``` |
|
**Predicted Grade Level:** 8.9 |
|
|
|
### Example: High School |
|
``` |
|
The quantum mechanical model of atomic structure provides a theoretical framework for understanding ... |
|
``` |
|
**Predicted Grade Level:** 11.6 |
|
|
|
## Limitations |
|
|
|
- The model is trained on English text only |
|
- Performance may vary for specialized or technical content |
|
- Very short texts (fewer than 10 words) may not yield accurate predictions; a guard for this case is sketched after this list
|
- The model is calibrated for US educational grade levels |
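
Given the short-text limitation, one simple safeguard is to refuse to score passages below a word-count threshold. The sketch below assumes the model and tokenizer from the Usage section are already loaded; the function name and the default threshold are illustrative.

```python
def predict_grade(text: str, min_words: int = 10) -> float | None:
    """Return a clamped grade prediction, or None if the text is too short to score reliably."""
    if len(text.split()) < min_words:
        return None  # below the reliability threshold noted above
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        grade = model(**inputs).logits.item()
    return max(0.0, min(grade, 12.0))
```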
|
|
|
## Training |
|
|
|
This model was fine-tuned on a custom dataset created by augmenting texts from various grade levels. The training process involved: |
|
|
|
1. Collecting texts with known Lexile measures and Flesch-Kincaid Grade Levels |
|
2. Augmenting the dataset through text chunking |
|
3. Averaging grade level metrics for a more reliable target |
|
4. Fine-tuning ModernBERT with a single-output regression head (sketched below)
|
5. Optimizing for minimum RMSE and maximum R² |
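
The exact training script is not published here, but steps 4 and 5 correspond to the standard single-output regression setup in `transformers`. The sketch below shows the head configuration and the reported metrics; it is an illustration of the setup, not the original training code.

```python
import numpy as np
from transformers import AutoModelForSequenceClassification

# Step 4: a single-output regression head on top of ModernBERT.
# With num_labels=1 and problem_type="regression", the forward pass
# computes an MSE loss against float grade-level labels.
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base", num_labels=1, problem_type="regression"
)

# Step 5: the metrics reported under Model Details.
def rmse_and_r2(preds: np.ndarray, labels: np.ndarray) -> dict:
    rmse = float(np.sqrt(np.mean((preds - labels) ** 2)))
    ss_res = float(np.sum((labels - preds) ** 2))
    ss_tot = float(np.sum((labels - labels.mean()) ** 2))
    return {"rmse": rmse, "r2": 1.0 - ss_res / ss_tot}
```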