---
language: en
license: mit
tags:
- text-classification
- text-regression
- readability
- education
- grade-level
- modernbert
library_name: transformers
widget:
- text: >-
    The sun rises in the east and sets in the west. This is a simple fact that
    most people learn as children.
  example_title: Elementary Text
- text: >-
    The quantum mechanical model of atomic structure provides a theoretical
    framework for understanding the behavior of electrons in atoms.
  example_title: High School Text
base_model: answerdotai/ModernBERT-base
pipeline_tag: text-classification
---

# Text Readability Grade Predictor

This model predicts the reading grade level of English text. It is a ModernBERT model fine-tuned for regression on grade-level-annotated texts, and can be used to estimate the educational reading level of a passage, from elementary school to college.

## Model Details

- **Model Type:** ModernBERT fine-tuned for regression
- **Language:** English
- **Task:** Text Readability Assessment (Regression)
- **Framework:** PyTorch
- **Base Model:** `answerdotai/ModernBERT-base`
- **Training Data:** [CLEAR dataset](https://github.com/scrosseye/CLEAR-Corpus)
- **Performance** (see the metric sketch after this list):
  - RMSE: 1.414
  - R²: 0.813
- **Output:** Predicted grade level (0-12)
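
The RMSE and R² above are standard regression metrics. The snippet below is a minimal sketch of how they are typically computed with scikit-learn, not the repository's evaluation script; the numbers passed in are toy values, not real evaluation data:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def report_metrics(y_true, y_pred) -> None:
    """Print RMSE and R² for a held-out split (sketch, not the official eval)."""
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    r2 = float(r2_score(y_true, y_pred))
    print(f"RMSE: {rmse:.3f}  R²: {r2:.3f}")

# Toy example, not real evaluation data:
report_metrics([1.0, 5.0, 9.0], [1.4, 4.6, 9.5])
```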

## Usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("kiddom/modernbert-readability-grade-predictor")
tokenizer = AutoTokenizer.from_pretrained("kiddom/modernbert-readability-grade-predictor")

# Prepare text
text = "Your text goes here."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Get prediction (ensure it's between 0 and 12)
pred_grade = outputs.logits.item()
pred_grade = max(0, min(pred_grade, 12.0))
print(f"Predicted grade level: {pred_grade:.1f}")
```
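
For scoring many texts at once, the same model can be run in batches; padding lets the whole batch share one input tensor. This is a sketch along the lines of the snippet above, not an additional API shipped with the model:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("kiddom/modernbert-readability-grade-predictor")
tokenizer = AutoTokenizer.from_pretrained("kiddom/modernbert-readability-grade-predictor")
model.eval()

texts = [
    "The cat sat on the mat. It was happy.",
    "The quantum mechanical model of atomic structure provides a theoretical framework.",
]

# Pad to the longest text in the batch so all sequences fit in one tensor
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits.squeeze(-1)  # one scalar grade per text

for text, grade in zip(texts, logits.clamp(0.0, 12.0).tolist()):
    print(f"{grade:4.1f}  {text[:60]}")
```

Note that the generic `text-classification` pipeline applies a sigmoid when a model has a single output, so calling the model directly, as above, is the safer way to obtain raw grade values.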

## Reading Level Categories

The predicted grade levels correspond to these educational categories (a small mapping helper follows the list):

- **< 1.0:** Pre-Kindergarten
- **1.0 - 2.9:** Early Elementary
- **3.0 - 5.9:** Elementary
- **6.0 - 8.9:** Middle School
- **9.0 - 11.9:** High School
- **12.0+:** College Level
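
If you need the category label rather than the raw number, a straightforward threshold function (illustrative only, not shipped with the model) implements the table above:

```python
def grade_to_category(grade: float) -> str:
    """Map a predicted grade level to the categories listed above."""
    if grade < 1.0:
        return "Pre-Kindergarten"
    if grade < 3.0:
        return "Early Elementary"
    if grade < 6.0:
        return "Elementary"
    if grade < 9.0:
        return "Middle School"
    if grade < 12.0:
        return "High School"
    return "College Level"

print(grade_to_category(8.9))  # Middle School
```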

## Example Predictions

### Example: Early Elementary
```
The cat sat on the mat. It was happy. The sun was shining.
```
**Predicted Grade Level:** 1.2

### Example: Middle School
```
The water cycle is a continuous process that includes evaporation, condensation, and precipitation. ...
```
**Predicted Grade Level:** 8.9

### Example: High School
```
The quantum mechanical model of atomic structure provides a theoretical framework for understanding ...
```
**Predicted Grade Level:** 11.6

## Limitations

- The model is trained on English text only
- Performance may vary for specialized or technical content
- Very short texts (fewer than 10 words) may not yield accurate predictions; a simple length guard is sketched below
- The model is calibrated for US educational grade levels
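
Given the short-text caveat above, a thin wrapper can refuse to score inputs below the 10-word threshold. Both the wrapper and the `predict_grade` callable it expects are hypothetical, shown only to illustrate the guard:

```python
def predict_with_guard(text: str, predict_grade) -> float | None:
    """Return a predicted grade, or None if the text is too short to score."""
    # The 10-word threshold comes from this card, not from the model itself.
    if len(text.split()) < 10:
        return None
    return predict_grade(text)
```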

## Training

This model was fine-tuned on a custom dataset created by augmenting texts from various grade levels. The training process involved:

1. Collecting texts with known Lexile measures and Flesch-Kincaid Grade Levels
2. Augmenting the dataset through text chunking
3. Averaging grade level metrics for a more reliable target
4. Fine-tuning ModernBERT with a regression head (sketched below)
5. Optimizing for minimum RMSE and maximum R²
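
As a rough illustration of step 4: a regression head can be attached by loading the base model with `num_labels=1` and `problem_type="regression"`, which trains a single-output head with MSE loss. The sketch below is an assumption-laden reconstruction, not the actual training script; the dataset variables and hyperparameters are placeholders:

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# num_labels=1 + problem_type="regression" gives a single-output head
# trained with MSE loss on the raw logit.
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=1,
    problem_type="regression",
)
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

# `train_ds` and `eval_ds` are placeholders for datasets with a "text"
# column and a float "labels" column holding the averaged grade target.
args = TrainingArguments(
    output_dir="modernbert-readability",
    learning_rate=2e-5,                 # placeholder hyperparameters
    num_train_epochs=3,
    per_device_train_batch_size=16,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds.map(tokenize, batched=True),
    eval_dataset=eval_ds.map(tokenize, batched=True),
    tokenizer=tokenizer,
)
trainer.train()
```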