---
language: en
license: mit
tags:
- text-classification
- text-regression
- readability
- education
- grade-level
- modernbert
library_name: transformers
widget:
- text: >-
    The sun rises in the east and sets in the west. This is a simple fact
    that most people learn as children.
  example_title: Elementary Text
- text: >-
    The quantum mechanical model of atomic structure provides a theoretical
    framework for understanding the behavior of electrons in atoms.
  example_title: High School Text
base_model: answerdotai/ModernBERT-base
pipeline_tag: text-classification
---

# Text Readability Grade Predictor

This model predicts the reading grade level of English text using ModernBERT, fine-tuned on a dataset of texts with grade-level annotations. It can be used to estimate the educational reading level of a passage, from elementary school through college.

## Model Details

- **Model Type:** ModernBERT fine-tuned for regression
- **Language:** English
- **Task:** Text Readability Assessment (Regression)
- **Framework:** PyTorch
- **Base Model:** `answerdotai/ModernBERT-base`
- **Training Data:** [CLEAR dataset](https://github.com/scrosseye/CLEAR-Corpus)
- **Performance:**
  - RMSE: 1.414
  - R²: 0.813
- **Output:** Predicted grade level (0-12)

## Usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("kiddom/modernbert-readability-grade-predictor")
tokenizer = AutoTokenizer.from_pretrained("kiddom/modernbert-readability-grade-predictor")

# Prepare text
text = "Your text goes here."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# The single regression logit is the predicted grade; clamp it to the 0-12 range
pred_grade = outputs.logits.item()
pred_grade = max(0.0, min(pred_grade, 12.0))

print(f"Predicted grade level: {pred_grade:.1f}")
```

## Reading Level Categories

The predicted grade levels correspond to these educational categories:

- **< 1.0:** Pre-Kindergarten
- **1.0 - 2.9:** Early Elementary
- **3.0 - 5.9:** Elementary
- **6.0 - 8.9:** Middle School
- **9.0 - 11.9:** High School
- **12.0+:** College Level

A small helper that applies this mapping is sketched at the end of this card.

## Example Predictions

### Example: Early Elementary

```
The cat sat on the mat. It was happy. The sun was shining.
```

**Predicted Grade Level:** 1.2

### Example: Middle School

```
The water cycle is a continuous process that includes evaporation, condensation, and precipitation. ...
```

**Predicted Grade Level:** 8.9

### Example: High School

```
The quantum mechanical model of atomic structure provides a theoretical framework for understanding ...
```

**Predicted Grade Level:** 11.6

## Limitations

- The model is trained on English text only
- Performance may vary on specialized or technical content
- Very short texts (fewer than 10 words) may not yield accurate predictions
- The model is calibrated to US educational grade levels

## Training

This model was fine-tuned on a custom dataset created by augmenting texts from various grade levels. The training process involved:

1. Collecting texts with known Lexile measures and Flesch-Kincaid Grade Levels
2. Augmenting the dataset through text chunking
3. Averaging the grade-level metrics to obtain a more reliable regression target
4. Fine-tuning ModernBERT with a regression head (sketched below)
5. Optimizing for minimum RMSE and maximum R²
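The following is a minimal sketch of step 4, not the exact training script used for this model: it shows how a ModernBERT regression head can be set up with the `transformers` Trainer. The toy dataset, column names (`text`, `grade`), and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of the regression fine-tuning setup (step 4 above).
# The toy data, column names, and hyperparameters are illustrative only.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(
    base,
    num_labels=1,               # single scalar output = predicted grade
    problem_type="regression",  # use MSE loss on the logit, not cross-entropy
)

def tokenize(batch):
    enc = tokenizer(batch["text"], truncation=True, max_length=512)
    enc["labels"] = [float(g) for g in batch["grade"]]  # grade level as float target
    return enc

# Stand-in for the chunked, grade-annotated training set
train_data = Dataset.from_dict({
    "text": [
        "The cat sat on the mat. It was happy.",
        "The quantum mechanical model describes electron behavior in atoms.",
    ],
    "grade": [1.2, 11.6],
}).map(tokenize, batched=True, remove_columns=["text", "grade"])

args = TrainingArguments(
    output_dir="modernbert-readability",
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=16,
)

Trainer(model=model, args=args, train_dataset=train_data, tokenizer=tokenizer).train()
```

Setting `num_labels=1` with `problem_type="regression"` is what makes `AutoModelForSequenceClassification` train against a mean-squared-error loss, which is also why the usage snippet above reads the prediction straight from `outputs.logits`.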
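Finally, circling back to the Reading Level Categories section, here is a small hypothetical helper (not part of the model's API) that maps a clamped prediction to a category label:

```python
# Hypothetical convenience helper: map a clamped 0-12 prediction to the
# educational categories listed in this card. Not part of the model itself.
def grade_to_category(grade: float) -> str:
    if grade < 1.0:
        return "Pre-Kindergarten"
    if grade < 3.0:
        return "Early Elementary"
    if grade < 6.0:
        return "Elementary"
    if grade < 9.0:
        return "Middle School"
    if grade < 12.0:
        return "High School"
    return "College Level"

print(grade_to_category(8.9))  # -> Middle School
```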