---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- generated_from_trainer
- llm-router
- modernbert
metrics:
- f1
model-index:
- name: ModernBERT-large-llm-router
  results: []
datasets:
- DevQuasar/llm_router_dataset-synth
pipeline_tag: text-classification
language:
- en
---

# ModernBERT-large-llm-router

This model is a fine-tuned version of [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large), trained on the [DevQuasar/llm_router_dataset-synth](https://huggingface.co/datasets/DevQuasar/llm_router_dataset-synth) dataset.

The fine-tuned model achieves the following results on the test set:
- Loss: 0.0555
- F1: 0.9933

This model was trained on an RTX 4090 GPU.

## Model description

See the original [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) model card for additional information. This model is intended to classify queries for LLM routing: advanced or complicated queries are labeled 1 (large_llm), and simpler queries are labeled 0 (small_llm).
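
The routing decision described above can be sketched as follows. This is a minimal illustration, not the author's code: `MODEL_ID` is a hypothetical placeholder for this model's Hub path, and the label strings stored in the model config are not stated in this card, so the mapping accepts both the generic `LABEL_n` form and the descriptive names.

```python
from typing import Callable, Dict, List

# Hypothetical repo id (assumption); replace with the actual Hub path of this model.
MODEL_ID = "your-namespace/ModernBERT-large-llm-router"

# Label meaning from the model description: 0 = small_llm, 1 = large_llm.
# The exact label strings in the config are an assumption, so several forms are accepted.
LABEL_TO_ROUTE = {
    "LABEL_0": "small_llm", "0": "small_llm", "small_llm": "small_llm",
    "LABEL_1": "large_llm", "1": "large_llm", "large_llm": "large_llm",
}

# A classifier takes one query and returns a list like [{"label": ..., "score": ...}].
Classifier = Callable[[str], List[Dict[str, object]]]


def route_query(query: str, classifier: Classifier) -> str:
    """Return 'small_llm' or 'large_llm' for a single query."""
    prediction = classifier(query)[0]  # top prediction for the single input
    return LABEL_TO_ROUTE[str(prediction["label"])]


def load_classifier() -> Classifier:
    """Build a transformers text-classification pipeline (downloads the model)."""
    from transformers import pipeline  # imported lazily so the helper above stays dependency-free

    return pipeline("text-classification", model=MODEL_ID)
```

With the model available, `route_query("Prove that the square root of 2 is irrational.", load_classifier())` would return the route name for that query. Keeping the classifier as a parameter makes it easy to swap in a stub for testing.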

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 16
- gradient_accumulation_steps: 2
- bf16: True
- seed: 42
- optimizer: adamw_torch_fused
- lr_scheduler_type: linear
- num_epochs: 5
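
As a rough sketch, the list above could map onto `transformers.TrainingArguments` keyword arguments as shown below. The mapping of `train_batch_size` and `eval_batch_size` to per-device values, the single-GPU setup, and the `output_dir` path are assumptions; the training code itself is not published in this card.

```python
# Keyword names follow transformers.TrainingArguments; values come from the list above.
TRAINING_KWARGS = {
    "output_dir": "modernbert-llm-router",  # hypothetical path, not from the card
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,  # assumes train_batch_size is per device
    "per_device_eval_batch_size": 16,   # assumes eval_batch_size is per device
    "gradient_accumulation_steps": 2,
    "bf16": True,
    "seed": 42,
    "optim": "adamw_torch_fused",
    "lr_scheduler_type": "linear",
    "num_train_epochs": 5,
}


def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Examples contributing to each optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


# On a single GPU, each optimizer step sees 32 * 2 = 64 examples.
print(effective_batch_size(TRAINING_KWARGS))
```

With `transformers` installed, `TrainingArguments(**TRAINING_KWARGS)` would build the corresponding configuration object.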

### Training Code
GITHUB URL TO BE ADDED

### Training results

| Epoch | Validation Loss | F1     |
|:-----:|:---------------:|:------:|
| 1.0   | 0.0296          | 0.9907 |
| 2.0   | 0.0327          | 0.9911 |
| 3.0   | 0.0474          | 0.9933 |
| 4.0   | 0.0563          | 0.9933 |
| 5.0   | 0.0554          | 0.9933 |
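
The table shows F1 plateauing from epoch 3 while validation loss keeps rising. One way to read it, sketched below, is to rank epochs by F1 and break ties by lower validation loss. This is only an illustration of that reading; the card does not state which checkpoint selection rule was actually used.

```python
# (epoch, validation_loss, f1) rows copied from the table above.
RESULTS = [
    (1, 0.0296, 0.9907),
    (2, 0.0327, 0.9911),
    (3, 0.0474, 0.9933),
    (4, 0.0563, 0.9933),
    (5, 0.0554, 0.9933),
]


def best_epoch_by_f1(rows):
    """Highest F1 first; ties broken by lower validation loss."""
    return max(rows, key=lambda r: (r[2], -r[1]))[0]


def best_epoch_by_loss(rows):
    """Lowest validation loss."""
    return min(rows, key=lambda r: r[1])[0]


print(best_epoch_by_f1(RESULTS), best_epoch_by_loss(RESULTS))
```

The two criteria disagree here (epoch 3 versus epoch 1), which is worth keeping in mind when comparing checkpoints.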