---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- generated_from_trainer
- llm-router
- modernbert
metrics:
- f1
model-index:
- name: ModernBERT-large-llm-router
  results: []
datasets:
- DevQuasar/llm_router_dataset-synth
pipeline_tag: text-classification
language:
- en
---

# ModernBERT-large-llm-router

This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base), trained on the [DevQuasar/llm_router_dataset-synth](https://huggingface.co/datasets/DevQuasar/llm_router_dataset-synth) dataset.

The fine-tuned model achieves the following results on the test set:

- Loss: 0.0555
- F1: 0.9933

This model was trained on an NVIDIA RTX 4090.

## Model description

See the original [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) model card for additional information. This model is intended to classify queries for LLM routing: advanced/complicated queries are labeled 1 (`large_llm`) and simpler queries are labeled 0 (`small_llm`).
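
A minimal inference sketch with the `transformers` pipeline is shown below. The model id is a placeholder for this repository, and the exact label strings in the output depend on the `id2label` mapping used at training time:

```python
from transformers import pipeline

# Placeholder model id; replace with this repository's id or a local checkpoint path.
classifier = pipeline("text-classification", model="ModernBERT-large-llm-router")

query = "Explain the trade-offs between speculative decoding and standard autoregressive sampling."
print(classifier(query))
# Example output shape: [{'label': ..., 'score': ...}]
# Label 1 / `large_llm` -> route to a larger model; label 0 / `small_llm` -> route to a smaller one.
```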

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 16
- gradient_accumulation_steps: 2
- bf16: True
- seed: 42
- optimizer: adamw_torch_fused
- lr_scheduler_type: linear
- num_epochs: 5
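
As a rough sketch, these settings could map onto `transformers` `TrainingArguments` as follows; the output directory and evaluation strategy are assumptions rather than values taken from the original run:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="ModernBERT-llm-router",   # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,
    bf16=True,
    seed=42,
    optim="adamw_torch_fused",
    lr_scheduler_type="linear",
    num_train_epochs=5,
    eval_strategy="epoch",                # assumed; consistent with the per-epoch results below
)
```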

### Training Code

GITHUB URL TO BE ADDED

### Training results

| Epoch | Validation Loss | F1     |
|:-----:|:---------------:|:------:|
| 1.0   | 0.0296          | 0.9907 |
| 2.0   | 0.0327          | 0.9911 |
| 3.0   | 0.0474          | 0.9933 |
| 4.0   | 0.0563          | 0.9933 |
| 5.0   | 0.0554          | 0.9933 |
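
For reference, a minimal sketch of how the F1 values above could be computed during evaluation, assuming the `evaluate` library and binary labels (0 = `small_llm`, 1 = `large_llm`); this is not necessarily the exact code used for this run:

```python
import numpy as np
import evaluate

# Assumed metric setup: binary F1 over argmax predictions.
f1_metric = evaluate.load("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return f1_metric.compute(predictions=predictions, references=labels)
```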