Model Card for distill-n4_00-01_combined_cls_v1b3_siglip2_focal-loss

An image classifier fine-tuned from google/siglip2-large-patch16-512 on distill-lab/COMBINE_nai-distill_00-01_eagle.library.

current batches:

  • nv3[v0] (1700) | nv4[v1-2k] (4000) | nv4[v1-210k] (b1-b3: 6000)

training samples (throw / keep):

  • 8929 / 2784 (label counts can be double-checked with the sketch below)
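
To sanity-check those counts, a minimal sketch that tallies the label column used by the train command below. The dataset name and the "star" label column come from the command; the "train" split name is an assumption:

from collections import Counter
from datasets import load_dataset

# dataset name and "star" label column taken from the train command;
# the split name "train" is an assumption
ds = load_dataset("distill-lab/COMBINE_nai-distill_00-01_eagle.library", split="train")
print(Counter(ds["star"]))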

Training

train command:

#!/bin/bash

# =================== BEGIN NOTES =======================

# nothing new in this one; see previous scripts for details

# =================== END NOTES ==========================

# Define variables
BASE_MODEL="google/siglip2-large-patch16-512"
DATASET="distill-lab/COMBINE_nai-distill_00-01_eagle.library"
TASK="classification"
NUM_EPOCHS=10


# Run training command
python -m trainlib.hf_trainer.cli \
  --model_name_or_path "$BASE_MODEL" \
  --dataset_name "$DATASET" \
  --output_dir distill-n4_00-01_combined_cls_v1b3_siglip2_focal-loss \
  --remove_unused_columns False \
  --label_column_name star \
  --task "$TASK" \
  --do_train \
  --do_eval \
  --eval_strategy steps \
  --eval_steps 100 \
  --learning_rate 5e-6 \
  --num_train_epochs "$NUM_EPOCHS" \
  --per_device_train_batch_size 22 \
  --per_device_eval_batch_size 22 \
  --logging_strategy steps \
  --logging_steps 2 \
  --save_total_limit 1 \
  --seed 1337 \
  --lr_scheduler_type cosine \
  --dataloader_num_workers 16 \
  --ignore_mismatched_sizes True \
  --fp16 True  # extra argument (new vs. previous scripts)
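
The run name references focal loss, but the exact loss implementation in trainlib is not shown here. For reference, a common focal-loss formulation for classification looks like this (gamma=2.0 is the usual default, not a value confirmed by this run):

import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    # per-sample cross-entropy, then down-weight easy (high-confidence) examples
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)  # model's probability for the true class
    return ((1.0 - pt) ** gamma * ce).mean()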

Eval

eval results (~1.5% higher accuracy than v1b2, from adding 2000 more samples):

wandb: Run summary:
wandb:            eval/accuracy 0.7852
wandb:                  eval/f1 0.46247
wandb:                eval/loss 0.23888
wandb:           eval/precision 0.53352
wandb:              eval/recall 0.40812
wandb:             eval/roc_auc 0.78516
wandb:             eval/runtime 19.6053
wandb:  eval/samples_per_second 105.431
wandb:    eval/steps_per_second 0.612
wandb:               total_flos 1.744816776738767e+20
wandb:              train/epoch 10.0
wandb:        train/global_step 670
wandb:          train/grad_norm 279129.6875
wandb:      train/learning_rate 0.0
wandb:               train/loss 0.1785
wandb:               train_loss 0.21612
wandb:            train_runtime 1212.1372
wandb: train_samples_per_second 96.631
wandb:   train_steps_per_second 0.553
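
The metric names above match what a scikit-learn-based compute_metrics would report; the actual trainlib implementation is not shown in this card. A minimal sketch, assuming binary throw/keep labels:

import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

def compute_metrics(eval_pred):
    # assumes binary labels (throw/keep); not the actual trainlib code
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    z = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds),
        "precision": precision_score(labels, preds),
        "recall": recall_score(labels, preds),
        "roc_auc": roc_auc_score(labels, probs[:, 1]),
    }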
Model size: 317M params (F32, safetensors)
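
For completeness, a minimal inference sketch. It assumes the output_dir checkpoint loads through the standard transformers Auto classes; "example.png" is illustrative:

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

ckpt = "distill-n4_00-01_combined_cls_v1b3_siglip2_focal-loss"  # local output_dir
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForImageClassification.from_pretrained(ckpt).eval()

image = Image.open("example.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])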