Model Card for distill-n4_00-01_combined_cls_v1b3_siglip2_focal-loss
Current batches:
nv3[v0] (1700) | nv4[v1-2k] (4000) | nv4[v1-210k] (b1-b3: 6000)
Training samples (throw / keep): 8929 / 2784
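The throw/keep split above is heavily imbalanced (8929 throw vs. 2784 keep, roughly 76% / 24%), which is presumably what motivates the focal loss referenced in the output directory name of the run below. As a reference, here is a minimal focal-loss sketch in PyTorch, assuming the standard (1 - p_t)^gamma down-weighting of cross-entropy; this is an illustration only, not necessarily how trainlib implements it:

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    # Illustrative focal loss, not the trainlib implementation.
    # Per-sample cross-entropy; pt is the model's probability for the true class.
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)
    # Down-weight easy (high-pt) examples so the scarce "keep" class is not drowned out.
    loss = (1.0 - pt) ** gamma * ce
    if alpha is not None:
        # Optional per-class weights, e.g. inverse class frequency.
        loss = alpha[targets] * loss
    return loss.mean()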
Training
train command:
#!/bin/bash
# =================== BEGIN NOTES ========================
# Nothing new in this one; dig through the previous scripts for details.
# =================== END NOTES ==========================

# Define variables
BASE_MODEL="google/siglip2-large-patch16-512"
DATASET="distill-lab/COMBINE_nai-distill_00-01_eagle.library"
TASK="classification"
NUM_EPOCHS=10

# Run training command
python -m trainlib.hf_trainer.cli \
    --model_name_or_path $BASE_MODEL \
    --dataset_name $DATASET \
    --output_dir distill-n4_00-01_combined_cls_v1b3_siglip2_focal-loss \
    --remove_unused_columns False \
    --label_column_name star \
    --task $TASK \
    --do_train \
    --do_eval \
    --eval_strategy steps \
    --eval_steps 100 \
    --learning_rate 5e-6 \
    --num_train_epochs $NUM_EPOCHS \
    --per_device_train_batch_size 22 \
    --per_device_eval_batch_size 22 \
    --logging_strategy steps \
    --logging_steps 2 \
    --save_total_limit 1 \
    --seed 1337 \
    --lr_scheduler_type cosine \
    --dataloader_num_workers 16 \
    --ignore_mismatched_sizes True \
    --fp16 True  # EXTRA ARGUMENT
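A minimal inference sketch for the resulting checkpoint, assuming it loads as a standard transformers image-classification model (the model path is the --output_dir from the script above; the image path is a placeholder):

from transformers import pipeline
from PIL import Image

clf = pipeline(
    "image-classification",
    model="distill-n4_00-01_combined_cls_v1b3_siglip2_focal-loss",  # local checkpoint dir from --output_dir
)

image = Image.open("example.png")  # placeholder input image
for pred in clf(image):
    print(pred["label"], round(pred["score"], 4))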
Eval
Eval results (~1.5% higher accuracy than v1b2, from adding 2000 more training samples):
wandb: Run summary:
wandb: eval/accuracy 0.7852
wandb: eval/f1 0.46247
wandb: eval/loss 0.23888
wandb: eval/precision 0.53352
wandb: eval/recall 0.40812
wandb: eval/roc_auc 0.78516
wandb: eval/runtime 19.6053
wandb: eval/samples_per_second 105.431
wandb: eval/steps_per_second 0.612
wandb: total_flos 1.744816776738767e+20
wandb: train/epoch 10.0
wandb: train/global_step 670
wandb: train/grad_norm 279129.6875
wandb: train/learning_rate 0.0
wandb: train/loss 0.1785
wandb: train_loss 0.21612
wandb: train_runtime 1212.1372
wandb: train_samples_per_second 96.631
wandb: train_steps_per_second 0.553
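The metric names in the summary above follow the usual sklearn definitions. Below is a hedged sketch of a compute_metrics function that would produce these keys for a binary classifier, assuming class index 1 is the positive ("keep") class; it mirrors the logged metrics, not necessarily trainlib's exact implementation:

import numpy as np
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score, roc_auc_score,
)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    # Numerically stable softmax; column 1 is assumed to be the positive ("keep") class.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    preds = probs.argmax(axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds),
        "precision": precision_score(labels, preds),
        "recall": recall_score(labels, preds),
        "roc_auc": roc_auc_score(labels, probs[:, 1]),
    }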