---
library_name: transformers
tags: []
---

# Model Card for Model ID

Current batches: `nv3[v0] (1700) | nv4[v1-2k] (4000) | nv4[v1-210k] (b1b2: 4000)`

Same as https://huggingface.co/distill-lab/distill-n4_00-01_combined_cls_v1b2, but trained for 100 epochs instead of 20.

Metrics:

```
***** train metrics *****
  epoch                    =      100.0
  total_flos               = 334833095087GF
  train_loss               =     0.0776
  train_runtime            = 4:53:00.40
  train_samples_per_second =     56.955
  train_steps_per_second   =      0.893

***** eval metrics *****
  epoch                    =      100.0
  eval_accuracy            =     0.7487
  eval_loss                =     1.9947
  eval_runtime             = 0:00:12.56
  eval_samples_per_second  =    140.622
  eval_steps_per_second    =      2.945
```

## Model details

(No significant accuracy jump over the 20-epoch run; this was just to see what happens.)

```python
BASE_MODEL = "facebook/dinov2-with-registers-large"
DATASET = "distill-lab/COMBINE_nai-distill_00-01_eagle.library"
TASK = "classification"

# Training on a single card, so a higher batch size is used.
cmd = f"""python -m trainlib.hf_trainer.cli \
    --model_name_or_path {BASE_MODEL} \
    --dataset_name {DATASET} \
    --output_dir distill-n4_00-01_combined_cls_v1b2-100e \
    --remove_unused_columns False \
    --label_column_name star \
    --task {TASK} \
    --do_train \
    --do_eval \
    --eval_strategy steps \
    --eval_steps 100 \
    --learning_rate 1e-5 \
    --num_train_epochs 100 \
    --per_device_train_batch_size 64 \
    --per_device_eval_batch_size 48 \
    --logging_strategy steps \
    --logging_steps 2 \
    --save_total_limit 1 \
    --seed 1337 \
    --lr_scheduler_type cosine \
    --dataloader_num_workers 16 \
    --ignore_mismatched_sizes True"""

# Push-to-hub flags, kept separate and not appended to cmd in this run.
rest = f""" \
    --push_to_hub True \
    --push_to_hub_organization distill-lab \
    --hub_model_id nai-distill_00-01_combined_eagle_{TASK} \
    --hub_strategy end"""

print(cmd)
!{cmd}
```
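## How to use

A minimal inference sketch using the standard `transformers` image-classification API. The repo id below is an assumption inferred from `--output_dir` and the push-to-hub settings; the actual hub id may differ, and `example.jpg` is a placeholder input.

```python
# Minimal sketch: load the fine-tuned classifier and predict a label for one image.
# Assumptions: repo id inferred from --output_dir; replace with the actual hub id.
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

repo_id = "distill-lab/distill-n4_00-01_combined_cls_v1b2-100e"  # hypothetical repo id
processor = AutoImageProcessor.from_pretrained(repo_id)
model = AutoModelForImageClassification.from_pretrained(repo_id)

image = Image.open("example.jpg")  # placeholder input image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit back to its label name.
predicted_label = model.config.id2label[logits.argmax(-1).item()]
print(predicted_label)
```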