image-model / logs_Nov21_05_01_06.txt
[2024-11-21 05:01:02] INFO - super_gradients.common.crash_handler.crash_tips_setup - Crash tips is enabled. You can set your environment variable to CRASH_HANDLER=FALSE to disable it
[2024-11-21 05:01:03] DEBUG - matplotlib - matplotlib data path: /opt/conda/envs/app/lib/python3.10/site-packages/matplotlib/mpl-data
[2024-11-21 05:01:03] DEBUG - matplotlib - CONFIGDIR=/root/.config/matplotlib
[2024-11-21 05:01:03] DEBUG - matplotlib - interactive is False
[2024-11-21 05:01:03] DEBUG - matplotlib - platform is linux
[2024-11-21 05:01:03] DEBUG - matplotlib - CACHEDIR=/root/.cache/matplotlib
[2024-11-21 05:01:03] DEBUG - matplotlib.font_manager - Using fontManager instance from /root/.cache/matplotlib/fontlist-v390.json
[2024-11-21 05:01:03] DEBUG - super_gradients.common.sg_loggers.clearml_sg_logger - Failed to import clearml
[2024-11-21 05:01:04] DEBUG - hydra.core.utils - Setting JobRuntime:name=UNKNOWN_NAME
[2024-11-21 05:01:04] DEBUG - hydra.core.utils - Setting JobRuntime:name=app
[2024-11-21 05:01:04] DEBUG - hydra.core.utils - Setting JobRuntime:name=app
[2024-11-21 05:01:04] INFO - super_gradients.sanity_check.env_sanity_check - Library check is not supported when super_gradients installed through "git+https://github.com/..." command
[2024-11-21 05:01:04] DEBUG - hydra.core.utils - Setting JobRuntime:name=train_from_recipe
[2024-11-21 05:01:06] INFO - super_gradients.training.sg_trainer.sg_trainer - Using EMA with params {'decay': 0.9, 'decay_type': 'threshold', 'beta': 15}
[2024-11-21 05:01:08] INFO - super_gradients.training.utils.sg_trainer_utils - TRAINING PARAMETERS:
    - Mode: OFF
    - Number of GPUs: 1 (1 available on the machine)
    - Full dataset size: 2399 (len(train_set))
    - Batch size per GPU: 12 (batch_size)
    - Batch Accumulate: 1 (batch_accumulate)
    - Total batch size: 12 (num_gpus * batch_size)
    - Effective Batch size: 12 (num_gpus * batch_size * batch_accumulate)
    - Iterations per epoch: 200 (len(train_loader))
    - Gradient updates per epoch: 200 (len(train_loader) / batch_accumulate)
    - Model: YoloNAS_M (51.13M parameters, 51.13M optimized)
    - Learning Rates and Weight Decays:
      - default: (51.13M parameters). LR: 0.0004 (51.13M parameters) WD: 0.0, (72.22K parameters), WD: 0.0001, (51.06M parameters)
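
The derived figures in the parameter block above follow from the dataset size and batch settings. The sketch below reproduces that arithmetic. The function name `batch_summary` and its `drop_last` argument are hypothetical and not part of super_gradients. It assumes the loader keeps the last partial batch, which is consistent with 2399 samples at batch size 12 giving 200 iterations rather than 199.

```python
import math


def batch_summary(dataset_size, batch_size, num_gpus=1, batch_accumulate=1, drop_last=False):
    """Recompute the derived batch figures (hypothetical helper, not a library API)."""
    total_batch = num_gpus * batch_size
    effective_batch = total_batch * batch_accumulate
    if drop_last:
        # A loader that discards the final partial batch
        iterations = dataset_size // total_batch
    else:
        # Assumed here: the final partial batch is kept
        iterations = math.ceil(dataset_size / total_batch)
    return {
        "total_batch_size": total_batch,
        "effective_batch_size": effective_batch,
        "iterations_per_epoch": iterations,
        # Integer division is an assumption for the case batch_accumulate > 1
        "gradient_updates_per_epoch": iterations // batch_accumulate,
    }


summary = batch_summary(dataset_size=2399, batch_size=12)
print(summary)
```

With the values from this run, 2399 / 12 is about 199.9, so the last iteration of each epoch carries only 11 samples.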
[2024-11-21 05:01:08] INFO - super_gradients.training.sg_trainer.sg_trainer - Started training for 100 epochs (0/99)
[2024-11-21 05:01:47] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth
[2024-11-21 05:01:47] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.0021471609361469746
[2024-11-21 05:02:25] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth
[2024-11-21 05:02:25] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.820346474647522
[2024-11-21 05:03:04] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth
[2024-11-21 05:03:04] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.8461349606513977
[2024-11-21 05:05:43] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth
[2024-11-21 05:05:43] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.8567641973495483
[2024-11-21 05:08:28] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth
[2024-11-21 05:08:28] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.8589617013931274
[2024-11-21 05:15:26] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth
[2024-11-21 05:15:26] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.8630316257476807
[2024-11-21 05:39:03] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth
[2024-11-21 05:39:03] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.8690835237503052
[2024-11-21 06:10:22] INFO - super_gradients.training.sg_trainer.sg_trainer - RUNNING ADDITIONAL TEST ON THE AVERAGED MODEL...
[2024-11-21 06:10:27] INFO - super_gradients.common.sg_loggers.base_sg_logger - [CLEANUP] - Successfully stopped system monitoring process
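
The "Best checkpoint" lines in this log trace how the validation metric improved over the run. As a minimal sketch, the progression could be extracted with a regular expression. The names `BEST_RE` and `best_checkpoint_updates` are hypothetical, and the pattern only assumes the line layout seen in this log: timestamp, level, logger name, message. The metric label is captured as free text, so the parser does not depend on how that label is spelled.

```python
import re
from datetime import datetime

# Matches lines such as:
# [2024-11-21 05:01:47] INFO - <logger> - Best checkpoint overriden: validation <metric>: <value>
# "overrid+en" tolerates both the log's spelling and "overridden".
BEST_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] INFO - \S+ - "
    r"Best checkpoint overrid+en: validation (?P<metric>.+): (?P<value>\d+(?:\.\d+)?)$"
)


def best_checkpoint_updates(lines):
    """Return (timestamp, metric label, value) for each best-checkpoint line."""
    updates = []
    for line in lines:
        match = BEST_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
            updates.append((ts, match["metric"], float(match["value"])))
    return updates


# First and last best-checkpoint entries from this run; the metric label is
# written as mAP@0.50 here, but any label is accepted by the pattern.
sample = [
    "[2024-11-21 05:01:47] INFO - super_gradients.training.sg_trainer.sg_trainer - "
    "Best checkpoint overriden: validation mAP@0.50: 0.0021471609361469746",
    "[2024-11-21 05:39:03] INFO - super_gradients.training.sg_trainer.sg_trainer - "
    "Best checkpoint overriden: validation mAP@0.50: 0.8690835237503052",
]

updates = best_checkpoint_updates(sample)
for ts, metric, value in updates:
    print(ts.isoformat(sep=" "), metric, f"{value:.4f}")
```

Applied to the full log, this yields seven updates. The last one is logged at 05:39:03, and no further improvement is recorded before the averaged-model test at 06:10:22.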