[2024-11-21 05:01:02] INFO - super_gradients.common.crash_handler.crash_tips_setup - Crash tips is enabled. You can set your environment variable to CRASH_HANDLER=FALSE to disable it
[2024-11-21 05:01:03] DEBUG - matplotlib - matplotlib data path: /opt/conda/envs/app/lib/python3.10/site-packages/matplotlib/mpl-data
[2024-11-21 05:01:03] DEBUG - matplotlib - CONFIGDIR=/root/.config/matplotlib
[2024-11-21 05:01:03] DEBUG - matplotlib - interactive is False
[2024-11-21 05:01:03] DEBUG - matplotlib - platform is linux
[2024-11-21 05:01:03] DEBUG - matplotlib - CACHEDIR=/root/.cache/matplotlib
[2024-11-21 05:01:03] DEBUG - matplotlib.font_manager - Using fontManager instance from /root/.cache/matplotlib/fontlist-v390.json
[2024-11-21 05:01:03] DEBUG - super_gradients.common.sg_loggers.clearml_sg_logger - Failed to import clearml
[2024-11-21 05:01:04] DEBUG - hydra.core.utils - Setting JobRuntime:name=UNKNOWN_NAME
[2024-11-21 05:01:04] DEBUG - hydra.core.utils - Setting JobRuntime:name=app
[2024-11-21 05:01:04] DEBUG - hydra.core.utils - Setting JobRuntime:name=app
[2024-11-21 05:01:04] INFO - super_gradients.sanity_check.env_sanity_check - Library check is not supported when super_gradients installed through "git+https://github.com/..." command
[2024-11-21 05:01:04] DEBUG - hydra.core.utils - Setting JobRuntime:name=train_from_recipe
[2024-11-21 05:01:06] INFO - super_gradients.training.sg_trainer.sg_trainer - Using EMA with params {'decay': 0.9, 'decay_type': 'threshold', 'beta': 15}
[2024-11-21 05:01:08] INFO - super_gradients.training.utils.sg_trainer_utils - TRAINING PARAMETERS:
    - Mode: OFF
    - Number of GPUs: 1 (1 available on the machine)
    - Full dataset size: 2399 (len(train_set))
    - Batch size per GPU: 12 (batch_size)
    - Batch Accumulate: 1 (batch_accumulate)
    - Total batch size: 12 (num_gpus * batch_size)
    - Effective Batch size: 12 (num_gpus * batch_size * batch_accumulate)
    - Iterations per epoch: 200 (len(train_loader))
    - Gradient updates per epoch: 200 (len(train_loader) / batch_accumulate)
    - Model: YoloNAS_M (51.13M parameters, 51.13M optimized)
    - Learning Rates and Weight Decays:
      - default: (51.13M parameters). LR: 0.0004 (51.13M parameters) WD: 0.0, (72.22K parameters), WD: 0.0001, (51.06M parameters)
[2024-11-21 05:01:08] INFO - super_gradients.training.sg_trainer.sg_trainer - Started training for 100 epochs (0/99)
[2024-11-21 05:01:47] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth
[2024-11-21 05:01:47] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.0021471609361469746
[2024-11-21 05:02:25] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth
[2024-11-21 05:02:25] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.820346474647522
[2024-11-21 05:03:04] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth
[2024-11-21 05:03:04] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.8461349606513977
[2024-11-21 05:05:43] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth
[2024-11-21 05:05:43] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.8567641973495483
[2024-11-21 05:08:28] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth
[2024-11-21 05:08:28] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.8589617013931274
[2024-11-21 05:15:26] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth
[2024-11-21 05:15:26] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.8630316257476807
[2024-11-21 05:39:03] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth
[2024-11-21 05:39:03] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.8690835237503052
[2024-11-21 06:10:22] INFO - super_gradients.training.sg_trainer.sg_trainer - RUNNING ADDITIONAL TEST ON THE AVERAGED MODEL...
[2024-11-21 06:10:27] INFO - super_gradients.common.sg_loggers.base_sg_logger - [CLEANUP] - Successfully stopped system monitoring process