|
[2024-11-21 05:01:02] INFO - super_gradients.common.crash_handler.crash_tips_setup - Crash tips is enabled. You can set your environment variable to CRASH_HANDLER=FALSE to disable it |
|
[2024-11-21 05:01:03] DEBUG - matplotlib - matplotlib data path: /opt/conda/envs/app/lib/python3.10/site-packages/matplotlib/mpl-data |
|
[2024-11-21 05:01:03] DEBUG - matplotlib - CONFIGDIR=/root/.config/matplotlib |
|
[2024-11-21 05:01:03] DEBUG - matplotlib - interactive is False |
|
[2024-11-21 05:01:03] DEBUG - matplotlib - platform is linux |
|
[2024-11-21 05:01:03] DEBUG - matplotlib - CACHEDIR=/root/.cache/matplotlib |
|
[2024-11-21 05:01:03] DEBUG - matplotlib.font_manager - Using fontManager instance from /root/.cache/matplotlib/fontlist-v390.json |
|
[2024-11-21 05:01:03] DEBUG - super_gradients.common.sg_loggers.clearml_sg_logger - Failed to import clearml |
|
[2024-11-21 05:01:04] DEBUG - hydra.core.utils - Setting JobRuntime:name=UNKNOWN_NAME |
|
[2024-11-21 05:01:04] DEBUG - hydra.core.utils - Setting JobRuntime:name=app |
|
[2024-11-21 05:01:04] DEBUG - hydra.core.utils - Setting JobRuntime:name=app |
|
[2024-11-21 05:01:04] INFO - super_gradients.sanity_check.env_sanity_check - Library check is not supported when super_gradients installed through "git+https://github.com/..." command |
|
[2024-11-21 05:01:04] DEBUG - hydra.core.utils - Setting JobRuntime:name=train_from_recipe |
|
[2024-11-21 05:01:06] INFO - super_gradients.training.sg_trainer.sg_trainer - Using EMA with params {'decay': 0.9, 'decay_type': 'threshold', 'beta': 15} |
|
[2024-11-21 05:01:08] INFO - super_gradients.training.utils.sg_trainer_utils - TRAINING PARAMETERS: |
|
- Mode: OFF |
|
- Number of GPUs: 1 (1 available on the machine) |
|
- Full dataset size: 2399 (len(train_set)) |
|
- Batch size per GPU: 12 (batch_size) |
|
- Batch Accumulate: 1 (batch_accumulate) |
|
- Total batch size: 12 (num_gpus * batch_size) |
|
- Effective Batch size: 12 (num_gpus * batch_size * batch_accumulate) |
|
- Iterations per epoch: 200 (len(train_loader)) |
|
- Gradient updates per epoch: 200 (len(train_loader) / batch_accumulate) |
|
- Model: YoloNAS_M (51.13M parameters, 51.13M optimized) |
|
- Learning Rates and Weight Decays: |
|
- default: (51.13M parameters). LR: 0.0004 (51.13M parameters) WD: 0.0, (72.22K parameters), WD: 0.0001, (51.06M parameters) |
|
|
|
[2024-11-21 05:01:08] INFO - super_gradients.training.sg_trainer.sg_trainer - Started training for 100 epochs (0/99) |
|
|
|
[2024-11-21 05:01:47] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth |
|
[2024-11-21 05:01:47] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.0021471609361469746 |
|
[2024-11-21 05:02:25] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth |
|
[2024-11-21 05:02:25] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.820346474647522 |
|
[2024-11-21 05:03:04] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth |
|
[2024-11-21 05:03:04] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.8461349606513977 |
|
[2024-11-21 05:05:43] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth |
|
[2024-11-21 05:05:43] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.8567641973495483 |
|
[2024-11-21 05:08:28] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth |
|
[2024-11-21 05:08:28] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.8589617013931274 |
|
[2024-11-21 05:15:26] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth |
|
[2024-11-21 05:15:26] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.8630316257476807 |
|
[2024-11-21 05:39:03] INFO - super_gradients.common.sg_loggers.base_sg_logger - Checkpoint saved in /opt/conda/envs/app/lib/python3.10/checkpoints/yolo_nas_m_roboflow_final-final-c2j0n-mdjfm/3/RUN_20241121_050106_668266/ckpt_best.pth |
|
[2024-11-21 05:39:03] INFO - super_gradients.training.sg_trainer.sg_trainer - Best checkpoint overriden: validation mAP@0.50: 0.8690835237503052 |
|
[2024-11-21 06:10:22] INFO - super_gradients.training.sg_trainer.sg_trainer - RUNNING ADDITIONAL TEST ON THE AVERAGED MODEL... |
|
[2024-11-21 06:10:27] INFO - super_gradients.common.sg_loggers.base_sg_logger - [CLEANUP] - Successfully stopped system monitoring process |
|
|