Spaces: henry000/YOLO

henry000 committed on Jun 6, 2024

Commit 0d25eaa

2 Parent(s): 86ef0ef a7ef999

🔀 [Merge] branch 'SETUP' into MODELv2

Files changed (19)

.gitignore +1 -1
.pre-commit-config.yaml +13 -1
README.md +5 -5
docs/CONTRIBUTING.md +0 -2
docs/HOWTO.md +7 -7
pyproject.toml +1 -1
requirements.txt +1 -1
tests/test_model/test_yolo.py +1 -1
yolo/config/config.yaml +0 -1
yolo/config/dataset/coco.yaml +2 -2
yolo/config/dataset/dev.yaml +1 -1
yolo/config/general.yaml +1 -1
yolo/config/model/v7-base.yaml +107 -107
yolo/config/model/v9-c.yaml +10 -10
yolo/config/task/inference.yaml +1 -1
yolo/config/task/train.yaml +2 -2
yolo/config/task/validation.yaml +1 -1
yolo/tools/data_augmentation.py +6 -5
yolo/utils/logging_utils.py +3 -3

.gitignore CHANGED

@@ -145,4 +145,4 @@ runs
 node_modules/
 # Not ignore image for demo
-!demo/images/inference/*

 node_modules/
 # Not ignore image for demo
+!demo/images/inference/*

.pre-commit-config.yaml CHANGED

@@ -6,9 +6,21 @@ repos:
         language_version: python3  # Specify the Python version
         exclude: '.*\.yaml$'  # Regex pattern to exclude all YAML files
         args: ["--line-length", "120"]  # Set max line length to 120 characters
   - repo: https://github.com/pre-commit/mirrors-isort
     rev: v5.10.1  # Use the appropriate version or "stable" for the latest stable release
     hooks:
       - id: isort
         args: ["--profile", "black"]

         language_version: python3  # Specify the Python version
         exclude: '.*\.yaml$'  # Regex pattern to exclude all YAML files
         args: ["--line-length", "120"]  # Set max line length to 120 characters
   - repo: https://github.com/pre-commit/mirrors-isort
     rev: v5.10.1  # Use the appropriate version or "stable" for the latest stable release
     hooks:
       - id: isort
         args: ["--profile", "black"]
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v3.4.0
+    hooks:
+    - id: trailing-whitespace
+    - id: end-of-file-fixer
+    - id: check-yaml
+  - repo: https://github.com/kynan/nbstripout
+    rev: 0.5.0
+    hooks:
+    - id: nbstripout

README.md CHANGED

@@ -3,7 +3,7 @@
 ![WIP](https://img.shields.io/badge/status-WIP-orange)
 > [!IMPORTANT]
 > This project is currently a Work In Progress and may undergo significant changes. It is not recommended for use in production environments until further notice. Please check back regularly for updates.
->
 > Use of this code is at your own risk and discretion. It is advisable to consult with the project owner before deploying or integrating into any critical systems.
 Welcome to the official implementation of YOLOv7 and YOLOv9. This repository will contain the complete codebase, pre-trained models, and detailed instructions for training and deploying YOLOv9.
@@ -55,14 +55,14 @@ python lazy.py task=train task.batch_size=8 model=v9-c
 ### Transfer Learning
 To perform transfer learning with YOLOv9:
 ```shell
-python lazy.py task=train task.batch_size=8 model=v9-c task.dataset={dataset_config}
 ```
 ### Inference
 To evaluate the model performance, use:
 ```shell
 python lazy.py weights=v9-c.pt # if cloned from GitHub
-yolo task=inference task.source={Any} # if pip installed
 ```
 ### Validation [WIP]
@@ -80,11 +80,11 @@ Contributions to the YOLOv9 project are welcome! See [CONTRIBUTING](docs/CONTRIB
 ## Citations
 ```
 @misc{wang2024yolov9,
-      title={YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information},
       author={Chien-Yao Wang and I-Hau Yeh and Hong-Yuan Mark Liao},
       year={2024},
       eprint={2402.13616},
       archivePrefix={arXiv},
       primaryClass={cs.CV}
 }
-```

 ![WIP](https://img.shields.io/badge/status-WIP-orange)
 > [!IMPORTANT]
 > This project is currently a Work In Progress and may undergo significant changes. It is not recommended for use in production environments until further notice. Please check back regularly for updates.
+>
 > Use of this code is at your own risk and discretion. It is advisable to consult with the project owner before deploying or integrating into any critical systems.
 Welcome to the official implementation of YOLOv7 and YOLOv9. This repository will contain the complete codebase, pre-trained models, and detailed instructions for training and deploying YOLOv9.
 ### Transfer Learning
 To perform transfer learning with YOLOv9:
 ```shell
+python lazy.py task=train task.batch_size=8 model=v9-c task.data.dataset={dataset_config}
 ```
 ### Inference
 To evaluate the model performance, use:
 ```shell
 python lazy.py weights=v9-c.pt # if cloned from GitHub
+yolo task=inference task.data.source={Any} # if pip installed
 ```
 ### Validation [WIP]
 ## Citations
 ```
 @misc{wang2024yolov9,
+      title={YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information},
       author={Chien-Yao Wang and I-Hau Yeh and Hong-Yuan Mark Liao},
       year={2024},
       eprint={2402.13616},
       archivePrefix={arXiv},
       primaryClass={cs.CV}
 }
+```
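
The README hunks above rename two overrides (`task.dataset` to `task.data.dataset`, `task.source` to `task.data.source`), which suggests these keys now sit one level deeper, under `data`, in the task config. As a rough illustration only, and not Hydra's or this repository's implementation, the sketch below shows how a dotted `key=value` override can address a nested key. The helper `apply_override` and the sample config are hypothetical.

```python
def apply_override(config: dict, override: str) -> dict:
    """Apply a single 'a.b.c=value' override to a nested dict in place."""
    dotted_key, _, value = override.partition("=")
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Create intermediate mappings when they are missing.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Hypothetical config shaped like the task/data nesting implied by the new overrides.
config = {"task": {"data": {"source": None, "batch_size": 16}}}
apply_override(config, "task.data.source=demo/images/inference")
print(config["task"]["data"]["source"])
```

With this layout, an override written against the old path (`task.source=...`) would create a new top-level key under `task` instead of updating `task.data.source`, which is presumably why the README commands had to change.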

docs/CONTRIBUTING.md CHANGED

@@ -40,5 +40,3 @@ Once you submit a PR, maintainers will review your work, suggest changes if nece
 Your contributions are greatly appreciated and vital to the project's success!
 Please feel free to contact [[email protected]](mailto:[email protected])!


 Your contributions are greatly appreciated and vital to the project's success!

 Please feel free to contact [[email protected]](mailto:[email protected])!

docs/HOWTO.md CHANGED

@@ -7,7 +7,7 @@ To facilitate easy customization of the YOLO model, we've structured the codebas
 You can change the model architecture simply by modifying the YAML configuration file. Here's how:
 1. **Modify Architecture in Config:**
    Navigate to your model's configuration file (typically formatted like `yolo/config/model/v9-c.yaml`).
    - Adjust the architecture settings under the `architecture` section. Ensure that every module you reference exists in `module.py`, or refer to the next section on how to add new modules.
@@ -40,7 +40,7 @@ To add or modify a block in the model:
 1. **Create a New Module:**
    Define a new class in `module.py` that inherits from `nn.Module`.
    The constructor should accept `in_channels` as a parameter. Make sure to calculate `out_channels` based on your model's requirements or configure it through the YAML file using `args`.
     ```python
@@ -49,7 +49,7 @@ To add or modify a block in the model:
             super().__init__()
             self.module = # conv, bool, ...
         def forward(self, x):
-            return self.module(x)
     ```
 2. **Reference in Config:**
@@ -138,11 +138,11 @@ Custom transformations should be designed to accept an image and its bounding bo
         - `func` draw_bboxes: given a image and list of bbox, draw bbox on the image
         - `func` draw_model: visualize the given model
     - **get_dataset**
-        - `func` download_file: for a given link, downlaod the file
-        - `func` unzip_file: unzip the downlaoded zip to data/
         - `func` check_files: check if the dataset file numbers is correct
-        - `func` prepare_dataset: automatic downlaod the dataset and check if it is correct
     - **loss**
         - `class` BoxLoss: a Custom Loss for bounding box
         - `class` YOLOLoss: a implementation of yolov9 loss
-        - `class` DualLoss: a implementation of yolov9 loss with auxiliary detection head

 You can change the model architecture simply by modifying the YAML configuration file. Here's how:
 1. **Modify Architecture in Config:**
    Navigate to your model's configuration file (typically formatted like `yolo/config/model/v9-c.yaml`).
    - Adjust the architecture settings under the `architecture` section. Ensure that every module you reference exists in `module.py`, or refer to the next section on how to add new modules.
 1. **Create a New Module:**
    Define a new class in `module.py` that inherits from `nn.Module`.
    The constructor should accept `in_channels` as a parameter. Make sure to calculate `out_channels` based on your model's requirements or configure it through the YAML file using `args`.
     ```python
             super().__init__()
             self.module = # conv, bool, ...
         def forward(self, x):
+            return self.module(x)
     ```
 2. **Reference in Config:**
         - `func` draw_bboxes: given a image and list of bbox, draw bbox on the image
         - `func` draw_model: visualize the given model
     - **get_dataset**
+        - `func` download_file: for a given link, download the file
+        - `func` unzip_file: unzip the downloaded zip to data/
         - `func` check_files: check if the dataset file numbers is correct
+        - `func` prepare_dataset: automatic download the dataset and check if it is correct
     - **loss**
         - `class` BoxLoss: a Custom Loss for bounding box
         - `class` YOLOLoss: a implementation of yolov9 loss
+        - `class` DualLoss: a implementation of yolov9 loss with auxiliary detection head

pyproject.toml CHANGED

@@ -31,4 +31,4 @@ requires = [
 ]
 [project.scripts]
-yolo = "yolo.lazy:main"

 ]
 [project.scripts]
+yolo = "yolo.lazy:main"

requirements.txt CHANGED

@@ -12,4 +12,4 @@ rich
 torch
 torchvision
 tqdm
-wandb

 torch
 torchvision
 tqdm
+wandb

tests/test_model/test_yolo.py CHANGED

@@ -20,7 +20,7 @@ def test_build_model():
         OmegaConf.set_struct(cfg.model, False)
         cfg.weight = None
-        model = YOLO(cfg.model, 80)
         assert len(model.model) == 38

         OmegaConf.set_struct(cfg.model, False)
         cfg.weight = None
+        model = YOLO(cfg.model)
         assert len(model.model) == 38

yolo/config/config.yaml CHANGED

@@ -10,4 +10,3 @@ defaults:
   - dataset: coco
   - model: v9-c
   - general

   - dataset: coco
   - model: v9-c
   - general

yolo/config/dataset/coco.yaml CHANGED

@@ -8,7 +8,7 @@ auto_download:
     train2017:
       file_name: train2017
       file_num: 118287
-    val2017:
       file_name: val2017
       file_num: 5000
     test2017:
@@ -17,4 +17,4 @@ auto_download:
   annotations:
     base_url: http://images.cocodataset.org/annotations/
     annotations:
-      file_name: annotations_trainval2017

     train2017:
       file_name: train2017
       file_num: 118287
+    val2017:
       file_name: val2017
       file_num: 5000
     test2017:
   annotations:
     base_url: http://images.cocodataset.org/annotations/
     annotations:
+      file_name: annotations_trainval2017

yolo/config/dataset/dev.yaml CHANGED

@@ -2,4 +2,4 @@ path: data/dev
 train: train
 validation: test
-auto_download:

 train: train
 validation: test
+auto_download:

yolo/config/general.yaml CHANGED

@@ -12,4 +12,4 @@ lucky_number: 10
 use_wandb: False
 use_TensorBoard: False
-weight: weights/v9-c.pt

 use_wandb: False
 use_TensorBoard: False
+weight: weights/v9-c.pt

yolo/config/model/v7-base.yaml CHANGED

@@ -4,243 +4,243 @@ anchor:
 model:
   backbone:
-  - Conv:
       args: {out_channels: 32, kernel_size: 3}
-  - Conv:
       args: {out_channels: 64, kernel_size: 3, stride: 2}
-  - Conv:
       args: {out_channels: 64, kernel_size: 3}
-  - Conv:
       args: {out_channels: 128, kernel_size: 3, stride: 2}
-  - Conv:
       args: {out_channels: 64, kernel_size: 1}
-  - Conv:
       args: {out_channels: 64, kernel_size: 1}
       source: -2
-  - Conv:
       args: {out_channels: 64, kernel_size: 3}
-  - Conv:
       args: {out_channels: 64, kernel_size: 3}
-  - Conv:
       args: {out_channels: 64, kernel_size: 3}
-  - Conv:
       args: {out_channels: 64, kernel_size: 3}
-  - Concat:
       source: [-1, -3, -5, -6]
-  - Conv:
       args: {out_channels: 256, kernel_size: 1}
-  - Pool:
       args: {padding: 0}
-  - Conv:
       args: {out_channels: 128, kernel_size: 1}
-  - Conv:
       args: {out_channels: 128, kernel_size: 1}
       source: -3
-  - Conv:
       args: {out_channels: 128, kernel_size: 3, stride: 2}
-  - Concat:
       source: [-1, -3]
-  - Conv:
       args: {out_channels: 128, kernel_size: 1}
-  - Conv:
       args: {out_channels: 128, kernel_size: 1}
       source: -2
-  - Conv:
       args: {out_channels: 128, kernel_size: 3}
-  - Conv:
       args: {out_channels: 128, kernel_size: 3}
-  - Conv:
       args: {out_channels: 128, kernel_size: 3}
-  - Conv:
       args: {out_channels: 128, kernel_size: 3}
-  - Concat:
       source: [-1, -3, -5, -6]
       tags: 8x
-  - Conv:
       args: {out_channels: 512, kernel_size: 1}
-  - Pool:
       args: {padding: 0}
-  - Conv:
       args: {out_channels: 256, kernel_size: 1}
-  - Conv:
       args: {out_channels: 256, kernel_size: 1}
       source: -3
-  - Conv:
       args: {out_channels: 256, kernel_size: 3, stride: 2}
-  - Concat:
       source: [-1, -3]
-  - Conv:
       args: {out_channels: 256, kernel_size: 1}
-  - Conv:
       args: {out_channels: 256, kernel_size: 1}
       source: -2
-  - Conv:
       args: {out_channels: 256, kernel_size: 3}
-  - Conv:
       args: {out_channels: 256, kernel_size: 3}
-  - Conv:
       args: {out_channels: 256, kernel_size: 3}
-  - Conv:
       args: {out_channels: 256, kernel_size: 3}
-  - Concat:
       source: [-1, -3, -5, -6]
-  - Conv:
       args: {out_channels: 1024, kernel_size: 1}
       tags: 16x
-  - Pool:
       args: {padding: 0}
-  - Conv:
       args: {out_channels: 512, kernel_size: 1}
-  - Conv:
       args: {out_channels: 512, kernel_size: 1}
       source: -3
-  - Conv:
       args: {out_channels: 512, kernel_size: 3, stride: 2}
-  - Concat:
       source: [-1, -3]
-  - Conv:
       args: {out_channels: 256, kernel_size: 1}
-  - Conv:
       args: {out_channels: 256, kernel_size: 1}
       source: -2
-  - Conv:
       args: {out_channels: 256, kernel_size: 3}
-  - Conv:
       args: {out_channels: 256, kernel_size: 3}
-  - Conv:
       args: {out_channels: 256, kernel_size: 3}
-  - Conv:
       args: {out_channels: 256, kernel_size: 3}
-  - Concat:
       source: [-1, -3, -5, -6]
-  - Conv:
       args: {out_channels: 1024, kernel_size: 1}
       tags: 32x
   head:
-  - SPPCSPConv:
       args: {out_channels: 512}
-  - Conv:
       args: {out_channels: 256, kernel_size: 1}
-  - UpSample:
       args: {scale_factor: 2}
-  - Conv:
       args: {out_channels: 256, kernel_size: 1}
       source: 16x
-  - Concat:
       source: [-1, -2]
-  - Conv:
       args: {out_channels: 256, kernel_size: 1}
-  - Conv:
       args: {out_channels: 256, kernel_size: 1}
       source: -2
-  - Conv:
       args: {out_channels: 128, kernel_size: 3}
-  - Conv:
       args: {out_channels: 128, kernel_size: 3}
-  - Conv:
       args: {out_channels: 128, kernel_size: 3}
-  - Conv:
       args: {out_channels: 128, kernel_size: 3}
-  - Concat:
       source: [-1, -2, -3, -4, -5, -6]
-  - Conv:
       args: {out_channels: 256, kernel_size: 1}
-  - Conv:
       args: {out_channels: 128, kernel_size: 1}
-  - UpSample:
       args: {scale_factor: 2}
-  - Conv:
       args: {out_channels: 128, kernel_size: 1}
       source: 8x
-  - Concat:
       source: [-1, -2]
-  - Conv:
       args: {out_channels: 128, kernel_size: 1}
-  - Conv:
       args: {out_channels: 128, kernel_size: 1}
       source: -2
-  - Conv:
       args: {out_channels: 64, kernel_size: 3}
-  - Conv:
       args: {out_channels: 64, kernel_size: 3}
-  - Conv:
       args: {out_channels: 64, kernel_size: 3}
-  - Conv:
       args: {out_channels: 64, kernel_size: 3}
-  - Concat:
       source: [-1, -2, -3, -4, -5, -6]
-  - Conv:
       args: {out_channels: 128, kernel_size: 1}
-  - Pool:
       args: {padding: 0}
-  - Conv:
       args: {out_channels: 128, kernel_size: 1}
-  - Conv:
       args: {out_channels: 128, kernel_size: 1}
       source: -3
-  - Conv:
       args: {out_channels: 128, kernel_size: 3, stride: 2}
-  - Concat:
       source: [-1, -3, 63]
-  - Conv:
       args: {out_channels: 256, kernel_size: 1}
-  - Conv:
       args: {out_channels: 256, kernel_size: 1}
       source: -2
-  - Conv:
       args: {out_channels: 128, kernel_size: 3}
-  - Conv:
       args: {out_channels: 128, kernel_size: 3}
-  - Conv:
       args: {out_channels: 128, kernel_size: 3}
-  - Conv:
       args: {out_channels: 128, kernel_size: 3}
-  - Concat:
       source: [-1, -2, -3, -4, -5, -6]
-  - Conv:
       args: {out_channels: 256, kernel_size: 1}
-  - Pool:
       args: {padding: 0}
-  - Conv:
       args: {out_channels: 256, kernel_size: 1}
-  - Conv:
       args: {out_channels: 256, kernel_size: 1}
       source: -3
-  - Conv:
       args: {out_channels: 256, kernel_size: 3, stride: 2}
-  - Concat:
       source: [-1, -3, 51]
-  - Conv:
       args: {out_channels: 512, kernel_size: 1}
-  - Conv:
       args: {out_channels: 512, kernel_size: 1}
       source: -2
-  - Conv:
       args: {out_channels: 256, kernel_size: 3}
-  - Conv:
       args: {out_channels: 256, kernel_size: 3}
-  - Conv:
       args: {out_channels: 256, kernel_size: 3}
-  - Conv:
       args: {out_channels: 256, kernel_size: 3}
-  - Concat:
       source: [-1, -2, -3, -4, -5, -6]
-  - Conv:
       args: {out_channels: 512, kernel_size: 1}
-  - RepConv:
       args: {out_channels: 256}
       source: 75
-  - RepConv:
       args: {out_channels: 512}
       source: 88
-  - RepConv:
       args: {out_channels: 1024}
       source: 101
-  - IDetect:
       args:
         anchors:
             - [12,16, 19,36, 40,28]  # P3/8
             - [36,75, 76,55, 72,146]  # P4/16
             - [142,110, 192,243, 459,401]  # P5/32
       source: [102, 103, 104]
-      output: True

 model:
   backbone:
+  - Conv:
       args: {out_channels: 32, kernel_size: 3}
+  - Conv:
       args: {out_channels: 64, kernel_size: 3, stride: 2}
+  - Conv:
       args: {out_channels: 64, kernel_size: 3}
+  - Conv:
       args: {out_channels: 128, kernel_size: 3, stride: 2}
+  - Conv:
       args: {out_channels: 64, kernel_size: 1}
+  - Conv:
       args: {out_channels: 64, kernel_size: 1}
       source: -2
+  - Conv:
       args: {out_channels: 64, kernel_size: 3}
+  - Conv:
       args: {out_channels: 64, kernel_size: 3}
+  - Conv:
       args: {out_channels: 64, kernel_size: 3}
+  - Conv:
       args: {out_channels: 64, kernel_size: 3}
+  - Concat:
       source: [-1, -3, -5, -6]
+  - Conv:
       args: {out_channels: 256, kernel_size: 1}
+  - Pool:
       args: {padding: 0}
+  - Conv:
       args: {out_channels: 128, kernel_size: 1}
+  - Conv:
       args: {out_channels: 128, kernel_size: 1}
       source: -3
+  - Conv:
       args: {out_channels: 128, kernel_size: 3, stride: 2}
+  - Concat:
       source: [-1, -3]
+  - Conv:
       args: {out_channels: 128, kernel_size: 1}
+  - Conv:
       args: {out_channels: 128, kernel_size: 1}
       source: -2
+  - Conv:
       args: {out_channels: 128, kernel_size: 3}
+  - Conv:
       args: {out_channels: 128, kernel_size: 3}
+  - Conv:
       args: {out_channels: 128, kernel_size: 3}
+  - Conv:
       args: {out_channels: 128, kernel_size: 3}
+  - Concat:
       source: [-1, -3, -5, -6]
       tags: 8x
+  - Conv:
       args: {out_channels: 512, kernel_size: 1}
+  - Pool:
       args: {padding: 0}
+  - Conv:
       args: {out_channels: 256, kernel_size: 1}
+  - Conv:
       args: {out_channels: 256, kernel_size: 1}
       source: -3
+  - Conv:
       args: {out_channels: 256, kernel_size: 3, stride: 2}
+  - Concat:
       source: [-1, -3]
+  - Conv:
       args: {out_channels: 256, kernel_size: 1}
+  - Conv:
       args: {out_channels: 256, kernel_size: 1}
       source: -2
+  - Conv:
       args: {out_channels: 256, kernel_size: 3}
+  - Conv:
       args: {out_channels: 256, kernel_size: 3}
+  - Conv:
       args: {out_channels: 256, kernel_size: 3}
+  - Conv:
       args: {out_channels: 256, kernel_size: 3}
+  - Concat:
       source: [-1, -3, -5, -6]
+  - Conv:
       args: {out_channels: 1024, kernel_size: 1}
       tags: 16x
+  - Pool:
       args: {padding: 0}
+  - Conv:
       args: {out_channels: 512, kernel_size: 1}
+  - Conv:
       args: {out_channels: 512, kernel_size: 1}
       source: -3
+  - Conv:
       args: {out_channels: 512, kernel_size: 3, stride: 2}
+  - Concat:
       source: [-1, -3]
+  - Conv:
       args: {out_channels: 256, kernel_size: 1}
+  - Conv:
       args: {out_channels: 256, kernel_size: 1}
       source: -2
+  - Conv:
       args: {out_channels: 256, kernel_size: 3}
+  - Conv:
       args: {out_channels: 256, kernel_size: 3}
+  - Conv:
       args: {out_channels: 256, kernel_size: 3}
+  - Conv:
       args: {out_channels: 256, kernel_size: 3}
+  - Concat:
       source: [-1, -3, -5, -6]
+  - Conv:
       args: {out_channels: 1024, kernel_size: 1}
       tags: 32x
   head:
+  - SPPCSPConv:
       args: {out_channels: 512}
+  - Conv:
       args: {out_channels: 256, kernel_size: 1}
+  - UpSample:
       args: {scale_factor: 2}
+  - Conv:
       args: {out_channels: 256, kernel_size: 1}
       source: 16x
+  - Concat:
       source: [-1, -2]
+  - Conv:
       args: {out_channels: 256, kernel_size: 1}
+  - Conv:
       args: {out_channels: 256, kernel_size: 1}
       source: -2
+  - Conv:
       args: {out_channels: 128, kernel_size: 3}
+  - Conv:
       args: {out_channels: 128, kernel_size: 3}
+  - Conv:
       args: {out_channels: 128, kernel_size: 3}
+  - Conv:
       args: {out_channels: 128, kernel_size: 3}
+  - Concat:
       source: [-1, -2, -3, -4, -5, -6]
+  - Conv:
       args: {out_channels: 256, kernel_size: 1}
+  - Conv:
       args: {out_channels: 128, kernel_size: 1}
+  - UpSample:
       args: {scale_factor: 2}
+  - Conv:
       args: {out_channels: 128, kernel_size: 1}
       source: 8x
+  - Concat:
       source: [-1, -2]
+  - Conv:
       args: {out_channels: 128, kernel_size: 1}
+  - Conv:
       args: {out_channels: 128, kernel_size: 1}
       source: -2
+  - Conv:
       args: {out_channels: 64, kernel_size: 3}
+  - Conv:
       args: {out_channels: 64, kernel_size: 3}
+  - Conv:
       args: {out_channels: 64, kernel_size: 3}
+  - Conv:
       args: {out_channels: 64, kernel_size: 3}
+  - Concat:
       source: [-1, -2, -3, -4, -5, -6]
+  - Conv:
       args: {out_channels: 128, kernel_size: 1}
+  - Pool:
       args: {padding: 0}
+  - Conv:
       args: {out_channels: 128, kernel_size: 1}
+  - Conv:
       args: {out_channels: 128, kernel_size: 1}
       source: -3
+  - Conv:
       args: {out_channels: 128, kernel_size: 3, stride: 2}
+  - Concat:
       source: [-1, -3, 63]
+  - Conv:
       args: {out_channels: 256, kernel_size: 1}
+  - Conv:
       args: {out_channels: 256, kernel_size: 1}
       source: -2
+  - Conv:
       args: {out_channels: 128, kernel_size: 3}
+  - Conv:
       args: {out_channels: 128, kernel_size: 3}
+  - Conv:
       args: {out_channels: 128, kernel_size: 3}
+  - Conv:
       args: {out_channels: 128, kernel_size: 3}
+  - Concat:
       source: [-1, -2, -3, -4, -5, -6]
+  - Conv:
       args: {out_channels: 256, kernel_size: 1}
+  - Pool:
       args: {padding: 0}
+  - Conv:
       args: {out_channels: 256, kernel_size: 1}
+  - Conv:
       args: {out_channels: 256, kernel_size: 1}
       source: -3
+  - Conv:
       args: {out_channels: 256, kernel_size: 3, stride: 2}
+  - Concat:
       source: [-1, -3, 51]
+  - Conv:
       args: {out_channels: 512, kernel_size: 1}
+  - Conv:
       args: {out_channels: 512, kernel_size: 1}
       source: -2
+  - Conv:
       args: {out_channels: 256, kernel_size: 3}
+  - Conv:
       args: {out_channels: 256, kernel_size: 3}
+  - Conv:
       args: {out_channels: 256, kernel_size: 3}
+  - Conv:
       args: {out_channels: 256, kernel_size: 3}
+  - Concat:
       source: [-1, -2, -3, -4, -5, -6]
+  - Conv:
       args: {out_channels: 512, kernel_size: 1}
+  - RepConv:
       args: {out_channels: 256}
       source: 75
+  - RepConv:
       args: {out_channels: 512}
       source: 88
+  - RepConv:
       args: {out_channels: 1024}
       source: 101
+  - IDetect:
       args:
         anchors:
             - [12,16, 19,36, 40,28]  # P3/8
             - [36,75, 76,55, 72,146]  # P4/16
             - [142,110, 192,243, 459,401]  # P5/32
       source: [102, 103, 104]
+      output: True

yolo/config/model/v9-c.yaml CHANGED

@@ -13,7 +13,7 @@ model:
         args: {out_channels: 128, kernel_size: 3, stride: 2}
     - RepNCSPELAN:
         args: {out_channels: 256, part_channels: 128}
     - ADown:
         args: {out_channels: 256}
     - RepNCSPELAN:
@@ -25,13 +25,13 @@ model:
     - RepNCSPELAN:
         args: {out_channels: 512, part_channels: 512}
         tags: B4
     - ADown:
         args: {out_channels: 512}
     - RepNCSPELAN:
         args: {out_channels: 512, part_channels: 512}
         tags: B5
   neck:
     - SPPELAN:
         args: {out_channels: 512}
@@ -49,12 +49,12 @@ model:
         args: {scale_factor: 2, mode: nearest}
     - Concat:
         source: [-1, B3]
   head:
     - RepNCSPELAN:
         args: {out_channels: 256, part_channels: 256}
         tags: P3
     - ADown:
         args: {out_channels: 256}
     - Concat:
@@ -70,7 +70,7 @@ model:
     - RepNCSPELAN:
         args: {out_channels: 512, part_channels: 512}
         tags: P5
   auxiliary:
     - CBLinear:
         source: B3
@@ -84,7 +84,7 @@ model:
         source: B5
         args: {out_channels: [256, 512, 512]}
         tags: R5
     - Conv:
         args: {out_channels: 64, kernel_size: 3, stride: 2}
         source: 0
@@ -126,7 +126,7 @@ model:
     - Anchor2Box:
         source: aux_head
         output: True
-        args:
             reg_max: ${model.anchor.reg_max}
             strides: ${model.anchor.strides}
         tags: aux_bbox
@@ -138,7 +138,7 @@ model:
     - Anchor2Box:
         source: reg_head
         output: True
-        args:
             reg_max: ${model.anchor.reg_max}
             strides: ${model.anchor.strides}
-        tags: reg_bbox

         args: {out_channels: 128, kernel_size: 3, stride: 2}
     - RepNCSPELAN:
         args: {out_channels: 256, part_channels: 128}
     - ADown:
         args: {out_channels: 256}
     - RepNCSPELAN:
     - RepNCSPELAN:
         args: {out_channels: 512, part_channels: 512}
         tags: B4
     - ADown:
         args: {out_channels: 512}
     - RepNCSPELAN:
         args: {out_channels: 512, part_channels: 512}
         tags: B5
   neck:
     - SPPELAN:
         args: {out_channels: 512}
         args: {scale_factor: 2, mode: nearest}
     - Concat:
         source: [-1, B3]
   head:
     - RepNCSPELAN:
         args: {out_channels: 256, part_channels: 256}
         tags: P3
     - ADown:
         args: {out_channels: 256}
     - Concat:
     - RepNCSPELAN:
         args: {out_channels: 512, part_channels: 512}
         tags: P5
   auxiliary:
     - CBLinear:
         source: B3
         source: B5
         args: {out_channels: [256, 512, 512]}
         tags: R5
     - Conv:
         args: {out_channels: 64, kernel_size: 3, stride: 2}
         source: 0
     - Anchor2Box:
         source: aux_head
         output: True
+        args:
             reg_max: ${model.anchor.reg_max}
             strides: ${model.anchor.strides}
         tags: aux_bbox
     - Anchor2Box:
         source: reg_head
         output: True
+        args:
             reg_max: ${model.anchor.reg_max}
             strides: ${model.anchor.strides}
+        tags: reg_bbox

yolo/config/task/inference.yaml CHANGED

@@ -7,4 +7,4 @@ data:
   data_augment: {}
 nms:
   min_confidence: 0.5
-  min_iou: 0.5

   data_augment: {}
 nms:
   min_confidence: 0.5
+  min_iou: 0.5
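
The `nms` block sets two thresholds, `min_confidence` and `min_iou`. The NMS code itself is not part of this diff, so the following is only a minimal sketch of greedy non-maximum suppression. It assumes `min_confidence` discards low-scoring boxes and `min_iou` is the overlap above which a lower-scoring box is suppressed. The function names `iou` and `greedy_nms` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, min_confidence=0.5, min_iou=0.5):
    """Return indices of kept boxes, highest score first."""
    # Assumption: boxes scoring below min_confidence are dropped up front.
    candidates = [i for i, s in enumerate(scores) if s >= min_confidence]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Assumption: a box survives only if it overlaps no kept box by more than min_iou.
        if all(iou(boxes[i], boxes[k]) <= min_iou for k in kept):
            kept.append(i)
    return kept


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30], [0, 0, 5, 5]]
scores = [0.9, 0.8, 0.7, 0.3]
print(greedy_nms(boxes, scores))  # prints [0, 2]
```

Under the same assumptions, the validation values in this commit (`min_confidence: 0.001`, `min_iou: 0.7`) would keep all four sample boxes, since the largest pairwise overlap here is about 0.68.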

yolo/config/task/train.yaml CHANGED

@@ -6,7 +6,7 @@ defaults:
 epoch: 500
 data:
-  batch_size: 16
   image_size: ${image_size}
   cpu_num: ${cpu_num}
   shuffle: True
@@ -26,7 +26,7 @@ optimizer:
 loss:
   objective:
     BCELoss: 0.5
-    BoxLoss: 7.5
     DFLoss: 1.5
   aux:
     0.25

 epoch: 500
 data:
+  batch_size: 16
   image_size: ${image_size}
   cpu_num: ${cpu_num}
   shuffle: True
 loss:
   objective:
     BCELoss: 0.5
+    BoxLoss: 7.5
     DFLoss: 1.5
   aux:
     0.25
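
The `loss` block lists per-term weights under `objective` plus a single `aux` value. The diff does not show how the training code consumes them, so this sketch assumes one plausible reading: each head's loss is a weighted sum of its terms, and the auxiliary head's sum is scaled by `aux` before being added. The helper names and the sample loss values are hypothetical.

```python
# Weights copied from the train.yaml hunk above.
OBJECTIVE_WEIGHTS = {"BCELoss": 0.5, "BoxLoss": 7.5, "DFLoss": 1.5}
AUX_WEIGHT = 0.25


def weighted_sum(losses, weights):
    """Sum each named loss term multiplied by its configured weight."""
    return sum(weights[name] * value for name, value in losses.items())


def total_loss(main_losses, aux_losses, weights=OBJECTIVE_WEIGHTS, aux_weight=AUX_WEIGHT):
    """Assumed combination: main head plus a down-weighted auxiliary head."""
    return weighted_sum(main_losses, weights) + aux_weight * weighted_sum(aux_losses, weights)


# Made-up per-term values, for illustration only.
main = {"BCELoss": 1.0, "BoxLoss": 0.2, "DFLoss": 0.4}
aux = {"BCELoss": 2.0, "BoxLoss": 0.4, "DFLoss": 0.8}
print(total_loss(main, aux))
```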

yolo/config/task/validation.yaml CHANGED

@@ -9,4 +9,4 @@ data:
   data_augment: {}
 nms:
   min_confidence: 0.001
-  min_iou: 0.7

   data_augment: {}
 nms:
   min_confidence: 0.001
+  min_iou: 0.7

yolo/tools/data_augmentation.py CHANGED

@@ -7,8 +7,9 @@ from torchvision.transforms import functional as TF
 class AugmentationComposer:
     """Composes several transforms together."""
-    def __init__(self, transforms, image_size: int = 640):
         self.transforms = transforms
         self.image_size = image_size[0]
         self.pad_resize = PadAndResize(self.image_size)
@@ -38,10 +39,10 @@ class PadAndResize:
         resized_img = square_img.resize((self.image_size, self.image_size))
-        boxes[:, 1] = (boxes[:, 1] + left) * scale
-        boxes[:, 2] = (boxes[:, 2] + top) * scale
-        boxes[:, 3] = (boxes[:, 3] + left) * scale
-        boxes[:, 4] = (boxes[:, 4] + top) * scale
         return resized_img, boxes

 class AugmentationComposer:
     """Composes several transforms together."""
+    def __init__(self, transforms, image_size: int = [640, 640]):
         self.transforms = transforms
+        # TODO: handle List of image_size [640, 640]
         self.image_size = image_size[0]
         self.pad_resize = PadAndResize(self.image_size)
         resized_img = square_img.resize((self.image_size, self.image_size))
+        boxes[:, 1] = (boxes[:, 1] * image.width + left) / self.image_size * scale
+        boxes[:, 2] = (boxes[:, 2] * image.height + top) / self.image_size * scale
+        boxes[:, 3] = (boxes[:, 3] * image.width + left) / self.image_size * scale
+        boxes[:, 4] = (boxes[:, 4] * image.height + top) / self.image_size * scale
         return resized_img, boxes
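
The new box arithmetic in `PadAndResize` first converts normalized coordinates to pixels (`* image.width`, `* image.height`), adds the padding offset, and then renormalizes against the resized square. The hunk does not show how `scale`, `left`, and `top` are computed, so the sketch below assumes centred padding to a square of side `max(width, height)` and `scale = image_size / max(width, height)`. It also assumes rows are laid out as `[class, x1, y1, x2, y2]`, inferred from the column indices 1 to 4. The function name is hypothetical.

```python
def pad_and_resize_boxes(boxes, width, height, image_size):
    """Map normalized boxes from a width x height image to the padded, resized square."""
    original_size = max(width, height)
    scale = image_size / original_size        # assumed definition of `scale`
    left = (original_size - width) // 2       # assumed horizontal padding
    top = (original_size - height) // 2       # assumed vertical padding
    out = []
    for cls, x1, y1, x2, y2 in boxes:
        out.append([
            cls,
            (x1 * width + left) / image_size * scale,
            (y1 * height + top) / image_size * scale,
            (x2 * width + left) / image_size * scale,
            (y2 * height + top) / image_size * scale,
        ])
    return out


# A box covering a whole 200x100 image should span the middle half of the square vertically.
result = pad_and_resize_boxes([[0, 0.0, 0.0, 1.0, 1.0]], width=200, height=100, image_size=640)
print([round(v, 6) for v in result[0]])
```

Under these assumptions `/ self.image_size * scale` reduces to dividing by the padded square's side, so the output stays normalized to the 0 to 1 range regardless of the target `image_size`.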

yolo/utils/logging_utils.py CHANGED

@@ -1,9 +1,9 @@
 """
-Module for initializing logging tools used in machine learning and data processing.
-Supports integration with Weights & Biases (wandb), Loguru, TensorBoard, and other
 logging frameworks as needed.
-This setup ensures consistent logging across various platforms, facilitating
 effective monitoring and debugging.
 Example:

 """
+Module for initializing logging tools used in machine learning and data processing.
+Supports integration with Weights & Biases (wandb), Loguru, TensorBoard, and other
 logging frameworks as needed.
+This setup ensures consistent logging across various platforms, facilitating
 effective monitoring and debugging.
 Example: