Upload 226 files
This view is limited to 50 files because it contains too many changes.
- ViTPose/ckpts/vitpose-s-coco_25.pth +3 -0
- ViTPose/easy_ViTPose/.dockerignore +2 -0
- ViTPose/easy_ViTPose/.gitignore +13 -0
- ViTPose/easy_ViTPose/.ipynb_checkpoints/README-checkpoint.md +275 -0
- ViTPose/easy_ViTPose/.ipynb_checkpoints/colab_demo-checkpoint.ipynb +0 -0
- ViTPose/easy_ViTPose/.ipynb_checkpoints/evaluation_on_coco-checkpoint.py +92 -0
- ViTPose/easy_ViTPose/.ipynb_checkpoints/inference-checkpoint.py +188 -0
- ViTPose/easy_ViTPose/.ipynb_checkpoints/requirements_gpu-checkpoint.txt +3 -0
- ViTPose/easy_ViTPose/Dockerfile +11 -0
- ViTPose/easy_ViTPose/LICENSE +201 -0
- ViTPose/easy_ViTPose/README.md +275 -0
- ViTPose/easy_ViTPose/colab_demo.ipynb +0 -0
- ViTPose/easy_ViTPose/easy_ViTPose.egg-info/PKG-INFO +7 -0
- ViTPose/easy_ViTPose/easy_ViTPose.egg-info/SOURCES.txt +56 -0
- ViTPose/easy_ViTPose/easy_ViTPose.egg-info/dependency_links.txt +1 -0
- ViTPose/easy_ViTPose/easy_ViTPose.egg-info/top_level.txt +1 -0
- ViTPose/easy_ViTPose/easy_ViTPose/.ipynb_checkpoints/ViTPose_Inference-checkpoint.ipynb +0 -0
- ViTPose/easy_ViTPose/easy_ViTPose/.ipynb_checkpoints/__init__-checkpoint.py +5 -0
- ViTPose/easy_ViTPose/easy_ViTPose/.ipynb_checkpoints/config-checkpoint.yaml +15 -0
- ViTPose/easy_ViTPose/easy_ViTPose/.ipynb_checkpoints/inference-checkpoint.py +337 -0
- ViTPose/easy_ViTPose/easy_ViTPose/.ipynb_checkpoints/testVITPOSE-checkpoint.jpg +0 -0
- ViTPose/easy_ViTPose/easy_ViTPose/.ipynb_checkpoints/train-checkpoint.py +174 -0
- ViTPose/easy_ViTPose/easy_ViTPose/ViTPose_Inference.ipynb +0 -0
- ViTPose/easy_ViTPose/easy_ViTPose/__init__.py +5 -0
- ViTPose/easy_ViTPose/easy_ViTPose/config.yaml +15 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/.ipynb_checkpoints/ViTPose_common-checkpoint.py +195 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/.ipynb_checkpoints/ViTPose_small_coco_256x192-checkpoint.py +173 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/.ipynb_checkpoints/ViTPose_wholebody-checkpoint.py +20 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/ViTPose_aic.py +20 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/ViTPose_ap10k.py +22 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/ViTPose_apt36k.py +22 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/ViTPose_coco.py +18 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/ViTPose_coco_25.py +20 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/ViTPose_common.py +195 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/ViTPose_mpii.py +18 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/ViTPose_small_coco_256x192.py +173 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/ViTPose_wholebody.py +20 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/__init__.py +0 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/__pycache__/ViTPose_coco_25.cpython-39.pyc +0 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/__pycache__/ViTPose_common.cpython-39.pyc +0 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/__pycache__/ViTPose_small_coco_256x192.cpython-39.pyc +0 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/__pycache__/__init__.cpython-39.pyc +0 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/_base_/datasets/300w.py +384 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/_base_/datasets/aflw.py +83 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/_base_/datasets/aic.py +140 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/_base_/datasets/aic_info.py +140 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/_base_/datasets/animalpose.py +166 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/_base_/datasets/ap10k.py +142 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/_base_/datasets/ap10k_info.py +142 -0
- ViTPose/easy_ViTPose/easy_ViTPose/configs/_base_/datasets/atrw.py +144 -0
ViTPose/ckpts/vitpose-s-coco_25.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5af7cbeb123e2a60bf25d981d4b89dab281f3fca18b7956b49a7a685b6311bfe
size 97235808
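The checkpoint above is stored as a Git LFS pointer, so a plain clone only contains this small stub. A minimal sketch of fetching the actual ~97 MB weights, assuming `git-lfs` is installed and the path matches the pointer above:

```bash
# Fetch only the ViTPose-S COCO-25 checkpoint tracked by LFS
git lfs install
git lfs pull --include="ViTPose/ckpts/vitpose-s-coco_25.pth"
```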
ViTPose/easy_ViTPose/.dockerignore
ADDED
@@ -0,0 +1,2 @@
Dockerfile
models
ViTPose/easy_ViTPose/.gitignore
ADDED
@@ -0,0 +1,13 @@
**/*.pt
**/*.pth
**/*.onnx
**/__pycache__
**/coco/
.DS_Store
runs
ckpts
annotations
examples
outputs
.ipynb_checkpoints
easy_ViTPose.egg-info
ViTPose/easy_ViTPose/.ipynb_checkpoints/README-checkpoint.md
ADDED
@@ -0,0 +1,275 @@
# easy_ViTPose
<p align="center">
<img src="https://user-images.githubusercontent.com/24314647/236082274-b25a70c8-9267-4375-97b0-eddf60a7dfc6.png" width=375> easy_ViTPose
</p>

## Accurate 2D human and animal pose estimation

<a target="_blank" href="https://colab.research.google.com/github/JunkyByte/easy_ViTPose/blob/main/colab_demo.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

### Easy-to-use SOTA `ViTPose` [Y. Xu et al., 2022] models for fast inference.
We provide all the original ViTPose models, converted for inference, with a single, consistent dataset output format.

In addition, we provide a COCO-25 model trained on the original COCO dataset plus the foot keypoints from https://cmu-perceptual-computing-lab.github.io/foot_keypoint_dataset/.
Finetuning is not currently supported; check de43d54cad87404cf0ad4a7b5da6bacf4240248b and earlier commits for a working version of `train.py`.

> [!WARNING]
> Ultralytics `yolov8` has an issue with wrong bounding boxes when using `mps`; upgrade to the latest version! (Works correctly on 8.2.48)

## Results

https://github.com/JunkyByte/easy_ViTPose/assets/24314647/e9a82c17-6e99-4111-8cc8-5257910cb87e

https://github.com/JunkyByte/easy_ViTPose/assets/24314647/63af44b1-7245-4703-8906-3f034a43f9e3

(Credits dance: https://www.youtube.com/watch?v=p-rSdt0aFuw )
(Credits zebras: https://www.youtube.com/watch?v=y-vELRYS8Yk )

## Features
- Image / video / webcam support
- Video support using the SORT algorithm to track bounding boxes between frames
- Torch / ONNX / TensorRT inference
- Runs the original ViTPose checkpoints from [ViTAE-Transformer/ViTPose](https://github.com/ViTAE-Transformer/ViTPose)
- 4 ViTPose architectures with different sizes and performance (s: small, b: base, l: large, h: huge)
- Multiple skeletons and datasets: (AIC / MPII / COCO / COCO + FEET / COCO WHOLEBODY / APT36k / AP10k)
- Human / animal pose estimation
- CPU / GPU / Metal support
- Show and save images / videos, and output to JSON

We run YOLOv8 for detection; it does not cover every animal class out of the box. You can finetune a custom YOLO model to detect the animal you are interested in.
If you do, please open an issue, as we might want to integrate other detection models.

### Benchmark
You can expect realtime >30 fps with modern NVIDIA GPUs and Apple Silicon (using Metal!).

### Skeleton reference
There are multiple skeletons for the different datasets. Check the definitions in [visualization.py](https://github.com/JunkyByte/easy_ViTPose/blob/main/easy_ViTPose/vit_utils/visualization.py).

## Installation and Usage
> [!IMPORTANT]
> Install `torch>2.0` with CUDA / MPS support by yourself,
> and also check `requirements_gpu.txt`.

```bash
git clone git@github.com:JunkyByte/easy_ViTPose.git
cd easy_ViTPose/
pip install -e .
pip install -r requirements.txt
```

### Download models
- Download the models from [Huggingface](https://huggingface.co/JunkyByte/easy_ViTPose)

We provide torch models for every dataset and architecture.
If you want to run ONNX / TensorRT inference, download the appropriate torch checkpoint and use `export.py` to convert it.
You can use the `ultralytics` `yolo export` command to export YOLO to ONNX and TensorRT as well.

#### Export to onnx and tensorrt
```bash
$ python export.py --help
usage: export.py [-h] --model-ckpt MODEL_CKPT --model-name {s,b,l,h} [--output OUTPUT] [--dataset DATASET]

optional arguments:
  -h, --help            show this help message and exit
  --model-ckpt MODEL_CKPT
                        The torch model that shall be used for conversion
  --model-name {s,b,l,h}
                        [s: ViT-S, b: ViT-B, l: ViT-L, h: ViT-H]
  --output OUTPUT       File (without extension) or dir path for checkpoint output
  --dataset DATASET     Name of the dataset. If None it's extracted from the file name. ["coco", "coco_25",
                        "wholebody", "mpii", "ap10k", "apt36k", "aic"]
```

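As a concrete example, converting the ViTPose-S COCO-25 torch checkpoint shipped in this upload could look like the sketch below; the output directory is an assumption, and the exact artifacts produced (ONNX and/or TensorRT engine) depend on `export.py`'s defaults:

```bash
# Convert the torch checkpoint; --dataset could also be inferred from the file name
python export.py --model-ckpt ViTPose/ckpts/vitpose-s-coco_25.pth --model-name s --dataset coco_25 --output ./ckpts/
```
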
### Run inference
To run inference from the command line you can use the `inference.py` script as follows:
```bash
$ python inference.py --help
usage: inference.py [-h] [--input INPUT] [--output-path OUTPUT_PATH] --model MODEL [--yolo YOLO] [--dataset DATASET]
                    [--det-class DET_CLASS] [--model-name {s,b,l,h}] [--yolo-size YOLO_SIZE]
                    [--conf-threshold CONF_THRESHOLD] [--rotate {0,90,180,270}] [--yolo-step YOLO_STEP]
                    [--single-pose] [--show] [--show-yolo] [--show-raw-yolo] [--save-img] [--save-json]

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT         path to image / video or webcam ID (=cv2)
  --output-path OUTPUT_PATH
                        output path, if the path provided is a directory output files are "input_name
                        +_result{extension}".
  --model MODEL         checkpoint path of the model
  --yolo YOLO           checkpoint path of the yolo model
  --dataset DATASET     Name of the dataset. If None it's extracted from the file name. ["coco", "coco_25",
                        "wholebody", "mpii", "ap10k", "apt36k", "aic"]
  --det-class DET_CLASS
                        ["human", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
                        "animals"]
  --model-name {s,b,l,h}
                        [s: ViT-S, b: ViT-B, l: ViT-L, h: ViT-H]
  --yolo-size YOLO_SIZE
                        YOLOv8 image size during inference
  --conf-threshold CONF_THRESHOLD
                        Minimum confidence for keypoints to be drawn. [0, 1] range
  --rotate {0,90,180,270}
                        Rotate the image by [90, 180, 270] degrees counterclockwise
  --yolo-step YOLO_STEP
                        The tracker can be used to predict the bboxes instead of yolo for performance, this flag
                        specifies how often yolo is applied (e.g. 1 applies yolo every frame). This does not have any
                        effect when is_video is False
  --single-pose         Do not use SORT tracker because a single pose is expected in the video
  --show                preview result during inference
  --show-yolo           draw yolo results
  --show-raw-yolo       draw yolo results before SORT is applied for tracking (only valid during video inference)
  --save-img            save image results
  --save-json           save json results
```

You can run inference from code as follows:
```python
import cv2
from easy_ViTPose import VitInference

# Image to run inference on, in RGB format
img = cv2.imread('./examples/img1.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Set is_video=True to enable tracking during video inference.
# Be sure to call VitInference.reset() to reset the tracker after each video.
# There are a few flags that allow you to customize VitInference; check the class definition.
model_path = './ckpts/vitpose-s-coco_25.pth'
yolo_path = './yolov8s.pth'

# If you want to use MPS (on recent MacBooks) use the torch checkpoints for both ViTPose and YOLO.
# If device is None it will try cuda -> mps -> cpu (otherwise specify 'cpu', 'mps' or 'cuda').
# The dataset and det_class parameters can be inferred from the ckpt name, but you can specify them explicitly.
model = VitInference(model_path, yolo_path, model_name='s', yolo_size=320, is_video=False, device=None)

# Infer keypoints: the output is a dict where keys are person ids and values are keypoints (np.ndarray (25, 3): (y, x, score)).
# If is_video=True the IDs will be consistent across the ordered video frames.
keypoints = model.inference(img)

# Call model.reset() after each video

img = model.draw(show_yolo=True)  # Returns an RGB image with the drawings
cv2.imshow('image', cv2.cvtColor(img, cv2.COLOR_RGB2BGR)); cv2.waitKey(0)
```
> [!NOTE]
> If the input file is a video, [SORT](https://github.com/abewley/sort) is used to track people across frames and output consistent IDs.

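For video files, a minimal sketch of the loop looks like this, using only the API documented above (`is_video=True`, `inference`, `draw`, `reset`); the input paths are placeholders:

```python
import cv2
from easy_ViTPose import VitInference

# is_video=True enables SORT tracking so person ids stay consistent across frames
model = VitInference('./ckpts/vitpose-s-coco_25.pth', './yolov8s.pth',
                     model_name='s', is_video=True)

cap = cv2.VideoCapture('./examples/video.mp4')  # placeholder path
all_keypoints = []
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # VitInference expects RGB input, OpenCV decodes BGR
    keypoints = model.inference(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    all_keypoints.append(keypoints)  # dict: track id -> (K, 3) array of (y, x, score)
    cv2.imshow('preview', cv2.cvtColor(model.draw(show_yolo=True), cv2.COLOR_RGB2BGR))
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
model.reset()  # reset the tracker before processing another video
```
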
### OUTPUT json format
The output format of the json files:

```
{
    "keypoints":
    [  # The list of frames, len(json['keypoints']) == len(video)
        {  # For each frame, a dict
            "0": [  # keys are the tracked person ids, values are their keypoints
                [121.19, 458.15, 0.99],  # Each keypoint is (y, x, score)
                [110.02, 469.43, 0.98],
                [110.86, 445.04, 0.99],
            ],
            "1": [
                ...
            ],
        },
        {
            "0": [
                [122.19, 458.15, 0.91],
                [105.02, 469.43, 0.95],
                [122.86, 445.04, 0.99],
            ],
            "1": [
                ...
            ]
        }
    ],
    "skeleton":
    {  # Skeleton reference: key is the joint index, value its name
        "0": "nose",
        "1": "left_eye",
        "2": "right_eye",
        "3": "left_ear",
        "4": "right_ear",
        "5": "neck",
        ...
    }
}
```

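As a quick illustration of consuming this file, the sketch below loads a saved result and prints the nose position of every tracked person in every frame; the filename is a placeholder for whatever `--save-json` produced:

```python
import json

with open('video_result.json') as f:  # placeholder name
    data = json.load(f)

# Map joint names back to indices using the skeleton reference
name_to_idx = {name: int(idx) for idx, name in data['skeleton'].items()}
nose = name_to_idx['nose']

for frame_i, people in enumerate(data['keypoints']):
    for person_id, kpts in people.items():
        y, x, score = kpts[nose]  # keypoints are stored as (y, x, score)
        print(f'frame {frame_i}: person {person_id} nose at ({x:.1f}, {y:.1f}), score {score:.2f}')
```
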
## Finetuning
Finetuning is possible but not officially supported right now. If you would like to finetune and need help, open an issue.
You can check `train.py`, `datasets/COCO.py` and `config.yaml` for details.

---

## Evaluation on COCO dataset
1. Download the COCO dataset images and labels
   - 2017 Val images [5K/1GB]: http://images.cocodataset.org/zips/val2017.zip <br>
     The extracted directory looks like this:
     ```
     val2017/
     ├── 000000000139.jpg
     ├── 000000000285.jpg
     ├── 000000000632.jpg
     └── ...
     ```
   - 2017 Train/Val annotations [241MB]: http://images.cocodataset.org/annotations/annotations_trainval2017.zip <br>
     The extracted directory looks like this:
     ```
     annotations/
     ├── person_keypoints_val2017.json
     ├── person_keypoints_train2017.json
     └── ...
     ```

2. Run the following command (an example invocation is shown after the block):

```bash
$ python evaluation_on_coco.py

Command line arguments:
--model_path: Path to the pretrained ViTPose model

--yolo_path: Path to the YOLOv8 model

--img_folder_path: Path to the directory containing the COCO val images (/val2017 extracted in step 1).

--annFile: Path to the json file with COCO keypoint annotations for the val set (annotations/person_keypoints_val2017.json extracted in step 1)
```

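A full invocation might look like the following; the checkpoint names are assumptions, while the COCO paths match the layout from step 1:

```bash
python evaluation_on_coco.py \
    --model_path ckpts/vitpose-s-coco_25.pth \
    --model-name s \
    --yolo_path yolov8s.pt \
    --img_folder_path ./val2017 \
    --annFile ./annotations/person_keypoints_val2017.json
```
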
---


## Docker
The system may be built into a container using Docker. This is intended to demonstrate containerized inference; adapt it to your own needs by changing models and skeletons:

`docker build . -t easy_vitpose`

The image is based on NVIDIA's PyTorch image, which is about 20 GB in size.
If you have a compatible GPU set up with the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html),
ViTPose will run with hardware acceleration.

To test an example, create a folder called `cats` with a picture of a cat saved as `image.jpg`.
Run `./models/download.sh` to fetch the large yolov8 and ap10k ViTPose models. Then run inference using the following command (replace with the correct `cats` and `models` paths):

`docker run --gpus all --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -v ./models:/models -v ~/cats:/cats easy_vitpose python inference.py --det-class cat --input /cats/image.jpg --output-path /cats --save-img --model /models/vitpose-l-ap10k.onnx --yolo /models/yolov8l.pt`

The result image can be viewed in your `cats` folder.

## TODO:
- refactor finetuning (currently not available)
- benchmark and check bottlenecks of the inference pipeline
- parallel batched inference
- other minor fixes
- yolo version for animal pose, check https://github.com/JunkyByte/easy_ViTPose/pull/18
- solve cuda exceptions on script exit when using tensorrt (no idea how)
- add info about inferred settings during inference, better output of inference status (device etc.)
- check if it is possible to make colab work without a runtime restart

Feel free to open issues, pull requests and contribute on these TODOs.

## Reference
Thanks to the ViTPose authors and their official implementation [ViTAE-Transformer/ViTPose](https://github.com/ViTAE-Transformer/ViTPose).
The SORT code is taken from [abewley/sort](https://github.com/abewley/sort).
ViTPose/easy_ViTPose/.ipynb_checkpoints/colab_demo-checkpoint.ipynb
ADDED
The diff for this file is too large to render.
ViTPose/easy_ViTPose/.ipynb_checkpoints/evaluation_on_coco-checkpoint.py
ADDED
@@ -0,0 +1,92 @@
# Reference: https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb

import argparse
import json
import os

import cv2
from tqdm.auto import tqdm
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

from easy_ViTPose.inference import VitInference


def parse_arguments():
    parser = argparse.ArgumentParser(description='Argument parser for COCO evaluation')
    parser.add_argument('--model_path', type=str,
                        help='Path to the ViTPose model')
    parser.add_argument('--model-name', type=str, choices=['s', 'b', 'l', 'h'],
                        help='[s: ViT-S, b: ViT-B, l: ViT-L, h: ViT-H]')
    parser.add_argument('--yolo_path', type=str,
                        help='Path to the YOLOv8 model')
    parser.add_argument('--img_folder_path', type=str,
                        help='Path to the folder containing images')
    parser.add_argument('--annFile', type=str,
                        help='Path to the COCO annotations file')
    return parser.parse_args()


def evaluation_on_coco(model_path, model_name, yolo_path, img_folder_path, annFile):
    # Collect the image IDs of the val set from the ground-truth annotations
    with open(annFile) as f:
        gt_annotations = json.load(f)

    image_ids = set()
    for ann in gt_annotations['images']:
        image_ids.add(ann['id'])

    model = VitInference(model_path, yolo_path, model_name=model_name,
                         yolo_size=640, is_video=False, device=None)
    results_list = []

    for image_id in tqdm(image_ids):
        # Run inference on each val image (COCO file names are the zero-padded image id)
        img_path = os.path.join(img_folder_path, str(image_id).zfill(12) + '.jpg')
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        frame_keypoints = model.inference(img)

        # Convert every detected person into a COCO keypoint-detection result entry
        for key in frame_keypoints:
            results_element = {}
            results_element['image_id'] = image_id
            results_element['category_id'] = 1
            results_element['score'] = model._scores_bbox[key]
            results_element['bbox'] = []
            keypoints = []
            for k in frame_keypoints[key]:
                # Model output is (y, x, score); COCO expects flat (x, y, visibility) triplets
                keypoints.append(float(round(k[1], 0)))
                keypoints.append(float(round(k[0], 0)))
                keypoints.append(0)
            results_element['keypoints'] = keypoints
            results_list.append(results_element)

    # Save the detection results in COCO results format
    file_path = 'results.json'
    with open(file_path, 'w') as json_file:
        json.dump(results_list, json_file, indent=4)

    # Initialize the COCO ground truth and detections APIs
    annType = 'keypoints'
    cocoGt = COCO(annFile)
    cocoDt = cocoGt.loadRes(file_path)

    # Run the keypoint evaluation
    cocoEval = COCOeval(cocoGt, cocoDt, annType)
    cocoEval.params.imgIds = [int(i) for i in image_ids]
    cocoEval.evaluate()
    cocoEval.accumulate()
    cocoEval.summarize()


if __name__ == '__main__':
    args = parse_arguments()
    evaluation_on_coco(args.model_path, args.model_name, args.yolo_path,
                       args.img_folder_path, args.annFile)
ViTPose/easy_ViTPose/.ipynb_checkpoints/inference-checkpoint.py
ADDED
@@ -0,0 +1,188 @@
import argparse
import json
import os
import time

from PIL import Image
import cv2
import numpy as np
import torch
import tqdm

from easy_ViTPose.vit_utils.inference import NumpyEncoder, VideoReader
from easy_ViTPose.inference import VitInference
from easy_ViTPose.vit_utils.visualization import joints_dict

try:
    import onnxruntime  # noqa: F401
    has_onnx = True
except ModuleNotFoundError:
    has_onnx = False


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--input', type=str, required=True,
                        help='path to image / video or webcam ID (=cv2)')
    parser.add_argument('--output-path', type=str, default='',
                        help='output path, if the path provided is a directory '
                             'output files are "input_name + _result{extension}".')
    parser.add_argument('--model', type=str, required=True,
                        help='checkpoint path of the model')
    parser.add_argument('--yolo', type=str, required=False, default=None,
                        help='checkpoint path of the yolo model')
    parser.add_argument('--dataset', type=str, required=False, default=None,
                        help='Name of the dataset. If None it is extracted from the file name. '
                             '["coco", "coco_25", "wholebody", "mpii", "ap10k", "apt36k", "aic"]')
    parser.add_argument('--det-class', type=str, required=False, default=None,
                        help='["human", "cat", "dog", "horse", "sheep", '
                             '"cow", "elephant", "bear", "zebra", "giraffe", "animals"]')
    parser.add_argument('--model-name', type=str, required=False, choices=['s', 'b', 'l', 'h'],
                        help='[s: ViT-S, b: ViT-B, l: ViT-L, h: ViT-H]')
    parser.add_argument('--yolo-size', type=int, required=False, default=320,
                        help='YOLOv8 image size during inference')
    parser.add_argument('--conf-threshold', type=float, required=False, default=0.5,
                        help='Minimum confidence for keypoints to be drawn. [0, 1] range')
    parser.add_argument('--rotate', type=int, choices=[0, 90, 180, 270],
                        required=False, default=0,
                        help='Rotate the image by [90, 180, 270] degrees counterclockwise')
    parser.add_argument('--yolo-step', type=int,
                        required=False, default=1,
                        help='The tracker can be used to predict the bboxes instead of yolo for performance, '
                             'this flag specifies how often yolo is applied (e.g. 1 applies yolo every frame). '
                             'This does not have any effect when is_video is False')
    parser.add_argument('--single-pose', default=False, action='store_true',
                        help='Do not use SORT tracker because a single pose is expected in the video')
    parser.add_argument('--show', default=False, action='store_true',
                        help='preview result during inference')
    parser.add_argument('--show-yolo', default=False, action='store_true',
                        help='draw yolo results')
    parser.add_argument('--show-raw-yolo', default=False, action='store_true',
                        help='draw yolo results before SORT is applied for tracking'
                             ' (only valid during video inference)')
    parser.add_argument('--save-img', default=False, action='store_true',
                        help='save image results')
    parser.add_argument('--save-json', default=False, action='store_true',
                        help='save json results')
    args = parser.parse_args()

    use_mps = hasattr(torch.backends, 'mps') and torch.backends.mps.is_available()
    use_cuda = torch.cuda.is_available()

    # Load YOLO: default to the bundled yolov8s (ONNX on CPU-only setups, torch otherwise)
    yolo = args.yolo
    if yolo is None:
        yolo = 'easy_ViTPose/' + ('yolov8s' + ('.onnx' if has_onnx and not (use_mps or use_cuda) else '.pt'))
    input_path = args.input

    # Load the image / video reader
    try:  # Check if the input is a webcam ID
        int(input_path)
        is_video = True
    except ValueError:
        assert os.path.isfile(input_path), 'The input file does not exist'
        is_video = input_path[input_path.rfind('.') + 1:].lower() in ['mp4', 'mov']

    ext = '.mp4' if is_video else '.png'
    assert not (args.save_img or args.save_json) or args.output_path, \
        'Specify an output path if using save-img or save-json flags'
    output_path = args.output_path
    if output_path:
        if os.path.isdir(output_path):
            og_ext = input_path[input_path.rfind('.'):]
            save_name_img = os.path.basename(input_path).replace(og_ext, f"_result{ext}")
            save_name_json = os.path.basename(input_path).replace(og_ext, "_result.json")
            output_path_img = os.path.join(output_path, save_name_img)
            output_path_json = os.path.join(output_path, save_name_json)
        else:
            output_path_img = output_path + f'{ext}'
            output_path_json = output_path + '.json'

    wait = 0
    total_frames = 1
    if is_video:
        reader = VideoReader(input_path, args.rotate)
        cap = cv2.VideoCapture(input_path)  # type: ignore
        total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        cap.release()
        wait = 15
        if args.save_img:
            cap = cv2.VideoCapture(input_path)  # type: ignore
            fps = cap.get(cv2.CAP_PROP_FPS)
            ret, frame = cap.read()
            cap.release()
            assert ret
            assert fps > 0
            output_size = frame.shape[:2][::-1]

            # Check if we have X264 support, otherwise fall back to MJPG
            try:
                temp_video = cv2.VideoWriter('/tmp/checkcodec.mp4',
                                             cv2.VideoWriter_fourcc(*'h264'), 30, (32, 32))
                opened = temp_video.isOpened()
            except Exception:
                opened = False
            codec = 'h264' if opened else 'MJPG'
            out_writer = cv2.VideoWriter(output_path_img,
                                         cv2.VideoWriter_fourcc(*codec),  # More efficient codec
                                         fps, output_size)  # type: ignore
    else:
        reader = [np.array(Image.open(input_path).rotate(args.rotate))]  # type: ignore

    # Initialize model
    model = VitInference(args.model, yolo, args.model_name,
                         args.det_class, args.dataset,
                         args.yolo_size, is_video=is_video,
                         single_pose=args.single_pose,
                         yolo_step=args.yolo_step)  # type: ignore
    print(f">>> Model loaded: {args.model}")

    print(f'>>> Running inference on {input_path}')
    keypoints = []
    fps = []
    tot_time = 0.
    for (ith, img) in tqdm.tqdm(enumerate(reader), total=total_frames):
        t0 = time.time()

        # Run inference
        frame_keypoints = model.inference(img)
        keypoints.append(frame_keypoints)

        delta = time.time() - t0
        tot_time += delta
        fps.append(delta)

        # Draw the poses and save the output img
        if args.show or args.save_img:
            # Draw result and transform to BGR
            img = model.draw(args.show_yolo, args.show_raw_yolo, args.conf_threshold)[..., ::-1]

            if args.save_img:
                # TODO: If exists add (1), (2), ...
                if is_video:
                    out_writer.write(img)
                else:
                    print('>>> Saving output image')
                    cv2.imwrite(output_path_img, img)

            if args.show:
                cv2.imshow('preview', img)
                cv2.waitKey(wait)

    if is_video:
        tot_poses = sum(len(k) for k in keypoints)
        print(f'>>> Mean inference FPS: {1 / np.mean(fps):.2f}')
        print(f'>>> Total poses predicted: {tot_poses} mean per frame: '
              f'{(tot_poses / (ith + 1)):.2f}')
        print(f'>>> Mean FPS per pose: {(tot_poses / tot_time):.2f}')

    if args.save_json:
        print('>>> Saving output json')
        with open(output_path_json, 'w') as f:
            out = {'keypoints': keypoints,
                   'skeleton': joints_dict()[model.dataset]['keypoints']}
            json.dump(out, f, cls=NumpyEncoder)

    if is_video and args.save_img:
        out_writer.release()
    cv2.destroyAllWindows()
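Two example invocations of the script above; the checkpoint paths are placeholders based on the files in this upload:

```bash
# Webcam preview (the input is the cv2 device ID)
python inference.py --input 0 --model ViTPose/ckpts/vitpose-s-coco_25.pth --model-name s --show

# Video with tracking, saving the drawn video and the keypoints json
python inference.py --input ./examples/video.mp4 --model ViTPose/ckpts/vitpose-s-coco_25.pth \
    --model-name s --yolo yolov8s.pt --save-img --save-json --output-path ./outputs/
```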
ViTPose/easy_ViTPose/.ipynb_checkpoints/requirements_gpu-checkpoint.txt
ADDED
@@ -0,0 +1,3 @@
onnxruntime-gpu>=1.13.0
tensorrt>=8.5.1.7
torch-tensorrt>=1.4.0
ViTPose/easy_ViTPose/Dockerfile
ADDED
@@ -0,0 +1,11 @@
FROM nvcr.io/nvidia/pytorch:24.07-py3
COPY . /easy_ViTPose
WORKDIR /easy_ViTPose/
ENV DEBIAN_FRONTEND=noninteractive

RUN pip uninstall -y $(pip list --format=freeze | grep opencv) && \
    rm -rf /usr/local/lib/python3.10/dist-packages/cv2/
RUN pip install -e . && pip install -r requirements.txt && pip install -r requirements_gpu.txt

# OpenCV dependency
RUN apt-get update && apt-get install -y libgl1
ViTPose/easy_ViTPose/LICENSE
ADDED
@@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.

"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:

(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.

You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
ViTPose/easy_ViTPose/README.md
ADDED
@@ -0,0 +1,275 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# easy_ViTPose
|
2 |
+
<p align="center">
|
3 |
+
<img src="https://user-images.githubusercontent.com/24314647/236082274-b25a70c8-9267-4375-97b0-eddf60a7dfc6.png" width=375> easy_ViTPose
|
4 |
+
</p>
|
5 |
+
|
6 |
+
## Accurate 2d human and animal pose estimation
|
7 |
+
|
8 |
+
<a target="_blank" href="https://colab.research.google.com/github/JunkyByte/easy_ViTPose/blob/main/colab_demo.ipynb">
|
9 |
+
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
|
10 |
+
</a>
|
11 |
+
|
12 |
+
### Easy to use SOTA `ViTPose` [Y. Xu et al., 2022] models for fast inference.
|
13 |
+
We provide all the VitPose original models, converted for inference, with single dataset format output.
|
14 |
+
|
15 |
+
In addition to that we also provide a Coco-25 model, trained on the original coco dataset + feet https://cmu-perceptual-computing-lab.github.io/foot_keypoint_dataset/
|
16 |
+
Finetuning is not currently supported, you can check de43d54cad87404cf0ad4a7b5da6bacf4240248b and previous commits for a working state of `train.py`
|
17 |
+
|
18 |
+
> [!WARNING]
|
19 |
+
> Ultralytics `yolov8` has issue with wrong bounding boxes when using `mps`, upgrade to latest version! (Works correctly on 8.2.48)
|
20 |
+
|
21 |
+
## Results
|
22 |
+

|
23 |
+
|
24 |
+
https://github.com/JunkyByte/easy_ViTPose/assets/24314647/e9a82c17-6e99-4111-8cc8-5257910cb87e
|
25 |
+
|
26 |
+
https://github.com/JunkyByte/easy_ViTPose/assets/24314647/63af44b1-7245-4703-8906-3f034a43f9e3
|
27 |
+
|
28 |
+
(Credits dance: https://www.youtube.com/watch?v=p-rSdt0aFuw )
|
29 |
+
(Credits zebras: https://www.youtube.com/watch?v=y-vELRYS8Yk )
|
30 |
+
|
31 |
+
## Features
|
32 |
+
- Image / Video / Webcam support
|
33 |
+
- Video support using SORT algorithm to track bboxes between frames
|
34 |
+
- Torch / ONNX / Tensorrt inference
|
35 |
+
- Runs the original VitPose checkpoints from [ViTAE-Transformer/ViTPose](https://github.com/ViTAE-Transformer/ViTPose)
|
36 |
+
- 4 ViTPose architectures with different sizes and performances (s: small, b: base, l: large, h: huge)
|
37 |
+
- Multi skeleton and dataset: (AIC / MPII / COCO / COCO + FEET / COCO WHOLEBODY / APT36k / AP10k)
|
38 |
+
- Human / Animal pose estimation
|
39 |
+
- cpu / gpu / metal support
|
40 |
+
- show and save images / videos and output to json
|
41 |
+
|
42 |
+
We run YOLOv8 for detection, it does not provide complete animal detection. You can finetune a custom yolo model to detect the animal you are interested in,
|
43 |
+
if you do please open an issue, we might want to integrate other models for detection.
|
44 |
+
|
45 |
+
### Benchmark:
|
46 |
+
You can expect realtime >30 fps with modern nvidia gpus and apple silicon (using metal!).
|
47 |
+
|
48 |
+
### Skeleton reference
|
49 |
+
There are multiple skeletons for different dataset. Check the definition here [visualization.py](https://github.com/JunkyByte/easy_ViTPose/blob/main/easy_ViTPose/vit_utils/visualization.py).
|
50 |
+
|
51 |
+
## Installation and Usage
|
52 |
+
> [!IMPORTANT]
|
53 |
+
> Install `torch>2.0 with cuda / mps support` by yourself.
|
54 |
+
> also check `requirements_gpu.txt`.
|
55 |
+
|
56 |
+
```bash
|
57 |
+
git clone [email protected]:JunkyByte/easy_ViTPose.git
|
58 |
+
cd easy_ViTPose/
|
59 |
+
pip install -e .
|
60 |
+
pip install -r requirements.txt
|
61 |
+
```
|
62 |
+
|
63 |
+
### Download models
|
64 |
+
- Download the models from [Huggingface](https://huggingface.co/JunkyByte/easy_ViTPose)
|
65 |
+
We provide torch models for every dataset and architecture.
|
66 |
+
If you want to run onnx / tensorrt inference download the appropriate torch ckpt and use `export.py` to convert it.
|
67 |
+
You can use `ultralytics` `yolo export` command to export yolo to onnx and tensorrt as well.
|
68 |
+
|
69 |
+
#### Export to onnx and tensorrt
|
70 |
+
```bash
|
71 |
+
$ python export.py --help
|
72 |
+
usage: export.py [-h] --model-ckpt MODEL_CKPT --model-name {s,b,l,h} [--output OUTPUT] [--dataset DATASET]
|
73 |
+
|
74 |
+
optional arguments:
|
75 |
+
-h, --help show this help message and exit
|
76 |
+
--model-ckpt MODEL_CKPT
|
77 |
+
The torch model that shall be used for conversion
|
78 |
+
--model-name {s,b,l,h}
|
79 |
+
[s: ViT-S, b: ViT-B, l: ViT-L, h: ViT-H]
|
80 |
+
--output OUTPUT File (without extension) or dir path for checkpoint output
|
81 |
+
--dataset DATASET Name of the dataset. If None it"s extracted from the file name. ["coco", "coco_25",
|
82 |
+
"wholebody", "mpii", "ap10k", "apt36k", "aic"]
|
83 |
+
```
|
84 |
+
|
85 |
+
### Run inference
|
86 |
+
To run inference from command line you can use the `inference.py` script as follows:
|
87 |
+
```bash
|
88 |
+
$ python inference.py --help
|
89 |
+
usage: inference.py [-h] [--input INPUT] [--output-path OUTPUT_PATH] --model MODEL [--yolo YOLO] [--dataset DATASET]
|
90 |
+
[--det-class DET_CLASS] [--model-name {s,b,l,h}] [--yolo-size YOLO_SIZE]
|
91 |
+
[--conf-threshold CONF_THRESHOLD] [--rotate {0,90,180,270}] [--yolo-step YOLO_STEP]
|
92 |
+
[--single-pose] [--show] [--show-yolo] [--show-raw-yolo] [--save-img] [--save-json]
|
93 |
+
|
94 |
+
optional arguments:
|
95 |
+
-h, --help show this help message and exit
|
96 |
+
--input INPUT path to image / video or webcam ID (=cv2)
|
97 |
+
--output-path OUTPUT_PATH
|
98 |
+
output path, if the path provided is a directory output files are "input_name
|
99 |
+
+_result{extension}".
|
100 |
+
--model MODEL checkpoint path of the model
|
101 |
+
--yolo YOLO checkpoint path of the yolo model
|
102 |
+
--dataset DATASET Name of the dataset. If None it"s extracted from the file name. ["coco", "coco_25",
|
103 |
+
"wholebody", "mpii", "ap10k", "apt36k", "aic"]
|
104 |
+
--det-class DET_CLASS
|
105 |
+
["human", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
|
106 |
+
"animals"]
|
107 |
+
--model-name {s,b,l,h}
|
108 |
+
[s: ViT-S, b: ViT-B, l: ViT-L, h: ViT-H]
|
109 |
+
--yolo-size YOLO_SIZE
|
110 |
+
YOLOv8 image size during inference
|
111 |
+
--conf-threshold CONF_THRESHOLD
|
112 |
+
Minimum confidence for keypoints to be drawn. [0, 1] range
|
113 |
+
--rotate {0,90,180,270}
|
114 |
+
Rotate the image of [90, 180, 270] degress counterclockwise
|
115 |
+
--yolo-step YOLO_STEP
|
116 |
+
The tracker can be used to predict the bboxes instead of yolo for performance, this flag
|
117 |
+
specifies how often yolo is applied (e.g. 1 applies yolo every frame). This does not have any
|
118 |
+
effect when is_video is False
|
119 |
+
--single-pose Do not use SORT tracker because single pose is expected in the video
|
120 |
+
--show preview result during inference
|
121 |
+
--show-yolo draw yolo results
|
122 |
+
--show-raw-yolo draw yolo result before that SORT is applied for tracking (only valid during video inference)
|
123 |
+
--save-img save image results
|
124 |
+
--save-json save json results
|
125 |
+
```
|
126 |
+
|
127 |
+
You can run inference from code as follows:
|
128 |
+
```python
|
129 |
+
import cv2
|
130 |
+
from easy_ViTPose import VitInference
|
131 |
+
|
132 |
+
# Image to run inference RGB format
|
133 |
+
img = cv2.imread('./examples/img1.jpg')
|
134 |
+
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
|
135 |
+
|
136 |
+
# set is_video=True to enable tracking in video inference
|
137 |
+
# be sure to use VitInference.reset() function to reset the tracker after each video
|
138 |
+
# There are a few flags that allows to customize VitInference, be sure to check the class definition
|
139 |
+
model_path = './ckpts/vitpose-s-coco_25.pth'
|
140 |
+
yolo_path = './yolov8s.pth'
|
141 |
+
|
142 |
+
# If you want to use MPS (on new macbooks) use the torch checkpoints for both ViTPose and Yolo
|
143 |
+
# If device is None will try to use cuda -> mps -> cpu (otherwise specify 'cpu', 'mps' or 'cuda')
|
144 |
+
# dataset and det_class parameters can be inferred from the ckpt name, but you can specify them.
|
145 |
+
model = VitInference(model_path, yolo_path, model_name='s', yolo_size=320, is_video=False, device=None)
|
146 |
+
|
147 |
+
# Infer keypoints, output is a dict where keys are person ids and values are keypoints (np.ndarray (25, 3): (y, x, score))
|
148 |
+
# If is_video=True the IDs will be consistent among the ordered video frames.
|
149 |
+
keypoints = model.inference(img)
|
150 |
+
|
151 |
+
# call model.reset() after each video
|
152 |
+
|
153 |
+
img = model.draw(show_yolo=True) # Returns RGB image with drawings
|
154 |
+
cv2.imshow('image', cv2.cvtColor(img, cv2.COLOR_RGB2BGR)); cv2.waitKey(0)
|
155 |
+
```
|
156 |
+
> [!NOTE]
|
157 |
+
> If the input file is a video [SORT](https://github.com/abewley/sort) is used to track people IDs and output consistent identifications.
|
158 |
+
|
### OUTPUT json format
The output format of the json files:

```
{
    "keypoints":
    [  # The list of frames, len(json['keypoints']) == len(video)
        {  # For each frame a dict
            "0": [  # keys are the ids used to track people, values are the keypoints
                [121.19, 458.15, 0.99],  # Each keypoint is (y, x, score)
                [110.02, 469.43, 0.98],
                [110.86, 445.04, 0.99],
            ],
            "1": [
                ...
            ],
        },
        {
            "0": [
                [122.19, 458.15, 0.91],
                [105.02, 469.43, 0.95],
                [122.86, 445.04, 0.99],
            ],
            "1": [
                ...
            ]
        }
    ],
    "skeleton":
    {  # Skeleton reference, key the idx, value the name
        "0": "nose",
        "1": "left_eye",
        "2": "right_eye",
        "3": "left_ear",
        "4": "right_ear",
        "5": "neck",
        ...
    }
}
```

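A short sketch of how the saved json can be consumed (the output file name is illustrative):

```python
import json

with open('./results/video.json') as f:  # illustrative path of a saved result
    data = json.load(f)

skeleton = data['skeleton']  # e.g. {"0": "nose", "1": "left_eye", ...}
for frame_idx, frame in enumerate(data['keypoints']):
    for person_id, kpts in frame.items():
        for joint_idx, (y, x, score) in enumerate(kpts):
            joint_name = skeleton[str(joint_idx)]
            if score > 0.5:  # filter low-confidence points before using them
                print(frame_idx, person_id, joint_name, x, y, score)
```
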
## Finetuning
Finetuning is possible but not officially supported right now. If you would like to finetune and need help, open an issue.
You can check `train.py`, `datasets/COCO.py` and `config.yaml` for details; a sketch of how a run would be launched follows.

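The entry point `train.py` is driven by `config.yaml`; assuming the default flags it exposes, a finetuning run would be launched roughly like this (not officially supported, so expect to adapt it):

```bash
python train.py --config-path config.yaml --model-name s
```
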
---

## Evaluation on COCO dataset
1. Download COCO dataset images and labels
   - 2017 Val images [5K/1GB]: http://images.cocodataset.org/zips/val2017.zip <br>
     The extracted directory looks like this:
     ```
     val2017/
     ├── 000000000139.jpg
     ├── 000000000285.jpg
     ├── 000000000632.jpg
     └── ...
     ```
   - 2017 Train/Val annotations [241MB]: http://images.cocodataset.org/annotations/annotations_trainval2017.zip <br>
     The extracted directory looks like this:
     ```
     annotations/
     ├── person_keypoints_val2017.json
     ├── person_keypoints_train2017.json
     └── ...
     ```

2. Run the following command:

```bash
$ python evaluation_on_coco.py

Command line arguments:
--model_path: Path to the pretrained ViTPose model
--yolo_path: Path to the YOLOv8 model
--img_folder_path: Path to the directory containing COCO val images (/val2017 extracted in step 1).
--annFile: Path to the json file with COCO keypoints for the val set (annotations/person_keypoints_val2017.json extracted in step 1)
```

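For example, with the files from step 1 extracted in the current directory (paths are illustrative):

```bash
python evaluation_on_coco.py \
    --model_path ./ckpts/vitpose-s-coco_25.pth \
    --yolo_path ./yolov8s.pt \
    --img_folder_path ./val2017 \
    --annFile ./annotations/person_keypoints_val2017.json
```
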
---

## Docker
The system may be built as a container using Docker. This is intended to demonstrate containerized inference; adapt it to your own needs by changing models and skeletons:

`docker build . -t easy_vitpose`

The image is based on NVIDIA's PyTorch image, which is about 20 GB in size.
If you have a compatible GPU set up with [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html),
ViTPose will run with hardware acceleration.

To test an example, create a folder called `cats` with a picture of a cat as `image.jpg`.
Run `./models/download.sh` to fetch the large yolov8 and ap10k ViTPose models. Then run inference using the following command (replace with the correct `cats` and `models` paths):

`docker run --gpus all --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -v ./models:/models -v ~/cats:/cats easy_vitpose python inference.py --det-class cat --input /cats/image.jpg --output-path /cats --save-img --model /models/vitpose-l-ap10k.onnx --yolo /models/yolov8l.pt`

The result image may be viewed in your `cats` folder.

## TODO:
- refactor finetuning (currently not available)
- benchmark and check bottlenecks of the inference pipeline
- parallel batched inference
- other minor fixes
- yolo version for animal pose, check https://github.com/JunkyByte/easy_ViTPose/pull/18
- solve cuda exceptions on script exit when using tensorrt (no idea how)
- add info about inferred settings during inference, better output of inference status (device etc.)
- check if it is possible to make colab work without a runtime restart

Feel free to open issues, pull requests and contribute to these TODOs.

## Reference
Thanks to the ViTPose authors and their official implementation [ViTAE-Transformer/ViTPose](https://github.com/ViTAE-Transformer/ViTPose).
The SORT code is taken from [abewley/sort](https://github.com/abewley/sort).
ViTPose/easy_ViTPose/colab_demo.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
ViTPose/easy_ViTPose/easy_ViTPose.egg-info/PKG-INFO
ADDED
@@ -0,0 +1,7 @@
Metadata-Version: 2.1
Name: easy_ViTPose
Version: 1.1
Home-page: https://github.com/JunkyByte/easy_ViTPose
Author: JunkyByte
Author-email: [email protected]
License-File: LICENSE
ViTPose/easy_ViTPose/easy_ViTPose.egg-info/SOURCES.txt
ADDED
@@ -0,0 +1,56 @@
LICENSE
README.md
setup.py
easy_ViTPose/__init__.py
easy_ViTPose/inference.py
easy_ViTPose/sort.py
easy_ViTPose/train.py
easy_ViTPose.egg-info/PKG-INFO
easy_ViTPose.egg-info/SOURCES.txt
easy_ViTPose.egg-info/dependency_links.txt
easy_ViTPose.egg-info/top_level.txt
easy_ViTPose/configs/ViTPose_aic.py
easy_ViTPose/configs/ViTPose_ap10k.py
easy_ViTPose/configs/ViTPose_apt36k.py
easy_ViTPose/configs/ViTPose_coco.py
easy_ViTPose/configs/ViTPose_coco_25.py
easy_ViTPose/configs/ViTPose_common.py
easy_ViTPose/configs/ViTPose_mpii.py
easy_ViTPose/configs/ViTPose_wholebody.py
easy_ViTPose/configs/__init__.py
easy_ViTPose/datasets/COCO.py
easy_ViTPose/datasets/HumanPoseEstimation.py
easy_ViTPose/datasets/__init__.py
easy_ViTPose/vit_models/__init__.py
easy_ViTPose/vit_models/model.py
easy_ViTPose/vit_models/optimizer.py
easy_ViTPose/vit_models/backbone/__init__.py
easy_ViTPose/vit_models/backbone/vit.py
easy_ViTPose/vit_models/head/__init__.py
easy_ViTPose/vit_models/head/topdown_heatmap_base_head.py
easy_ViTPose/vit_models/head/topdown_heatmap_simple_head.py
easy_ViTPose/vit_models/losses/__init__.py
easy_ViTPose/vit_models/losses/classfication_loss.py
easy_ViTPose/vit_models/losses/heatmap_loss.py
easy_ViTPose/vit_models/losses/mesh_loss.py
easy_ViTPose/vit_models/losses/mse_loss.py
easy_ViTPose/vit_models/losses/multi_loss_factory.py
easy_ViTPose/vit_models/losses/regression_loss.py
easy_ViTPose/vit_utils/__init__.py
easy_ViTPose/vit_utils/dist_util.py
easy_ViTPose/vit_utils/inference.py
easy_ViTPose/vit_utils/logging.py
easy_ViTPose/vit_utils/top_down_eval.py
easy_ViTPose/vit_utils/train_valid_fn.py
easy_ViTPose/vit_utils/transform.py
easy_ViTPose/vit_utils/util.py
easy_ViTPose/vit_utils/visualization.py
easy_ViTPose/vit_utils/nms/__init__.py
easy_ViTPose/vit_utils/nms/nms.py
easy_ViTPose/vit_utils/nms/nms_ori.py
easy_ViTPose/vit_utils/nms/setup_linux.py
easy_ViTPose/vit_utils/post_processing/__init__.py
easy_ViTPose/vit_utils/post_processing/group.py
easy_ViTPose/vit_utils/post_processing/nms.py
easy_ViTPose/vit_utils/post_processing/one_euro_filter.py
easy_ViTPose/vit_utils/post_processing/post_transforms.py
ViTPose/easy_ViTPose/easy_ViTPose.egg-info/dependency_links.txt
ADDED
@@ -0,0 +1 @@
ViTPose/easy_ViTPose/easy_ViTPose.egg-info/top_level.txt
ADDED
@@ -0,0 +1 @@
easy_ViTPose
ViTPose/easy_ViTPose/easy_ViTPose/.ipynb_checkpoints/ViTPose_Inference-checkpoint.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
ViTPose/easy_ViTPose/easy_ViTPose/.ipynb_checkpoints/__init__-checkpoint.py
ADDED
@@ -0,0 +1,5 @@
from .inference import VitInference

__all__ = [
    'VitInference'
]
ViTPose/easy_ViTPose/easy_ViTPose/.ipynb_checkpoints/config-checkpoint.yaml
ADDED
@@ -0,0 +1,15 @@
# Train config ---------------------------------------
log_level: logging.INFO
seed: 0
gpu_ids: 0
deterministic: True
cudnn_benchmark: True  # Use cudnn
resume_from: "C:/Users/user/ViTPose/ckpts/vitpose-s-coco_25.pth"  # CKPT path
#resume_from: False
gpu_ids: [0]
launcher: 'none'  # When distributed training ['none', 'pytorch', 'slurm', 'mpi']
use_amp: False
validate: True
autoscale_lr: False
dist_params:
  ...
ViTPose/easy_ViTPose/easy_ViTPose/.ipynb_checkpoints/inference-checkpoint.py
ADDED
@@ -0,0 +1,337 @@
1 |
+
import abc
|
2 |
+
import os
|
3 |
+
from typing import Optional
|
4 |
+
import typing
|
5 |
+
|
6 |
+
import cv2
|
7 |
+
import numpy as np
|
8 |
+
import torch
|
9 |
+
|
10 |
+
from ultralytics import YOLO
|
11 |
+
|
12 |
+
from .configs.ViTPose_common import data_cfg
|
13 |
+
from .sort import Sort
|
14 |
+
from .vit_models.model import ViTPose
|
15 |
+
from .vit_utils.inference import draw_bboxes, pad_image
|
16 |
+
from .vit_utils.top_down_eval import keypoints_from_heatmaps
|
17 |
+
from .vit_utils.util import dyn_model_import, infer_dataset_by_path
|
18 |
+
from .vit_utils.visualization import draw_points_and_skeleton, joints_dict
|
19 |
+
|
20 |
+
try:
|
21 |
+
import torch_tensorrt
|
22 |
+
except ModuleNotFoundError:
|
23 |
+
pass
|
24 |
+
|
25 |
+
try:
|
26 |
+
import onnxruntime
|
27 |
+
except ModuleNotFoundError:
|
28 |
+
pass
|
29 |
+
|
30 |
+
__all__ = ['VitInference']
|
31 |
+
np.bool = np.bool_
|
32 |
+
MEAN = [0.485, 0.456, 0.406]
|
33 |
+
STD = [0.229, 0.224, 0.225]
|
34 |
+
|
35 |
+
|
36 |
+
DETC_TO_YOLO_YOLOC = {
|
37 |
+
'human': [0],
|
38 |
+
'cat': [15],
|
39 |
+
'dog': [16],
|
40 |
+
'horse': [17],
|
41 |
+
'sheep': [18],
|
42 |
+
'cow': [19],
|
43 |
+
'elephant': [20],
|
44 |
+
'bear': [21],
|
45 |
+
'zebra': [22],
|
46 |
+
'giraffe': [23],
|
47 |
+
'animals': [15, 16, 17, 18, 19, 20, 21, 22, 23]
|
48 |
+
}
|
49 |
+
|
50 |
+
|
51 |
+
class VitInference:
|
52 |
+
"""
|
53 |
+
Class for performing inference using ViTPose models with YOLOv8 human detection and SORT tracking.
|
54 |
+
|
55 |
+
Args:
|
56 |
+
model (str): Path to the ViT model file (.pth, .onnx, .engine).
|
57 |
+
yolo (str): Path of the YOLOv8 model to load.
|
58 |
+
model_name (str, optional): Name of the ViT model architecture to use.
|
59 |
+
Valid values are 's', 'b', 'l', 'h'.
|
60 |
+
Defaults to None, is necessary when using .pth checkpoints.
|
61 |
+
det_class (str, optional): the detection class. if None it is inferred by the dataset.
|
62 |
+
valid values are 'human', 'cat', 'dog', 'horse', 'sheep',
|
63 |
+
'cow', 'elephant', 'bear', 'zebra', 'giraffe',
|
64 |
+
'animals' (which is all previous but human)
|
65 |
+
dataset (str, optional): Name of the dataset. If None it's extracted from the file name.
|
66 |
+
Valid values are 'coco', 'coco_25', 'wholebody', 'mpii',
|
67 |
+
'ap10k', 'apt36k', 'aic'
|
68 |
+
yolo_size (int, optional): Size of the input image for YOLOv8 model. Defaults to 320.
|
69 |
+
device (str, optional): Device to use for inference. Defaults to 'cuda' if available, else 'cpu'.
|
70 |
+
is_video (bool, optional): Flag indicating if the input is video. Defaults to False.
|
71 |
+
single_pose (bool, optional): Flag indicating if the video (on images this flag has no effect)
|
72 |
+
will contain a single pose.
|
73 |
+
In this case the SORT tracker is not used (increasing performance)
|
74 |
+
but people id tracking
|
75 |
+
won't be consistent among frames.
|
76 |
+
yolo_step (int, optional): The tracker can be used to predict the bboxes instead of yolo for performance,
|
77 |
+
this flag specifies how often yolo is applied (e.g. 1 applies yolo every frame).
|
78 |
+
This does not have any effect when is_video is False.
|
79 |
+
"""
|
80 |
+
|
81 |
+
def __init__(self, model: str,
|
82 |
+
yolo: str,
|
83 |
+
model_name: Optional[str] = None,
|
84 |
+
det_class: Optional[str] = None,
|
85 |
+
dataset: Optional[str] = None,
|
86 |
+
yolo_size: Optional[int] = 320,
|
87 |
+
device: Optional[str] = None,
|
88 |
+
is_video: Optional[bool] = False,
|
89 |
+
single_pose: Optional[bool] = False,
|
90 |
+
yolo_step: Optional[int] = 1):
|
91 |
+
assert os.path.isfile(model), f'The model file {model} does not exist'
|
92 |
+
assert os.path.isfile(yolo), f'The YOLOv8 model {yolo} does not exist'
|
93 |
+
|
94 |
+
# Device priority is cuda / mps / cpu
|
95 |
+
if device is None:
|
96 |
+
if torch.cuda.is_available():
|
97 |
+
device = 'cuda'
|
98 |
+
elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
|
99 |
+
device = 'mps'
|
100 |
+
else:
|
101 |
+
device = 'cpu'
|
102 |
+
|
103 |
+
self.device = device
|
104 |
+
self.yolo = YOLO(yolo, task='detect')
|
105 |
+
self.yolo_size = yolo_size
|
106 |
+
self.yolo_step = yolo_step
|
107 |
+
self.is_video = is_video
|
108 |
+
self.single_pose = single_pose
|
109 |
+
self.reset()
|
110 |
+
|
111 |
+
# State saving during inference
|
112 |
+
self.save_state = True # Can be disabled manually
|
113 |
+
self._img = None
|
114 |
+
self._yolo_res = None
|
115 |
+
self._tracker_res = None
|
116 |
+
self._keypoints = None
|
117 |
+
|
118 |
+
# Use extension to decide which kind of model has been loaded
|
119 |
+
use_onnx = model.endswith('.onnx')
|
120 |
+
use_trt = model.endswith('.engine')
|
121 |
+
|
122 |
+
|
123 |
+
# Extract dataset name
|
124 |
+
if dataset is None:
|
125 |
+
dataset = infer_dataset_by_path(model)
|
126 |
+
|
127 |
+
assert dataset in ['mpii', 'coco', 'coco_25', 'wholebody', 'aic', 'ap10k', 'apt36k'], \
|
128 |
+
'The specified dataset is not valid'
|
129 |
+
|
130 |
+
# Dataset can now be set for visualization
|
131 |
+
self.dataset = dataset
|
132 |
+
|
133 |
+
# if we picked the dataset switch to correct yolo classes if not set
|
134 |
+
if det_class is None:
|
135 |
+
det_class = 'animals' if dataset in ['ap10k', 'apt36k'] else 'human'
|
136 |
+
self.yolo_classes = DETC_TO_YOLO_YOLOC[det_class]
|
137 |
+
|
138 |
+
assert model_name in [None, 's', 'b', 'l', 'h'], \
|
139 |
+
f'The model name {model_name} is not valid'
|
140 |
+
|
141 |
+
# onnx / trt models do not require model_cfg specification
|
142 |
+
if model_name is None:
|
143 |
+
assert use_onnx or use_trt, \
|
144 |
+
'Specify the model_name if not using onnx / trt'
|
145 |
+
else:
|
146 |
+
# Dynamically import the model class
|
147 |
+
model_cfg = dyn_model_import(self.dataset, model_name)
|
148 |
+
|
149 |
+
self.target_size = data_cfg['image_size']
|
150 |
+
if use_onnx:
|
151 |
+
self._ort_session = onnxruntime.InferenceSession(model,
|
152 |
+
providers=['CUDAExecutionProvider',
|
153 |
+
'CPUExecutionProvider'])
|
154 |
+
inf_fn = self._inference_onnx
|
155 |
+
else:
|
156 |
+
self._vit_pose = ViTPose(model_cfg)
|
157 |
+
self._vit_pose.eval()
|
158 |
+
|
159 |
+
if use_trt:
|
160 |
+
self._vit_pose = torch.jit.load(model)
|
161 |
+
else:
|
162 |
+
ckpt = torch.load(model, map_location='cpu', weights_only=True)
|
163 |
+
if 'state_dict' in ckpt:
|
164 |
+
self._vit_pose.load_state_dict(ckpt['state_dict'])
|
165 |
+
else:
|
166 |
+
self._vit_pose.load_state_dict(ckpt)
|
167 |
+
self._vit_pose.to(torch.device(device))
|
168 |
+
|
169 |
+
inf_fn = self._inference_torch
|
170 |
+
|
171 |
+
# Override _inference abstract with selected engine
|
172 |
+
self._inference = inf_fn # type: ignore
|
173 |
+
|
174 |
+
def reset(self):
|
175 |
+
"""
|
176 |
+
Reset the inference class to be ready for a new video.
|
177 |
+
This will reset the internal counter of frames, on videos
|
178 |
+
this is necessary to reset the tracker.
|
179 |
+
"""
|
180 |
+
min_hits = 3 if self.yolo_step == 1 else 1
|
181 |
+
use_tracker = self.is_video and not self.single_pose
|
182 |
+
self.tracker = Sort(max_age=self.yolo_step,
|
183 |
+
min_hits=min_hits,
|
184 |
+
iou_threshold=0.3) if use_tracker else None # TODO: Params
|
185 |
+
self.frame_counter = 0
|
186 |
+
|
187 |
+
@classmethod
|
188 |
+
def postprocess(cls, heatmaps, org_w, org_h):
|
189 |
+
"""
|
190 |
+
Postprocess the heatmaps to obtain keypoints and their probabilities.
|
191 |
+
|
192 |
+
Args:
|
193 |
+
heatmaps (ndarray): Heatmap predictions from the model.
|
194 |
+
org_w (int): Original width of the image.
|
195 |
+
org_h (int): Original height of the image.
|
196 |
+
|
197 |
+
Returns:
|
198 |
+
ndarray: Processed keypoints with probabilities.
|
199 |
+
"""
|
200 |
+
points, prob = keypoints_from_heatmaps(heatmaps=heatmaps,
|
201 |
+
center=np.array([[org_w // 2,
|
202 |
+
org_h // 2]]),
|
203 |
+
scale=np.array([[org_w, org_h]]),
|
204 |
+
unbiased=True, use_udp=True)
|
205 |
+
return np.concatenate([points[:, :, ::-1], prob], axis=2)
|
206 |
+
|
207 |
+
@abc.abstractmethod
|
208 |
+
def _inference(self, img: np.ndarray) -> np.ndarray:
|
209 |
+
"""
|
210 |
+
Abstract method for performing inference on an image.
|
211 |
+
It is overloaded by each inference engine.
|
212 |
+
|
213 |
+
Args:
|
214 |
+
img (ndarray): Input image for inference.
|
215 |
+
|
216 |
+
Returns:
|
217 |
+
ndarray: Inference results.
|
218 |
+
"""
|
219 |
+
raise NotImplementedError
|
220 |
+
|
221 |
+
def inference(self, img: np.ndarray) -> dict[typing.Any, typing.Any]:
|
222 |
+
"""
|
223 |
+
Perform inference on the input image.
|
224 |
+
|
225 |
+
Args:
|
226 |
+
img (ndarray): Input image for inference in RGB format.
|
227 |
+
|
228 |
+
Returns:
|
229 |
+
dict[typing.Any, typing.Any]: Inference results.
|
230 |
+
"""
|
231 |
+
|
232 |
+
# First use YOLOv8 for detection
|
233 |
+
res_pd = np.empty((0, 5))
|
234 |
+
results = None
|
235 |
+
if (self.tracker is None or
|
236 |
+
(self.frame_counter % self.yolo_step == 0 or self.frame_counter < 3)):
|
237 |
+
results = self.yolo(img[..., ::-1], verbose=False, imgsz=self.yolo_size,
|
238 |
+
device=self.device if self.device != 'cuda' else 0,
|
239 |
+
classes=self.yolo_classes)[0]
|
240 |
+
res_pd = np.array([r[:5].tolist() for r in # TODO: Confidence threshold
|
241 |
+
results.boxes.data.cpu().numpy() if r[4] > 0.35]).reshape((-1, 5))
|
242 |
+
self.frame_counter += 1
|
243 |
+
|
244 |
+
frame_keypoints = {}
|
245 |
+
scores_bbox = {}
|
246 |
+
ids = None
|
247 |
+
if self.tracker is not None:
|
248 |
+
res_pd = self.tracker.update(res_pd)
|
249 |
+
ids = res_pd[:, 5].astype(int).tolist()
|
250 |
+
|
251 |
+
# Prepare boxes for inference
|
252 |
+
bboxes = res_pd[:, :4].round().astype(int)
|
253 |
+
scores = res_pd[:, 4].tolist()
|
254 |
+
pad_bbox = 10
|
255 |
+
|
256 |
+
if ids is None:
|
257 |
+
ids = range(len(bboxes))
|
258 |
+
|
259 |
+
for bbox, id, score in zip(bboxes, ids, scores):
|
260 |
+
# TODO: Slightly bigger bbox
|
261 |
+
bbox[[0, 2]] = np.clip(bbox[[0, 2]] + [-pad_bbox, pad_bbox], 0, img.shape[1])
|
262 |
+
bbox[[1, 3]] = np.clip(bbox[[1, 3]] + [-pad_bbox, pad_bbox], 0, img.shape[0])
|
263 |
+
|
264 |
+
# Crop image and pad to 3/4 aspect ratio
|
265 |
+
img_inf = img[bbox[1]:bbox[3], bbox[0]:bbox[2]]
|
266 |
+
img_inf, (left_pad, top_pad) = pad_image(img_inf, 3 / 4)
|
267 |
+
|
268 |
+
keypoints = self._inference(img_inf)[0]
|
269 |
+
# Transform keypoints to original image
|
270 |
+
keypoints[:, :2] += bbox[:2][::-1] - [top_pad, left_pad]
|
271 |
+
frame_keypoints[id] = keypoints
|
272 |
+
scores_bbox[id] = score # Replace this with avg_keypoint_conf*person_obj_conf. For now, only person_obj_conf from yolo is being used.
|
273 |
+
|
274 |
+
if self.save_state:
|
275 |
+
self._img = img
|
276 |
+
self._yolo_res = results
|
277 |
+
self._tracker_res = (bboxes, ids, scores)
|
278 |
+
self._keypoints = frame_keypoints
|
279 |
+
self._scores_bbox = scores_bbox
|
280 |
+
|
281 |
+
return frame_keypoints
|
282 |
+
|
283 |
+
def draw(self, show_yolo=True, show_raw_yolo=False, confidence_threshold=0.5):
|
284 |
+
"""
|
285 |
+
Draw keypoints and bounding boxes on the image.
|
286 |
+
|
287 |
+
Args:
|
288 |
+
show_yolo (bool, optional): Whether to show YOLOv8 bounding boxes. Default is True.
|
289 |
+
show_raw_yolo (bool, optional): Whether to show raw YOLOv8 bounding boxes. Default is False.
|
290 |
+
|
291 |
+
Returns:
|
292 |
+
ndarray: Image with keypoints and bounding boxes drawn.
|
293 |
+
"""
|
294 |
+
img = self._img.copy()
|
295 |
+
bboxes, ids, scores = self._tracker_res
|
296 |
+
|
297 |
+
if self._yolo_res is not None and (show_raw_yolo or (self.tracker is None and show_yolo)):
|
298 |
+
img = np.array(self._yolo_res.plot())[..., ::-1]
|
299 |
+
|
300 |
+
if show_yolo and self.tracker is not None:
|
301 |
+
img = draw_bboxes(img, bboxes, ids, scores)
|
302 |
+
|
303 |
+
img = np.array(img)[..., ::-1] # RGB to BGR for cv2 modules
|
304 |
+
for idx, k in self._keypoints.items():
|
305 |
+
img = draw_points_and_skeleton(img.copy(), k,
|
306 |
+
joints_dict()[self.dataset]['skeleton'],
|
307 |
+
person_index=idx,
|
308 |
+
points_color_palette='gist_rainbow',
|
309 |
+
skeleton_color_palette='jet',
|
310 |
+
points_palette_samples=10,
|
311 |
+
confidence_threshold=confidence_threshold)
|
312 |
+
return img[..., ::-1] # Return RGB as original
|
313 |
+
|
314 |
+
def pre_img(self, img):
|
315 |
+
org_h, org_w = img.shape[:2]
|
316 |
+
img_input = cv2.resize(img, self.target_size, interpolation=cv2.INTER_LINEAR) / 255
|
317 |
+
img_input = ((img_input - MEAN) / STD).transpose(2, 0, 1)[None].astype(np.float32)
|
318 |
+
return img_input, org_h, org_w
|
319 |
+
|
320 |
+
@torch.no_grad()
|
321 |
+
def _inference_torch(self, img: np.ndarray) -> np.ndarray:
|
322 |
+
# Prepare input data
|
323 |
+
img_input, org_h, org_w = self.pre_img(img)
|
324 |
+
img_input = torch.from_numpy(img_input).to(torch.device(self.device))
|
325 |
+
|
326 |
+
# Feed to model
|
327 |
+
heatmaps = self._vit_pose(img_input).detach().cpu().numpy()
|
328 |
+
return self.postprocess(heatmaps, org_w, org_h)
|
329 |
+
|
330 |
+
def _inference_onnx(self, img: np.ndarray) -> np.ndarray:
|
331 |
+
# Prepare input data
|
332 |
+
img_input, org_h, org_w = self.pre_img(img)
|
333 |
+
|
334 |
+
# Feed to model
|
335 |
+
ort_inputs = {self._ort_session.get_inputs()[0].name: img_input}
|
336 |
+
heatmaps = self._ort_session.run(None, ort_inputs)[0]
|
337 |
+
return self.postprocess(heatmaps, org_w, org_h)
|
ViTPose/easy_ViTPose/easy_ViTPose/.ipynb_checkpoints/testVITPOSE-checkpoint.jpg
ADDED
ViTPose/easy_ViTPose/easy_ViTPose/.ipynb_checkpoints/train-checkpoint.py
ADDED
@@ -0,0 +1,174 @@
1 |
+
# Copyright (c) OpenMMLab. All rights reserved.
|
2 |
+
import argparse
|
3 |
+
import copy
|
4 |
+
import os
|
5 |
+
import os.path as osp
|
6 |
+
import time
|
7 |
+
import warnings
|
8 |
+
import click
|
9 |
+
import yaml
|
10 |
+
|
11 |
+
from glob import glob
|
12 |
+
|
13 |
+
import torch
|
14 |
+
import torch.distributed as dist
|
15 |
+
|
16 |
+
from vit_utils.util import init_random_seed, set_random_seed
|
17 |
+
from vit_utils.dist_util import get_dist_info, init_dist
|
18 |
+
from vit_utils.logging import get_root_logger
|
19 |
+
|
20 |
+
import configs.ViTPose_small_coco_256x192 as s_cfg
|
21 |
+
import configs.ViTPose_base_coco_256x192 as b_cfg
|
22 |
+
import configs.ViTPose_large_coco_256x192 as l_cfg
|
23 |
+
import configs.ViTPose_huge_coco_256x192 as h_cfg
|
24 |
+
|
25 |
+
from vit_models.model import ViTPose
|
26 |
+
from datasets.COCO import COCODataset
|
27 |
+
from vit_utils.train_valid_fn import train_model
|
28 |
+
|
29 |
+
CUR_PATH = osp.dirname(__file__)
|
30 |
+
|
31 |
+
@click.command()
|
32 |
+
@click.option('--config-path', type=click.Path(exists=True), default='config.yaml', required=True, help='train config file path')
|
33 |
+
@click.option('--model-name', type=str, default='b', required=True, help='[b: ViT-B, l: ViT-L, h: ViT-H]')
|
34 |
+
def main(config_path, model_name):
|
35 |
+
|
36 |
+
cfg = {'b':b_cfg,
|
37 |
+
's':s_cfg,
|
38 |
+
'l':l_cfg,
|
39 |
+
'h':h_cfg}.get(model_name.lower())
|
40 |
+
# Load config.yaml
|
41 |
+
with open(config_path, 'r') as f:
|
42 |
+
cfg_yaml = yaml.load(f, Loader=yaml.SafeLoader)
|
43 |
+
|
44 |
+
for k, v in cfg_yaml.items():
|
45 |
+
if hasattr(cfg, k):
|
46 |
+
raise ValueError(f"Already exists {k} in config")
|
47 |
+
else:
|
48 |
+
cfg.__setattr__(k, v)
|
49 |
+
|
50 |
+
# set cudnn_benchmark
|
51 |
+
if cfg.cudnn_benchmark:
|
52 |
+
torch.backends.cudnn.benchmark = True
|
53 |
+
|
54 |
+
# Set work directory (session-level)
|
55 |
+
if not hasattr(cfg, 'work_dir'):
|
56 |
+
cfg.__setattr__('work_dir', f"{CUR_PATH}/runs/train")
|
57 |
+
|
58 |
+
if not osp.exists(cfg.work_dir):
|
59 |
+
os.makedirs(cfg.work_dir)
|
60 |
+
session_list = sorted(glob(f"{cfg.work_dir}/*"))
|
61 |
+
if len(session_list) == 0:
|
62 |
+
session = 1
|
63 |
+
else:
|
64 |
+
session = int(os.path.basename(session_list[-1])) + 1
|
65 |
+
session_dir = osp.join(cfg.work_dir, str(session).zfill(3))
|
66 |
+
os.makedirs(session_dir)
|
67 |
+
cfg.__setattr__('work_dir', session_dir)
|
68 |
+
|
69 |
+
|
70 |
+
if cfg.autoscale_lr:
|
71 |
+
# apply the linear scaling rule (https://arxiv.org/abs/1706.02677)
|
72 |
+
cfg.optimizer['lr'] = cfg.optimizer['lr'] * len(cfg.gpu_ids) / 8
|
73 |
+
|
74 |
+
# init distributed env first, since logger depends on the dist info.
|
75 |
+
if cfg.launcher == 'none':
|
76 |
+
distributed = False
|
77 |
+
if len(cfg.gpu_ids) > 1:
|
78 |
+
warnings.warn(
|
79 |
+
f"We treat {cfg['gpu_ids']} as gpu-ids, and reset to "
|
80 |
+
f"{cfg['gpu_ids'][0:1]} as gpu-ids to avoid potential error in "
|
81 |
+
"non-distribute training time.")
|
82 |
+
cfg.gpu_ids = cfg.gpu_ids[0:1]
|
83 |
+
else:
|
84 |
+
distributed = True
|
85 |
+
init_dist(cfg.launcher, **cfg.dist_params)
|
86 |
+
# re-set gpu_ids with distributed training mode
|
87 |
+
_, world_size = get_dist_info()
|
88 |
+
cfg.gpu_ids = range(world_size)
|
89 |
+
|
90 |
+
# init the logger before other steps
|
91 |
+
timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime())
|
92 |
+
log_file = osp.join(session_dir, f'{timestamp}.log')
|
93 |
+
logger = get_root_logger(log_file=log_file)
|
94 |
+
|
95 |
+
# init the meta dict to record some important information such as
|
96 |
+
# environment info and seed, which will be logged
|
97 |
+
meta = dict()
|
98 |
+
|
99 |
+
# log some basic info
|
100 |
+
logger.info(f'Distributed training: {distributed}')
|
101 |
+
|
102 |
+
# set random seeds
|
103 |
+
seed = init_random_seed(cfg.seed)
|
104 |
+
logger.info(f"Set random seed to {seed}, "
|
105 |
+
f"deterministic: {cfg.deterministic}")
|
106 |
+
set_random_seed(seed, deterministic=cfg.deterministic)
|
107 |
+
meta['seed'] = seed
|
108 |
+
|
109 |
+
# Set model
|
110 |
+
model = ViTPose(cfg.model)
|
111 |
+
if cfg.resume_from:
|
112 |
+
# Load ckpt partially
|
113 |
+
ckpt_state = torch.load(cfg.resume_from)['state_dict']
|
114 |
+
ckpt_state.pop('keypoint_head.final_layer.bias')
|
115 |
+
ckpt_state.pop('keypoint_head.final_layer.weight')
|
116 |
+
model.load_state_dict(ckpt_state, strict=False)
|
117 |
+
|
118 |
+
# freeze the backbone, leave the head to be finetuned
|
119 |
+
model.backbone.frozen_stages = model.backbone.depth - 1
|
120 |
+
model.backbone.freeze_ffn = True
|
121 |
+
model.backbone.freeze_attn = True
|
122 |
+
model.backbone._freeze_stages()
|
123 |
+
|
124 |
+
# Set dataset
|
125 |
+
datasets_train = COCODataset(
|
126 |
+
root_path=cfg.data_root,
|
127 |
+
data_version="feet_train",
|
128 |
+
is_train=True,
|
129 |
+
use_gt_bboxes=True,
|
130 |
+
image_width=192,
|
131 |
+
image_height=256,
|
132 |
+
scale=True,
|
133 |
+
scale_factor=0.35,
|
134 |
+
flip_prob=0.5,
|
135 |
+
rotate_prob=0.5,
|
136 |
+
rotation_factor=45.,
|
137 |
+
half_body_prob=0.3,
|
138 |
+
use_different_joints_weight=True,
|
139 |
+
heatmap_sigma=3,
|
140 |
+
soft_nms=False
|
141 |
+
)
|
142 |
+
|
143 |
+
datasets_valid = COCODataset(
|
144 |
+
root_path=cfg.data_root,
|
145 |
+
data_version="feet_val",
|
146 |
+
is_train=False,
|
147 |
+
use_gt_bboxes=True,
|
148 |
+
image_width=192,
|
149 |
+
image_height=256,
|
150 |
+
scale=False,
|
151 |
+
scale_factor=0.35,
|
152 |
+
flip_prob=0.5,
|
153 |
+
rotate_prob=0.5,
|
154 |
+
rotation_factor=45.,
|
155 |
+
half_body_prob=0.3,
|
156 |
+
use_different_joints_weight=True,
|
157 |
+
heatmap_sigma=3,
|
158 |
+
soft_nms=False
|
159 |
+
)
|
160 |
+
|
161 |
+
train_model(
|
162 |
+
model=model,
|
163 |
+
datasets_train=datasets_train,
|
164 |
+
datasets_valid=datasets_valid,
|
165 |
+
cfg=cfg,
|
166 |
+
distributed=distributed,
|
167 |
+
validate=cfg.validate,
|
168 |
+
timestamp=timestamp,
|
169 |
+
meta=meta
|
170 |
+
)
|
171 |
+
|
172 |
+
|
173 |
+
if __name__ == '__main__':
|
174 |
+
main()
|
ViTPose/easy_ViTPose/easy_ViTPose/ViTPose_Inference.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
ViTPose/easy_ViTPose/easy_ViTPose/__init__.py
ADDED
@@ -0,0 +1,5 @@
from .inference import VitInference

__all__ = [
    'VitInference'
]
ViTPose/easy_ViTPose/easy_ViTPose/config.yaml
ADDED
@@ -0,0 +1,15 @@
# Train config ---------------------------------------
log_level: logging.INFO
seed: 0
gpu_ids: 0
deterministic: True
cudnn_benchmark: True  # Use cudnn
resume_from: "C:/Users/user/ViTPose/ckpts/vitpose-s-coco_25.pth"  # CKPT path
#resume_from: False
gpu_ids: [0]
launcher: 'none'  # When distributed training ['none', 'pytorch', 'slurm', 'mpi']
use_amp: False
validate: True
autoscale_lr: False
dist_params:
  ...
ViTPose/easy_ViTPose/easy_ViTPose/configs/.ipynb_checkpoints/ViTPose_common-checkpoint.py
ADDED
@@ -0,0 +1,195 @@
1 |
+
# Common configuration
|
2 |
+
optimizer = dict(type='AdamW', lr=1e-3, betas=(0.9, 0.999), weight_decay=0.1,
|
3 |
+
constructor='LayerDecayOptimizerConstructor',
|
4 |
+
paramwise_cfg=dict(
|
5 |
+
num_layers=12,
|
6 |
+
layer_decay_rate=1 - 2e-4,
|
7 |
+
custom_keys={
|
8 |
+
'bias': dict(decay_multi=0.),
|
9 |
+
'pos_embed': dict(decay_mult=0.),
|
10 |
+
'relative_position_bias_table': dict(decay_mult=0.),
|
11 |
+
'norm': dict(decay_mult=0.)
|
12 |
+
}
|
13 |
+
)
|
14 |
+
)
|
15 |
+
|
16 |
+
optimizer_config = dict(grad_clip=dict(max_norm=1., norm_type=2))
|
17 |
+
|
18 |
+
# learning policy
|
19 |
+
lr_config = dict(
|
20 |
+
policy='step',
|
21 |
+
warmup='linear',
|
22 |
+
warmup_iters=300,
|
23 |
+
warmup_ratio=0.001,
|
24 |
+
step=[3])
|
25 |
+
|
26 |
+
total_epochs = 4
|
27 |
+
target_type = 'GaussianHeatmap'
|
28 |
+
|
29 |
+
data_cfg = dict(
|
30 |
+
image_size=[192, 256],
|
31 |
+
heatmap_size=[48, 64],
|
32 |
+
soft_nms=False,
|
33 |
+
nms_thr=1.0,
|
34 |
+
oks_thr=0.9,
|
35 |
+
vis_thr=0.2,
|
36 |
+
use_gt_bbox=False,
|
37 |
+
det_bbox_thr=0.0,
|
38 |
+
bbox_file='data/coco/person_detection_results/'
|
39 |
+
'COCO_val2017_detections_AP_H_56_person.json',
|
40 |
+
)
|
41 |
+
|
42 |
+
data_root = '/home/adryw/dataset/COCO17'
|
43 |
+
data = dict(
|
44 |
+
samples_per_gpu=64,
|
45 |
+
workers_per_gpu=6,
|
46 |
+
val_dataloader=dict(samples_per_gpu=128),
|
47 |
+
test_dataloader=dict(samples_per_gpu=128),
|
48 |
+
train=dict(
|
49 |
+
type='TopDownCocoDataset',
|
50 |
+
ann_file=f'{data_root}/annotations/person_keypoints_train2017.json',
|
51 |
+
img_prefix=f'{data_root}/train2017/',
|
52 |
+
data_cfg=data_cfg),
|
53 |
+
val=dict(
|
54 |
+
type='TopDownCocoDataset',
|
55 |
+
ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
|
56 |
+
img_prefix=f'{data_root}/val2017/',
|
57 |
+
data_cfg=data_cfg),
|
58 |
+
test=dict(
|
59 |
+
type='TopDownCocoDataset',
|
60 |
+
ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
|
61 |
+
img_prefix=f'{data_root}/val2017/',
|
62 |
+
data_cfg=data_cfg)
|
63 |
+
)
|
64 |
+
|
65 |
+
model_small = dict(
|
66 |
+
type='TopDown',
|
67 |
+
pretrained=None,
|
68 |
+
backbone=dict(
|
69 |
+
type='ViT',
|
70 |
+
img_size=(256, 192),
|
71 |
+
patch_size=16,
|
72 |
+
embed_dim=384,
|
73 |
+
depth=12,
|
74 |
+
num_heads=12,
|
75 |
+
ratio=1,
|
76 |
+
use_checkpoint=False,
|
77 |
+
mlp_ratio=4,
|
78 |
+
qkv_bias=True,
|
79 |
+
drop_path_rate=0.1,
|
80 |
+
),
|
81 |
+
keypoint_head=dict(
|
82 |
+
type='TopdownHeatmapSimpleHead',
|
83 |
+
in_channels=384,
|
84 |
+
num_deconv_layers=2,
|
85 |
+
num_deconv_filters=(256, 256),
|
86 |
+
num_deconv_kernels=(4, 4),
|
87 |
+
extra=dict(final_conv_kernel=1, ),
|
88 |
+
loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
|
89 |
+
train_cfg=dict(),
|
90 |
+
test_cfg=dict(
|
91 |
+
flip_test=True,
|
92 |
+
post_process='default',
|
93 |
+
shift_heatmap=False,
|
94 |
+
target_type=target_type,
|
95 |
+
modulate_kernel=11,
|
96 |
+
use_udp=True))
|
97 |
+
|
98 |
+
model_base = dict(
|
99 |
+
type='TopDown',
|
100 |
+
pretrained=None,
|
101 |
+
backbone=dict(
|
102 |
+
type='ViT',
|
103 |
+
img_size=(256, 192),
|
104 |
+
patch_size=16,
|
105 |
+
embed_dim=768,
|
106 |
+
depth=12,
|
107 |
+
num_heads=12,
|
108 |
+
ratio=1,
|
109 |
+
use_checkpoint=False,
|
110 |
+
mlp_ratio=4,
|
111 |
+
qkv_bias=True,
|
112 |
+
drop_path_rate=0.3,
|
113 |
+
),
|
114 |
+
keypoint_head=dict(
|
115 |
+
type='TopdownHeatmapSimpleHead',
|
116 |
+
in_channels=768,
|
117 |
+
num_deconv_layers=2,
|
118 |
+
num_deconv_filters=(256, 256),
|
119 |
+
num_deconv_kernels=(4, 4),
|
120 |
+
extra=dict(final_conv_kernel=1, ),
|
121 |
+
loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
|
122 |
+
train_cfg=dict(),
|
123 |
+
test_cfg=dict(
|
124 |
+
flip_test=True,
|
125 |
+
post_process='default',
|
126 |
+
shift_heatmap=False,
|
127 |
+
target_type=target_type,
|
128 |
+
modulate_kernel=11,
|
129 |
+
use_udp=True))
|
130 |
+
|
131 |
+
model_large = dict(
|
132 |
+
type='TopDown',
|
133 |
+
pretrained=None,
|
134 |
+
backbone=dict(
|
135 |
+
type='ViT',
|
136 |
+
img_size=(256, 192),
|
137 |
+
patch_size=16,
|
138 |
+
embed_dim=1024,
|
139 |
+
depth=24,
|
140 |
+
num_heads=16,
|
141 |
+
ratio=1,
|
142 |
+
use_checkpoint=False,
|
143 |
+
mlp_ratio=4,
|
144 |
+
qkv_bias=True,
|
145 |
+
drop_path_rate=0.5,
|
146 |
+
),
|
147 |
+
keypoint_head=dict(
|
148 |
+
type='TopdownHeatmapSimpleHead',
|
149 |
+
in_channels=1024,
|
150 |
+
num_deconv_layers=2,
|
151 |
+
num_deconv_filters=(256, 256),
|
152 |
+
num_deconv_kernels=(4, 4),
|
153 |
+
extra=dict(final_conv_kernel=1, ),
|
154 |
+
loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
|
155 |
+
train_cfg=dict(),
|
156 |
+
test_cfg=dict(
|
157 |
+
flip_test=True,
|
158 |
+
post_process='default',
|
159 |
+
shift_heatmap=False,
|
160 |
+
target_type=target_type,
|
161 |
+
modulate_kernel=11,
|
162 |
+
use_udp=True))
|
163 |
+
|
164 |
+
model_huge = dict(
|
165 |
+
type='TopDown',
|
166 |
+
pretrained=None,
|
167 |
+
backbone=dict(
|
168 |
+
type='ViT',
|
169 |
+
img_size=(256, 192),
|
170 |
+
patch_size=16,
|
171 |
+
embed_dim=1280,
|
172 |
+
depth=32,
|
173 |
+
num_heads=16,
|
174 |
+
ratio=1,
|
175 |
+
use_checkpoint=False,
|
176 |
+
mlp_ratio=4,
|
177 |
+
qkv_bias=True,
|
178 |
+
drop_path_rate=0.55,
|
179 |
+
),
|
180 |
+
keypoint_head=dict(
|
181 |
+
type='TopdownHeatmapSimpleHead',
|
182 |
+
in_channels=1280,
|
183 |
+
num_deconv_layers=2,
|
184 |
+
num_deconv_filters=(256, 256),
|
185 |
+
num_deconv_kernels=(4, 4),
|
186 |
+
extra=dict(final_conv_kernel=1, ),
|
187 |
+
loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
|
188 |
+
train_cfg=dict(),
|
189 |
+
test_cfg=dict(
|
190 |
+
flip_test=True,
|
191 |
+
post_process='default',
|
192 |
+
shift_heatmap=False,
|
193 |
+
target_type=target_type,
|
194 |
+
modulate_kernel=11,
|
195 |
+
use_udp=True))
|
ViTPose/easy_ViTPose/easy_ViTPose/configs/.ipynb_checkpoints/ViTPose_small_coco_256x192-checkpoint.py
ADDED
@@ -0,0 +1,173 @@
1 |
+
_base_ = [
|
2 |
+
'../../../../_base_/default_runtime.py',
|
3 |
+
'../../../../_base_/datasets/coco.py'
|
4 |
+
]
|
5 |
+
evaluation = dict(interval=10, metric='mAP', save_best='AP')
|
6 |
+
|
7 |
+
optimizer = dict(type='AdamW', lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1,
|
8 |
+
constructor='LayerDecayOptimizerConstructor',
|
9 |
+
paramwise_cfg=dict(
|
10 |
+
num_layers=12,
|
11 |
+
layer_decay_rate=0.8,
|
12 |
+
custom_keys={
|
13 |
+
'bias': dict(decay_multi=0.),
|
14 |
+
'pos_embed': dict(decay_mult=0.),
|
15 |
+
'relative_position_bias_table': dict(decay_mult=0.),
|
16 |
+
'norm': dict(decay_mult=0.)
|
17 |
+
}
|
18 |
+
)
|
19 |
+
)
|
20 |
+
|
21 |
+
optimizer_config = dict(grad_clip=dict(max_norm=1., norm_type=2))
|
22 |
+
|
23 |
+
# learning policy
|
24 |
+
lr_config = dict(
|
25 |
+
policy='step',
|
26 |
+
warmup='linear',
|
27 |
+
warmup_iters=500,
|
28 |
+
warmup_ratio=0.001,
|
29 |
+
step=[170, 200])
|
30 |
+
total_epochs = 210
|
31 |
+
target_type = 'GaussianHeatmap'
|
32 |
+
channel_cfg = dict(
|
33 |
+
num_output_channels=17,
|
34 |
+
dataset_joints=17,
|
35 |
+
dataset_channel=[
|
36 |
+
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
|
37 |
+
],
|
38 |
+
inference_channel=[
|
39 |
+
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
|
40 |
+
])
|
41 |
+
|
42 |
+
# model settings
|
43 |
+
model = dict(
|
44 |
+
type='TopDown',
|
45 |
+
pretrained=None,
|
46 |
+
backbone=dict(
|
47 |
+
type='ViT',
|
48 |
+
img_size=(256, 192),
|
49 |
+
patch_size=16,
|
50 |
+
embed_dim=384,
|
51 |
+
depth=12,
|
52 |
+
num_heads=12,
|
53 |
+
ratio=1,
|
54 |
+
use_checkpoint=False,
|
55 |
+
mlp_ratio=4,
|
56 |
+
qkv_bias=True,
|
57 |
+
drop_path_rate=0.1,
|
58 |
+
),
|
59 |
+
keypoint_head=dict(
|
60 |
+
type='TopdownHeatmapSimpleHead',
|
61 |
+
in_channels=384,
|
62 |
+
num_deconv_layers=2,
|
63 |
+
num_deconv_filters=(256, 256),
|
64 |
+
num_deconv_kernels=(4, 4),
|
65 |
+
extra=dict(final_conv_kernel=1, ),
|
66 |
+
out_channels=channel_cfg['num_output_channels'],
|
67 |
+
loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
|
68 |
+
train_cfg=dict(),
|
69 |
+
test_cfg=dict(
|
70 |
+
flip_test=True,
|
71 |
+
post_process='default',
|
72 |
+
shift_heatmap=False,
|
73 |
+
target_type=target_type,
|
74 |
+
modulate_kernel=11,
|
75 |
+
use_udp=True))
|
76 |
+
|
77 |
+
data_cfg = dict(
|
78 |
+
image_size=[192, 256],
|
79 |
+
heatmap_size=[48, 64],
|
80 |
+
num_output_channels=channel_cfg['num_output_channels'],
|
81 |
+
num_joints=channel_cfg['dataset_joints'],
|
82 |
+
dataset_channel=channel_cfg['dataset_channel'],
|
83 |
+
inference_channel=channel_cfg['inference_channel'],
|
84 |
+
soft_nms=False,
|
85 |
+
nms_thr=1.0,
|
86 |
+
oks_thr=0.9,
|
87 |
+
vis_thr=0.9,
|
88 |
+
use_gt_bbox=False,
|
89 |
+
det_bbox_thr=0.0,
|
90 |
+
bbox_file='data/coco/person_detection_results/'
|
91 |
+
'COCO_val2017_detections_AP_H_56_person.json',
|
92 |
+
)
|
93 |
+
|
94 |
+
train_pipeline = [
|
95 |
+
dict(type='LoadImageFromFile'),
|
96 |
+
dict(type='TopDownRandomFlip', flip_prob=0.5),
|
97 |
+
dict(
|
98 |
+
type='TopDownHalfBodyTransform',
|
99 |
+
num_joints_half_body=8,
|
100 |
+
prob_half_body=0.3),
|
101 |
+
dict(
|
102 |
+
type='TopDownGetRandomScaleRotation', rot_factor=40, scale_factor=0.5),
|
103 |
+
dict(type='TopDownAffine', use_udp=True),
|
104 |
+
dict(type='ToTensor'),
|
105 |
+
dict(
|
106 |
+
type='NormalizeTensor',
|
107 |
+
mean=[0.485, 0.456, 0.406],
|
108 |
+
std=[0.229, 0.224, 0.225]),
|
109 |
+
dict(
|
110 |
+
type='TopDownGenerateTarget',
|
111 |
+
sigma=2,
|
112 |
+
encoding='UDP',
|
113 |
+
target_type=target_type),
|
114 |
+
dict(
|
115 |
+
type='Collect',
|
116 |
+
keys=['img', 'target', 'target_weight'],
|
117 |
+
meta_keys=[
|
118 |
+
'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
|
119 |
+
'rotation', 'bbox_score', 'flip_pairs'
|
120 |
+
]),
|
121 |
+
]
|
122 |
+
|
123 |
+
val_pipeline = [
|
124 |
+
dict(type='LoadImageFromFile'),
|
125 |
+
dict(type='TopDownAffine', use_udp=True),
|
126 |
+
dict(type='ToTensor'),
|
127 |
+
dict(
|
128 |
+
type='NormalizeTensor',
|
129 |
+
mean=[0.485, 0.456, 0.406],
|
130 |
+
std=[0.229, 0.224, 0.225]),
|
131 |
+
dict(
|
132 |
+
type='Collect',
|
133 |
+
keys=['img'],
|
134 |
+
meta_keys=[
|
135 |
+
'image_file', 'center', 'scale', 'rotation', 'bbox_score',
|
136 |
+
'flip_pairs'
|
137 |
+
]),
|
138 |
+
]
|
139 |
+
|
140 |
+
test_pipeline = val_pipeline
|
141 |
+
|
142 |
+
data_root = r'D:\ViTPose\Evaluating'
|
143 |
+
data = dict(
|
144 |
+
samples_per_gpu=4,
|
145 |
+
workers_per_gpu=4,
|
146 |
+
val_dataloader=dict(samples_per_gpu=4),
|
147 |
+
test_dataloader=dict(samples_per_gpu=4),
|
148 |
+
train=dict(
|
149 |
+
type='TopDownCocoDataset',
|
150 |
+
ann_file=f'{data_root}/annotations/person_keypoints_train2017.json',
|
151 |
+
img_prefix=f'{data_root}/train2017/',
|
152 |
+
data_cfg=data_cfg,
|
153 |
+
pipeline=train_pipeline,
|
154 |
+
# dataset_info={{_base_.dataset_info}}
|
155 |
+
),
|
156 |
+
val=dict(
|
157 |
+
type='TopDownCocoDataset',
|
158 |
+
ann_file=f'{data_root}/annotations/person_keypoints_train2017.json',
|
159 |
+
img_prefix=f'{data_root}/val2017/',
|
160 |
+
data_cfg=data_cfg,
|
161 |
+
pipeline=val_pipeline,
|
162 |
+
# dataset_info={{_base_.dataset_info}}
|
163 |
+
),
|
164 |
+
test=dict(
|
165 |
+
type='TopDownCocoDataset',
|
166 |
+
ann_file=f'{data_root}/annotations/person_keypoints_train2017.json',
|
167 |
+
img_prefix=f'{data_root}/val2017/',
|
168 |
+
data_cfg=data_cfg,
|
169 |
+
pipeline=test_pipeline,
|
170 |
+
#dataset_info={{_base_.dataset_info}}
|
171 |
+
),
|
172 |
+
)
|
173 |
+
|
ViTPose/easy_ViTPose/easy_ViTPose/configs/.ipynb_checkpoints/ViTPose_wholebody-checkpoint.py
ADDED
@@ -0,0 +1,20 @@
from .ViTPose_common import *

# Channel configuration
channel_cfg = dict(
    num_output_channels=133,
    dataset_joints=133,
    dataset_channel=[
        list(range(133)),
    ],
    inference_channel=list(range(133)))

# Set models channels
data_cfg['num_output_channels'] = channel_cfg['num_output_channels']
data_cfg['num_joints'] = channel_cfg['dataset_joints']
data_cfg['dataset_channel'] = channel_cfg['dataset_channel']
data_cfg['inference_channel'] = channel_cfg['inference_channel']

names = ['small', 'base', 'large', 'huge']
for name in names:
    globals()[f'model_{name}']['keypoint_head']['out_channels'] = channel_cfg['num_output_channels']
ViTPose/easy_ViTPose/easy_ViTPose/configs/ViTPose_aic.py
ADDED
@@ -0,0 +1,20 @@
from .ViTPose_common import *

# Channel configuration
channel_cfg = dict(
    num_output_channels=14,
    dataset_joints=14,
    dataset_channel=[
        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
    ],
    inference_channel=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])

# Set models channels
data_cfg['num_output_channels'] = channel_cfg['num_output_channels']
data_cfg['num_joints'] = channel_cfg['dataset_joints']
data_cfg['dataset_channel'] = channel_cfg['dataset_channel']
data_cfg['inference_channel'] = channel_cfg['inference_channel']

names = ['small', 'base', 'large', 'huge']
for name in names:
    globals()[f'model_{name}']['keypoint_head']['out_channels'] = channel_cfg['num_output_channels']
ViTPose/easy_ViTPose/easy_ViTPose/configs/ViTPose_ap10k.py
ADDED
@@ -0,0 +1,22 @@
from .ViTPose_common import *

# Channel configuration
channel_cfg = dict(
    num_output_channels=17,
    dataset_joints=17,
    dataset_channel=[
        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
    ],
    inference_channel=[
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
    ])

# Set models channels
data_cfg['num_output_channels'] = channel_cfg['num_output_channels']
data_cfg['num_joints'] = channel_cfg['dataset_joints']
data_cfg['dataset_channel'] = channel_cfg['dataset_channel']
data_cfg['inference_channel'] = channel_cfg['inference_channel']

names = ['small', 'base', 'large', 'huge']
for name in names:
    globals()[f'model_{name}']['keypoint_head']['out_channels'] = channel_cfg['num_output_channels']
ViTPose/easy_ViTPose/easy_ViTPose/configs/ViTPose_apt36k.py
ADDED
@@ -0,0 +1,22 @@
from .ViTPose_common import *

# Channel configuration
channel_cfg = dict(
    num_output_channels=17,
    dataset_joints=17,
    dataset_channel=[
        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
    ],
    inference_channel=[
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
    ])

# Set models channels
data_cfg['num_output_channels'] = channel_cfg['num_output_channels']
data_cfg['num_joints'] = channel_cfg['dataset_joints']
data_cfg['dataset_channel'] = channel_cfg['dataset_channel']
data_cfg['inference_channel'] = channel_cfg['inference_channel']

names = ['small', 'base', 'large', 'huge']
for name in names:
    globals()[f'model_{name}']['keypoint_head']['out_channels'] = channel_cfg['num_output_channels']
ViTPose/easy_ViTPose/easy_ViTPose/configs/ViTPose_coco.py
ADDED
@@ -0,0 +1,18 @@
from .ViTPose_common import *

# Channel configuration
channel_cfg = dict(
    num_output_channels=17,
    dataset_joints=17,
    dataset_channel=list(range(17)),
    inference_channel=list(range(17)))

# Set models channels
data_cfg['num_output_channels'] = channel_cfg['num_output_channels']
data_cfg['num_joints'] = channel_cfg['dataset_joints']
data_cfg['dataset_channel'] = channel_cfg['dataset_channel']
data_cfg['inference_channel'] = channel_cfg['inference_channel']

names = ['small', 'base', 'large', 'huge']
for name in names:
    globals()[f'model_{name}']['keypoint_head']['out_channels'] = channel_cfg['num_output_channels']
ViTPose/easy_ViTPose/easy_ViTPose/configs/ViTPose_coco_25.py
ADDED
@@ -0,0 +1,20 @@
from .ViTPose_common import *

# Channel configuration
channel_cfg = dict(
    num_output_channels=25,
    dataset_joints=25,
    dataset_channel=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
                      16, 17, 18, 19, 20, 21, 22, 23, 24], ],
    inference_channel=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
                       16, 17, 18, 19, 20, 21, 22, 23, 24])

# Set models channels
data_cfg['num_output_channels'] = channel_cfg['num_output_channels']
data_cfg['num_joints'] = channel_cfg['dataset_joints']
data_cfg['dataset_channel'] = channel_cfg['dataset_channel']
data_cfg['inference_channel'] = channel_cfg['inference_channel']

names = ['small', 'base', 'large', 'huge']
for name in names:
    globals()[f'model_{name}']['keypoint_head']['out_channels'] = channel_cfg['num_output_channels']
ViTPose/easy_ViTPose/easy_ViTPose/configs/ViTPose_common.py
ADDED
@@ -0,0 +1,195 @@
1 |
+
# Common configuration
|
2 |
+
optimizer = dict(type='AdamW', lr=1e-3, betas=(0.9, 0.999), weight_decay=0.1,
|
3 |
+
constructor='LayerDecayOptimizerConstructor',
|
4 |
+
paramwise_cfg=dict(
|
5 |
+
num_layers=12,
|
6 |
+
layer_decay_rate=1 - 2e-4,
|
7 |
+
custom_keys={
|
8 |
+
'bias': dict(decay_multi=0.),
|
9 |
+
'pos_embed': dict(decay_mult=0.),
|
10 |
+
'relative_position_bias_table': dict(decay_mult=0.),
|
11 |
+
'norm': dict(decay_mult=0.)
|
12 |
+
}
|
13 |
+
)
|
14 |
+
)
|
15 |
+
|
16 |
+
optimizer_config = dict(grad_clip=dict(max_norm=1., norm_type=2))
|
17 |
+
|
18 |
+
# learning policy
|
19 |
+
lr_config = dict(
|
20 |
+
policy='step',
|
21 |
+
warmup='linear',
|
22 |
+
warmup_iters=300,
|
23 |
+
warmup_ratio=0.001,
|
24 |
+
step=[3])
|
25 |
+
|
26 |
+
total_epochs = 4
|
27 |
+
target_type = 'GaussianHeatmap'
|
28 |
+
|
29 |
+
data_cfg = dict(
|
30 |
+
image_size=[192, 256],
|
31 |
+
heatmap_size=[48, 64],
|
32 |
+
soft_nms=False,
|
33 |
+
    nms_thr=1.0,
    oks_thr=0.9,
    vis_thr=0.2,
    use_gt_bbox=False,
    det_bbox_thr=0.0,
    bbox_file='data/coco/person_detection_results/'
    'COCO_val2017_detections_AP_H_56_person.json',
)

data_root = '/home/adryw/dataset/COCO17'
data = dict(
    samples_per_gpu=64,
    workers_per_gpu=6,
    val_dataloader=dict(samples_per_gpu=128),
    test_dataloader=dict(samples_per_gpu=128),
    train=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_train2017.json',
        img_prefix=f'{data_root}/train2017/',
        data_cfg=data_cfg),
    val=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
        img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg),
    test=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
        img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg)
)

model_small = dict(
    type='TopDown',
    pretrained=None,
    backbone=dict(
        type='ViT', img_size=(256, 192), patch_size=16, embed_dim=384,
        depth=12, num_heads=12, ratio=1, use_checkpoint=False, mlp_ratio=4,
        qkv_bias=True, drop_path_rate=0.1),
    keypoint_head=dict(
        type='TopdownHeatmapSimpleHead', in_channels=384, num_deconv_layers=2,
        num_deconv_filters=(256, 256), num_deconv_kernels=(4, 4),
        extra=dict(final_conv_kernel=1),
        loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=True, post_process='default', shift_heatmap=False,
        target_type=target_type, modulate_kernel=11, use_udp=True))

model_base = dict(
    type='TopDown',
    pretrained=None,
    backbone=dict(
        type='ViT', img_size=(256, 192), patch_size=16, embed_dim=768,
        depth=12, num_heads=12, ratio=1, use_checkpoint=False, mlp_ratio=4,
        qkv_bias=True, drop_path_rate=0.3),
    keypoint_head=dict(
        type='TopdownHeatmapSimpleHead', in_channels=768, num_deconv_layers=2,
        num_deconv_filters=(256, 256), num_deconv_kernels=(4, 4),
        extra=dict(final_conv_kernel=1),
        loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=True, post_process='default', shift_heatmap=False,
        target_type=target_type, modulate_kernel=11, use_udp=True))

model_large = dict(
    type='TopDown',
    pretrained=None,
    backbone=dict(
        type='ViT', img_size=(256, 192), patch_size=16, embed_dim=1024,
        depth=24, num_heads=16, ratio=1, use_checkpoint=False, mlp_ratio=4,
        qkv_bias=True, drop_path_rate=0.5),
    keypoint_head=dict(
        type='TopdownHeatmapSimpleHead', in_channels=1024, num_deconv_layers=2,
        num_deconv_filters=(256, 256), num_deconv_kernels=(4, 4),
        extra=dict(final_conv_kernel=1),
        loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=True, post_process='default', shift_heatmap=False,
        target_type=target_type, modulate_kernel=11, use_udp=True))

model_huge = dict(
    type='TopDown',
    pretrained=None,
    backbone=dict(
        type='ViT', img_size=(256, 192), patch_size=16, embed_dim=1280,
        depth=32, num_heads=16, ratio=1, use_checkpoint=False, mlp_ratio=4,
        qkv_bias=True, drop_path_rate=0.55),
    keypoint_head=dict(
        type='TopdownHeatmapSimpleHead', in_channels=1280, num_deconv_layers=2,
        num_deconv_filters=(256, 256), num_deconv_kernels=(4, 4),
        extra=dict(final_conv_kernel=1),
        loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=True, post_process='default', shift_heatmap=False,
        target_type=target_type, modulate_kernel=11, use_udp=True))

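The four model_* variants above share the same head and test-time settings and differ only in backbone width, depth, head count and drop-path rate. As a rough illustration (my own back-of-the-envelope estimate, not part of the uploaded files), the transformer size grows as about 12 * depth * embed_dim**2 when mlp_ratio=4:

# Rough sanity check (my approximation, not from the config): a ViT block with
# mlp_ratio=4 has about 12 * embed_dim**2 weights (4*d^2 for the attention
# projections + 8*d^2 for the MLP), so the backbone scales as 12 * depth * d^2.
variants = {
    'small': dict(embed_dim=384,  depth=12),
    'base':  dict(embed_dim=768,  depth=12),
    'large': dict(embed_dim=1024, depth=24),
    'huge':  dict(embed_dim=1280, depth=32),
}
for name, cfg in variants.items():
    approx_params = 12 * cfg['depth'] * cfg['embed_dim'] ** 2
    print(f"{name:>5}: ~{approx_params / 1e6:.0f}M backbone parameters")
# small: ~21M, base: ~85M, large: ~302M, huge: ~629M (ignoring the patch
# embedding, position embeddings, norms and the deconv head)
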
ViTPose/easy_ViTPose/easy_ViTPose/configs/ViTPose_mpii.py
ADDED
@@ -0,0 +1,18 @@
from .ViTPose_common import *

# Channel configuration
channel_cfg = dict(
    num_output_channels=16,
    dataset_joints=16,
    dataset_channel=list(range(16)),
    inference_channel=list(range(16)))

# Set models channels
data_cfg['num_output_channels'] = channel_cfg['num_output_channels']
data_cfg['num_joints'] = channel_cfg['dataset_joints']
data_cfg['dataset_channel'] = channel_cfg['dataset_channel']
data_cfg['inference_channel'] = channel_cfg['inference_channel']

names = ['small', 'base', 'large', 'huge']
for name in names:
    globals()[f'model_{name}']['keypoint_head']['out_channels'] = channel_cfg['num_output_channels']

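ViTPose_mpii.py relies on "from .ViTPose_common import *" pulling data_cfg and the model_* dicts into its own namespace, then patches the head width in place through globals(). A minimal self-contained sketch of that pattern (hypothetical stand-in dicts, not the real configs):

# Sketch of the per-dataset override pattern used above: a shared module
# defines model_small/.../model_huge, each dataset config then rewrites the
# head's out_channels in place via globals().
model_small = dict(keypoint_head=dict(type='TopdownHeatmapSimpleHead'))
model_base = dict(keypoint_head=dict(type='TopdownHeatmapSimpleHead'))

channel_cfg = dict(num_output_channels=16)   # MPII has 16 joints

for name in ['small', 'base']:
    globals()[f'model_{name}']['keypoint_head']['out_channels'] = \
        channel_cfg['num_output_channels']

assert model_small['keypoint_head']['out_channels'] == 16
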
ViTPose/easy_ViTPose/easy_ViTPose/configs/ViTPose_small_coco_256x192.py
ADDED
@@ -0,0 +1,173 @@
_base_ = [
    '../../../../_base_/default_runtime.py',
    '../../../../_base_/datasets/coco.py'
]
evaluation = dict(interval=10, metric='mAP', save_best='AP')

optimizer = dict(
    type='AdamW', lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1,
    constructor='LayerDecayOptimizerConstructor',
    paramwise_cfg=dict(
        num_layers=12,
        layer_decay_rate=0.8,
        custom_keys={
            'bias': dict(decay_multi=0.),
            'pos_embed': dict(decay_mult=0.),
            'relative_position_bias_table': dict(decay_mult=0.),
            'norm': dict(decay_mult=0.)
        }))

optimizer_config = dict(grad_clip=dict(max_norm=1., norm_type=2))

# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[170, 200])
total_epochs = 210
target_type = 'GaussianHeatmap'
channel_cfg = dict(
    num_output_channels=17,
    dataset_joints=17,
    dataset_channel=[
        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
    ],
    inference_channel=[
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
    ])

# model settings
model = dict(
    type='TopDown',
    pretrained=None,
    backbone=dict(
        type='ViT', img_size=(256, 192), patch_size=16, embed_dim=384,
        depth=12, num_heads=12, ratio=1, use_checkpoint=False, mlp_ratio=4,
        qkv_bias=True, drop_path_rate=0.1),
    keypoint_head=dict(
        type='TopdownHeatmapSimpleHead', in_channels=384, num_deconv_layers=2,
        num_deconv_filters=(256, 256), num_deconv_kernels=(4, 4),
        extra=dict(final_conv_kernel=1),
        out_channels=channel_cfg['num_output_channels'],
        loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=True, post_process='default', shift_heatmap=False,
        target_type=target_type, modulate_kernel=11, use_udp=True))

data_cfg = dict(
    image_size=[192, 256],
    heatmap_size=[48, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'],
    soft_nms=False,
    nms_thr=1.0,
    oks_thr=0.9,
    vis_thr=0.9,
    use_gt_bbox=False,
    det_bbox_thr=0.0,
    bbox_file='data/coco/person_detection_results/'
    'COCO_val2017_detections_AP_H_56_person.json',
)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownRandomFlip', flip_prob=0.5),
    dict(type='TopDownHalfBodyTransform', num_joints_half_body=8,
         prob_half_body=0.3),
    dict(type='TopDownGetRandomScaleRotation', rot_factor=40, scale_factor=0.5),
    dict(type='TopDownAffine', use_udp=True),
    dict(type='ToTensor'),
    dict(type='NormalizeTensor', mean=[0.485, 0.456, 0.406],
         std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTarget', sigma=2, encoding='UDP',
         target_type=target_type),
    dict(type='Collect',
         keys=['img', 'target', 'target_weight'],
         meta_keys=[
             'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
             'rotation', 'bbox_score', 'flip_pairs'
         ]),
]

val_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffine', use_udp=True),
    dict(type='ToTensor'),
    dict(type='NormalizeTensor', mean=[0.485, 0.456, 0.406],
         std=[0.229, 0.224, 0.225]),
    dict(type='Collect',
         keys=['img'],
         meta_keys=[
             'image_file', 'center', 'scale', 'rotation', 'bbox_score',
             'flip_pairs'
         ]),
]

test_pipeline = val_pipeline

data_root = r'D:\ViTPose\Evaluating'
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    val_dataloader=dict(samples_per_gpu=4),
    test_dataloader=dict(samples_per_gpu=4),
    train=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_train2017.json',
        img_prefix=f'{data_root}/train2017/',
        data_cfg=data_cfg,
        pipeline=train_pipeline,
        # dataset_info={{_base_.dataset_info}}
    ),
    val=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_train2017.json',
        img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=val_pipeline,
        # dataset_info={{_base_.dataset_info}}
    ),
    test=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_train2017.json',
        img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=test_pipeline,
        # dataset_info={{_base_.dataset_info}}
    ),
)

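In ViTPose_small_coco_256x192.py the 192x256 crop, the 16-pixel patches and the two deconvolution layers are consistent with the 48x64 heatmap_size. A quick sketch of that arithmetic, under the assumption that each deconv layer in TopdownHeatmapSimpleHead uses the usual stride of 2:

# Sketch (assumption: stride-2 deconvolutions) of how image_size, patch_size
# and the two deconv layers relate to heatmap_size in the config above.
image_w, image_h = 192, 256                              # data_cfg image_size (width, height)
patch = 16                                               # backbone patch_size
tokens_w, tokens_h = image_w // patch, image_h // patch  # 12 x 16 patch grid
heatmap_w = tokens_w * 2 ** 2                            # two stride-2 deconvs -> x4 upsampling
heatmap_h = tokens_h * 2 ** 2
assert [heatmap_w, heatmap_h] == [48, 64]                # matches heatmap_size=[48, 64]
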
ViTPose/easy_ViTPose/easy_ViTPose/configs/ViTPose_wholebody.py
ADDED
@@ -0,0 +1,20 @@
from .ViTPose_common import *

# Channel configuration
channel_cfg = dict(
    num_output_channels=133,
    dataset_joints=133,
    dataset_channel=[
        list(range(133)),
    ],
    inference_channel=list(range(133)))

# Set models channels
data_cfg['num_output_channels'] = channel_cfg['num_output_channels']
data_cfg['num_joints'] = channel_cfg['dataset_joints']
data_cfg['dataset_channel'] = channel_cfg['dataset_channel']
data_cfg['inference_channel'] = channel_cfg['inference_channel']

names = ['small', 'base', 'large', 'huge']
for name in names:
    globals()[f'model_{name}']['keypoint_head']['out_channels'] = channel_cfg['num_output_channels']

ViTPose/easy_ViTPose/easy_ViTPose/configs/__init__.py
ADDED
File without changes
ViTPose/easy_ViTPose/easy_ViTPose/configs/__pycache__/ViTPose_coco_25.cpython-39.pyc
ADDED
Binary file (697 Bytes)

ViTPose/easy_ViTPose/easy_ViTPose/configs/__pycache__/ViTPose_common.cpython-39.pyc
ADDED
Binary file (2.88 kB)

ViTPose/easy_ViTPose/easy_ViTPose/configs/__pycache__/ViTPose_small_coco_256x192.cpython-39.pyc
ADDED
Binary file (3.69 kB)

ViTPose/easy_ViTPose/easy_ViTPose/configs/__pycache__/__init__.cpython-39.pyc
ADDED
Binary file (158 Bytes)

ViTPose/easy_ViTPose/easy_ViTPose/configs/_base_/datasets/300w.py
ADDED
@@ -0,0 +1,384 @@
dataset_info = dict(
    dataset_name='300w',
    paper_info=dict(
        author='Sagonas, Christos and Antonakos, Epameinondas '
        'and Tzimiropoulos, Georgios and Zafeiriou, Stefanos '
        'and Pantic, Maja',
        title='300 faces in-the-wild challenge: Database and results',
        container='Image and vision computing',
        year='2016',
        homepage='https://ibug.doc.ic.ac.uk/resources/300-W/',
    ),
    keypoint_info={
        0: dict(name='kpt-0', id=0, color=[255, 255, 255], type='', swap='kpt-16'),
        1: dict(name='kpt-1', id=1, color=[255, 255, 255], type='', swap='kpt-15'),
        2: dict(name='kpt-2', id=2, color=[255, 255, 255], type='', swap='kpt-14'),
        3: dict(name='kpt-3', id=3, color=[255, 255, 255], type='', swap='kpt-13'),
        4: dict(name='kpt-4', id=4, color=[255, 255, 255], type='', swap='kpt-12'),
        5: dict(name='kpt-5', id=5, color=[255, 255, 255], type='', swap='kpt-11'),
        6: dict(name='kpt-6', id=6, color=[255, 255, 255], type='', swap='kpt-10'),
        7: dict(name='kpt-7', id=7, color=[255, 255, 255], type='', swap='kpt-9'),
        8: dict(name='kpt-8', id=8, color=[255, 255, 255], type='', swap=''),
        9: dict(name='kpt-9', id=9, color=[255, 255, 255], type='', swap='kpt-7'),
        10: dict(name='kpt-10', id=10, color=[255, 255, 255], type='', swap='kpt-6'),
        11: dict(name='kpt-11', id=11, color=[255, 255, 255], type='', swap='kpt-5'),
        12: dict(name='kpt-12', id=12, color=[255, 255, 255], type='', swap='kpt-4'),
        13: dict(name='kpt-13', id=13, color=[255, 255, 255], type='', swap='kpt-3'),
        14: dict(name='kpt-14', id=14, color=[255, 255, 255], type='', swap='kpt-2'),
        15: dict(name='kpt-15', id=15, color=[255, 255, 255], type='', swap='kpt-1'),
        16: dict(name='kpt-16', id=16, color=[255, 255, 255], type='', swap='kpt-0'),
        17: dict(name='kpt-17', id=17, color=[255, 255, 255], type='', swap='kpt-26'),
        18: dict(name='kpt-18', id=18, color=[255, 255, 255], type='', swap='kpt-25'),
        19: dict(name='kpt-19', id=19, color=[255, 255, 255], type='', swap='kpt-24'),
        20: dict(name='kpt-20', id=20, color=[255, 255, 255], type='', swap='kpt-23'),
        21: dict(name='kpt-21', id=21, color=[255, 255, 255], type='', swap='kpt-22'),
        22: dict(name='kpt-22', id=22, color=[255, 255, 255], type='', swap='kpt-21'),
        23: dict(name='kpt-23', id=23, color=[255, 255, 255], type='', swap='kpt-20'),
        24: dict(name='kpt-24', id=24, color=[255, 255, 255], type='', swap='kpt-19'),
        25: dict(name='kpt-25', id=25, color=[255, 255, 255], type='', swap='kpt-18'),
        26: dict(name='kpt-26', id=26, color=[255, 255, 255], type='', swap='kpt-17'),
        27: dict(name='kpt-27', id=27, color=[255, 255, 255], type='', swap=''),
        28: dict(name='kpt-28', id=28, color=[255, 255, 255], type='', swap=''),
        29: dict(name='kpt-29', id=29, color=[255, 255, 255], type='', swap=''),
        30: dict(name='kpt-30', id=30, color=[255, 255, 255], type='', swap=''),
        31: dict(name='kpt-31', id=31, color=[255, 255, 255], type='', swap='kpt-35'),
        32: dict(name='kpt-32', id=32, color=[255, 255, 255], type='', swap='kpt-34'),
        33: dict(name='kpt-33', id=33, color=[255, 255, 255], type='', swap=''),
        34: dict(name='kpt-34', id=34, color=[255, 255, 255], type='', swap='kpt-32'),
        35: dict(name='kpt-35', id=35, color=[255, 255, 255], type='', swap='kpt-31'),
        36: dict(name='kpt-36', id=36, color=[255, 255, 255], type='', swap='kpt-45'),
        37: dict(name='kpt-37', id=37, color=[255, 255, 255], type='', swap='kpt-44'),
        38: dict(name='kpt-38', id=38, color=[255, 255, 255], type='', swap='kpt-43'),
        39: dict(name='kpt-39', id=39, color=[255, 255, 255], type='', swap='kpt-42'),
        40: dict(name='kpt-40', id=40, color=[255, 255, 255], type='', swap='kpt-47'),
        41: dict(name='kpt-41', id=41, color=[255, 255, 255], type='', swap='kpt-46'),
        42: dict(name='kpt-42', id=42, color=[255, 255, 255], type='', swap='kpt-39'),
        43: dict(name='kpt-43', id=43, color=[255, 255, 255], type='', swap='kpt-38'),
        44: dict(name='kpt-44', id=44, color=[255, 255, 255], type='', swap='kpt-37'),
        45: dict(name='kpt-45', id=45, color=[255, 255, 255], type='', swap='kpt-36'),
        46: dict(name='kpt-46', id=46, color=[255, 255, 255], type='', swap='kpt-41'),
        47: dict(name='kpt-47', id=47, color=[255, 255, 255], type='', swap='kpt-40'),
        48: dict(name='kpt-48', id=48, color=[255, 255, 255], type='', swap='kpt-54'),
        49: dict(name='kpt-49', id=49, color=[255, 255, 255], type='', swap='kpt-53'),
        50: dict(name='kpt-50', id=50, color=[255, 255, 255], type='', swap='kpt-52'),
        51: dict(name='kpt-51', id=51, color=[255, 255, 255], type='', swap=''),
        52: dict(name='kpt-52', id=52, color=[255, 255, 255], type='', swap='kpt-50'),
        53: dict(name='kpt-53', id=53, color=[255, 255, 255], type='', swap='kpt-49'),
        54: dict(name='kpt-54', id=54, color=[255, 255, 255], type='', swap='kpt-48'),
        55: dict(name='kpt-55', id=55, color=[255, 255, 255], type='', swap='kpt-59'),
        56: dict(name='kpt-56', id=56, color=[255, 255, 255], type='', swap='kpt-58'),
        57: dict(name='kpt-57', id=57, color=[255, 255, 255], type='', swap=''),
        58: dict(name='kpt-58', id=58, color=[255, 255, 255], type='', swap='kpt-56'),
        59: dict(name='kpt-59', id=59, color=[255, 255, 255], type='', swap='kpt-55'),
        60: dict(name='kpt-60', id=60, color=[255, 255, 255], type='', swap='kpt-64'),
        61: dict(name='kpt-61', id=61, color=[255, 255, 255], type='', swap='kpt-63'),
        62: dict(name='kpt-62', id=62, color=[255, 255, 255], type='', swap=''),
        63: dict(name='kpt-63', id=63, color=[255, 255, 255], type='', swap='kpt-61'),
        64: dict(name='kpt-64', id=64, color=[255, 255, 255], type='', swap='kpt-60'),
        65: dict(name='kpt-65', id=65, color=[255, 255, 255], type='', swap='kpt-67'),
        66: dict(name='kpt-66', id=66, color=[255, 255, 255], type='', swap=''),
        67: dict(name='kpt-67', id=67, color=[255, 255, 255], type='', swap='kpt-65'),
    },
    skeleton_info={},
    joint_weights=[1.] * 68,
    sigmas=[])

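Each 300w.py entry carries its left/right counterpart in the swap field; datasets typically turn this into index pairs for horizontal-flip augmentation. A minimal sketch of that conversion (toy subset of the table above, my own helper, not code from the upload):

# Turning the swap field into flip pairs (illustrative subset).
keypoint_info = {
    0: dict(name='kpt-0', swap='kpt-16'),
    8: dict(name='kpt-8', swap=''),          # mid-line point, no partner
    16: dict(name='kpt-16', swap='kpt-0'),
}
name_to_id = {v['name']: k for k, v in keypoint_info.items()}
flip_pairs = sorted({
    tuple(sorted((k, name_to_id[v['swap']])))
    for k, v in keypoint_info.items() if v['swap']
})
print(flip_pairs)   # [(0, 16)]
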
ViTPose/easy_ViTPose/easy_ViTPose/configs/_base_/datasets/aflw.py
ADDED
@@ -0,0 +1,83 @@
dataset_info = dict(
    dataset_name='aflw',
    paper_info=dict(
        author='Koestinger, Martin and Wohlhart, Paul and '
        'Roth, Peter M and Bischof, Horst',
        title='Annotated facial landmarks in the wild: '
        'A large-scale, real-world database for facial landmark localization',
        container='2011 IEEE international conference on computer '
        'vision workshops (ICCV workshops)',
        year='2011',
        homepage='https://www.tugraz.at/institute/icg/research/'
        'team-bischof/lrs/downloads/aflw/',
    ),
    keypoint_info={
        0: dict(name='kpt-0', id=0, color=[255, 255, 255], type='', swap='kpt-5'),
        1: dict(name='kpt-1', id=1, color=[255, 255, 255], type='', swap='kpt-4'),
        2: dict(name='kpt-2', id=2, color=[255, 255, 255], type='', swap='kpt-3'),
        3: dict(name='kpt-3', id=3, color=[255, 255, 255], type='', swap='kpt-2'),
        4: dict(name='kpt-4', id=4, color=[255, 255, 255], type='', swap='kpt-1'),
        5: dict(name='kpt-5', id=5, color=[255, 255, 255], type='', swap='kpt-0'),
        6: dict(name='kpt-6', id=6, color=[255, 255, 255], type='', swap='kpt-11'),
        7: dict(name='kpt-7', id=7, color=[255, 255, 255], type='', swap='kpt-10'),
        8: dict(name='kpt-8', id=8, color=[255, 255, 255], type='', swap='kpt-9'),
        9: dict(name='kpt-9', id=9, color=[255, 255, 255], type='', swap='kpt-8'),
        10: dict(name='kpt-10', id=10, color=[255, 255, 255], type='', swap='kpt-7'),
        11: dict(name='kpt-11', id=11, color=[255, 255, 255], type='', swap='kpt-6'),
        12: dict(name='kpt-12', id=12, color=[255, 255, 255], type='', swap='kpt-14'),
        13: dict(name='kpt-13', id=13, color=[255, 255, 255], type='', swap=''),
        14: dict(name='kpt-14', id=14, color=[255, 255, 255], type='', swap='kpt-12'),
        15: dict(name='kpt-15', id=15, color=[255, 255, 255], type='', swap='kpt-17'),
        16: dict(name='kpt-16', id=16, color=[255, 255, 255], type='', swap=''),
        17: dict(name='kpt-17', id=17, color=[255, 255, 255], type='', swap='kpt-15'),
        18: dict(name='kpt-18', id=18, color=[255, 255, 255], type='', swap='')
    },
    skeleton_info={},
    joint_weights=[1.] * 19,
    sigmas=[])

ViTPose/easy_ViTPose/easy_ViTPose/configs/_base_/datasets/aic.py
ADDED
@@ -0,0 +1,140 @@
dataset_info = dict(
    dataset_name='aic',
    paper_info=dict(
        author='Wu, Jiahong and Zheng, He and Zhao, Bo and '
        'Li, Yixin and Yan, Baoming and Liang, Rui and '
        'Wang, Wenjia and Zhou, Shipei and Lin, Guosen and '
        'Fu, Yanwei and others',
        title='Ai challenger: A large-scale dataset for going '
        'deeper in image understanding',
        container='arXiv',
        year='2017',
        homepage='https://github.com/AIChallenger/AI_Challenger_2017',
    ),
    keypoint_info={
        0: dict(name='right_shoulder', id=0, color=[255, 128, 0], type='upper', swap='left_shoulder'),
        1: dict(name='right_elbow', id=1, color=[255, 128, 0], type='upper', swap='left_elbow'),
        2: dict(name='right_wrist', id=2, color=[255, 128, 0], type='upper', swap='left_wrist'),
        3: dict(name='left_shoulder', id=3, color=[0, 255, 0], type='upper', swap='right_shoulder'),
        4: dict(name='left_elbow', id=4, color=[0, 255, 0], type='upper', swap='right_elbow'),
        5: dict(name='left_wrist', id=5, color=[0, 255, 0], type='upper', swap='right_wrist'),
        6: dict(name='right_hip', id=6, color=[255, 128, 0], type='lower', swap='left_hip'),
        7: dict(name='right_knee', id=7, color=[255, 128, 0], type='lower', swap='left_knee'),
        8: dict(name='right_ankle', id=8, color=[255, 128, 0], type='lower', swap='left_ankle'),
        9: dict(name='left_hip', id=9, color=[0, 255, 0], type='lower', swap='right_hip'),
        10: dict(name='left_knee', id=10, color=[0, 255, 0], type='lower', swap='right_knee'),
        11: dict(name='left_ankle', id=11, color=[0, 255, 0], type='lower', swap='right_ankle'),
        12: dict(name='head_top', id=12, color=[51, 153, 255], type='upper', swap=''),
        13: dict(name='neck', id=13, color=[51, 153, 255], type='upper', swap='')
    },
    skeleton_info={
        0: dict(link=('right_wrist', 'right_elbow'), id=0, color=[255, 128, 0]),
        1: dict(link=('right_elbow', 'right_shoulder'), id=1, color=[255, 128, 0]),
        2: dict(link=('right_shoulder', 'neck'), id=2, color=[51, 153, 255]),
        3: dict(link=('neck', 'left_shoulder'), id=3, color=[51, 153, 255]),
        4: dict(link=('left_shoulder', 'left_elbow'), id=4, color=[0, 255, 0]),
        5: dict(link=('left_elbow', 'left_wrist'), id=5, color=[0, 255, 0]),
        6: dict(link=('right_ankle', 'right_knee'), id=6, color=[255, 128, 0]),
        7: dict(link=('right_knee', 'right_hip'), id=7, color=[255, 128, 0]),
        8: dict(link=('right_hip', 'left_hip'), id=8, color=[51, 153, 255]),
        9: dict(link=('left_hip', 'left_knee'), id=9, color=[0, 255, 0]),
        10: dict(link=('left_knee', 'left_ankle'), id=10, color=[0, 255, 0]),
        11: dict(link=('head_top', 'neck'), id=11, color=[51, 153, 255]),
        12: dict(link=('right_shoulder', 'right_hip'), id=12, color=[51, 153, 255]),
        13: dict(link=('left_shoulder', 'left_hip'), id=13, color=[51, 153, 255])
    },
    joint_weights=[
        1., 1.2, 1.5, 1., 1.2, 1.5, 1., 1.2, 1.5, 1., 1.2, 1.5, 1., 1.
    ],

    # https://github.com/AIChallenger/AI_Challenger_2017/blob/master/Evaluation/keypoint_eval/keypoint_eval.py#L50
    # delta = 2 x sigma
    sigmas=[
        0.01388152, 0.01515228, 0.01057665, 0.01417709, 0.01497891, 0.01402144,
        0.03909642, 0.03686941, 0.01981803, 0.03843971, 0.03412318, 0.02415081,
        0.01291456, 0.01236173
    ])

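The per-keypoint sigmas at the end of aic.py (with the "delta = 2 x sigma" note) are the constants that enter a COCO-style OKS similarity. An illustrative computation for a single keypoint, with a hypothetical box area and pixel error (the numbers are mine; only the sigma comes from the list above):

import math

# COCO-style OKS: each keypoint contributes exp(-d^2 / (2 * s^2 * k^2)),
# where d is the localisation error in pixels, s^2 the object area and
# k = 2 * sigma -- the "delta = 2 x sigma" mentioned in the config comment.
sigma_right_shoulder = 0.01388152   # first entry of the sigmas list
object_area = 150 * 300             # hypothetical person box area, pixels^2
error_px = 10.0                     # hypothetical prediction error

k = 2 * sigma_right_shoulder
similarity = math.exp(-error_px ** 2 / (2 * object_area * k ** 2))
print(f'per-keypoint OKS term: {similarity:.3f}')
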
ViTPose/easy_ViTPose/easy_ViTPose/configs/_base_/datasets/aic_info.py
ADDED
@@ -0,0 +1,140 @@
aic_info = dict(...)

(The body of this file duplicates _base_/datasets/aic.py above line for line;
the only difference is that the top-level variable is named aic_info instead
of dataset_info.)

ViTPose/easy_ViTPose/easy_ViTPose/configs/_base_/datasets/animalpose.py
ADDED
@@ -0,0 +1,166 @@
dataset_info = dict(
    dataset_name='animalpose',
    paper_info=dict(
        author='Cao, Jinkun and Tang, Hongyang and Fang, Hao-Shu and '
        'Shen, Xiaoyong and Lu, Cewu and Tai, Yu-Wing',
        title='Cross-Domain Adaptation for Animal Pose Estimation',
        container='The IEEE International Conference on '
        'Computer Vision (ICCV)',
        year='2019',
        homepage='https://sites.google.com/view/animal-pose/',
    ),
    keypoint_info={
        0: dict(name='L_Eye', id=0, color=[0, 255, 0], type='upper', swap='R_Eye'),
        1: dict(name='R_Eye', id=1, color=[255, 128, 0], type='upper', swap='L_Eye'),
        2: dict(name='L_EarBase', id=2, color=[0, 255, 0], type='upper', swap='R_EarBase'),
        3: dict(name='R_EarBase', id=3, color=[255, 128, 0], type='upper', swap='L_EarBase'),
        4: dict(name='Nose', id=4, color=[51, 153, 255], type='upper', swap=''),
        5: dict(name='Throat', id=5, color=[51, 153, 255], type='upper', swap=''),
        6: dict(name='TailBase', id=6, color=[51, 153, 255], type='lower', swap=''),
        7: dict(name='Withers', id=7, color=[51, 153, 255], type='upper', swap=''),
        8: dict(name='L_F_Elbow', id=8, color=[0, 255, 0], type='upper', swap='R_F_Elbow'),
        9: dict(name='R_F_Elbow', id=9, color=[255, 128, 0], type='upper', swap='L_F_Elbow'),
        10: dict(name='L_B_Elbow', id=10, color=[0, 255, 0], type='lower', swap='R_B_Elbow'),
        11: dict(name='R_B_Elbow', id=11, color=[255, 128, 0], type='lower', swap='L_B_Elbow'),
        12: dict(name='L_F_Knee', id=12, color=[0, 255, 0], type='upper', swap='R_F_Knee'),
        13: dict(name='R_F_Knee', id=13, color=[255, 128, 0], type='upper', swap='L_F_Knee'),
        14: dict(name='L_B_Knee', id=14, color=[0, 255, 0], type='lower', swap='R_B_Knee'),
        15: dict(name='R_B_Knee', id=15, color=[255, 128, 0], type='lower', swap='L_B_Knee'),
        16: dict(name='L_F_Paw', id=16, color=[0, 255, 0], type='upper', swap='R_F_Paw'),
        17: dict(name='R_F_Paw', id=17, color=[255, 128, 0], type='upper', swap='L_F_Paw'),
        18: dict(name='L_B_Paw', id=18, color=[0, 255, 0], type='lower', swap='R_B_Paw'),
        19: dict(name='R_B_Paw', id=19, color=[255, 128, 0], type='lower', swap='L_B_Paw')
    },
    skeleton_info={
        0: dict(link=('L_Eye', 'R_Eye'), id=0, color=[51, 153, 255]),
        1: dict(link=('L_Eye', 'L_EarBase'), id=1, color=[0, 255, 0]),
        2: dict(link=('R_Eye', 'R_EarBase'), id=2, color=[255, 128, 0]),
        3: dict(link=('L_Eye', 'Nose'), id=3, color=[0, 255, 0]),
        4: dict(link=('R_Eye', 'Nose'), id=4, color=[255, 128, 0]),
        5: dict(link=('Nose', 'Throat'), id=5, color=[51, 153, 255]),
        6: dict(link=('Throat', 'Withers'), id=6, color=[51, 153, 255]),
        7: dict(link=('TailBase', 'Withers'), id=7, color=[51, 153, 255]),
        8: dict(link=('Throat', 'L_F_Elbow'), id=8, color=[0, 255, 0]),
        9: dict(link=('L_F_Elbow', 'L_F_Knee'), id=9, color=[0, 255, 0]),
        10: dict(link=('L_F_Knee', 'L_F_Paw'), id=10, color=[0, 255, 0]),
        11: dict(link=('Throat', 'R_F_Elbow'), id=11, color=[255, 128, 0]),
        12: dict(link=('R_F_Elbow', 'R_F_Knee'), id=12, color=[255, 128, 0]),
        13: dict(link=('R_F_Knee', 'R_F_Paw'), id=13, color=[255, 128, 0]),
        14: dict(link=('TailBase', 'L_B_Elbow'), id=14, color=[0, 255, 0]),
        15: dict(link=('L_B_Elbow', 'L_B_Knee'), id=15, color=[0, 255, 0]),
        16: dict(link=('L_B_Knee', 'L_B_Paw'), id=16, color=[0, 255, 0]),
        17: dict(link=('TailBase', 'R_B_Elbow'), id=17, color=[255, 128, 0]),
        18: dict(link=('R_B_Elbow', 'R_B_Knee'), id=18, color=[255, 128, 0]),
        19: dict(link=('R_B_Knee', 'R_B_Paw'), id=19, color=[255, 128, 0])
    },
    joint_weights=[
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.2, 1.2,
        1.5, 1.5, 1.5, 1.5
    ],

    # Note: The original paper did not provide enough information about
    # the sigmas. We modified from
    # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py#L523
    sigmas=[
        0.025, 0.025, 0.026, 0.035, 0.035, 0.10, 0.10, 0.10, 0.107, 0.107,
        0.107, 0.107, 0.087, 0.087, 0.087, 0.087, 0.089, 0.089, 0.089, 0.089
    ])

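animalpose.py weights knee and paw keypoints above 1.0; such joint_weights are commonly multiplied into a per-joint heatmap loss so that harder keypoints count more. A toy sketch of that weighting (numpy stand-in and an assumption about how the weights are consumed, not the repository's actual loss code):

import numpy as np

# Weighted per-joint MSE over predicted vs. target heatmaps (illustrative).
joint_weights = np.array([1., 1., 1.2, 1.5])   # illustrative subset of weights
pred = np.random.rand(4, 64, 48)               # 4 predicted heatmaps
target = np.random.rand(4, 64, 48)             # 4 target heatmaps

per_joint_mse = ((pred - target) ** 2).mean(axis=(1, 2))
weighted_loss = float((per_joint_mse * joint_weights).mean())
print(weighted_loss)
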
ViTPose/easy_ViTPose/easy_ViTPose/configs/_base_/datasets/ap10k.py
ADDED
@@ -0,0 +1,142 @@
dataset_info = dict(
    dataset_name='ap10k',
    paper_info=dict(
        author='Yu, Hang and Xu, Yufei and Zhang, Jing and '
        'Zhao, Wei and Guan, Ziyu and Tao, Dacheng',
        title='AP-10K: A Benchmark for Animal Pose Estimation in the Wild',
        container='35th Conference on Neural Information Processing Systems '
        '(NeurIPS 2021) Track on Datasets and Bench-marks.',
        year='2021',
        homepage='https://github.com/AlexTheBad/AP-10K',
    ),
    keypoint_info={
        0: dict(name='L_Eye', id=0, color=[0, 255, 0], type='upper', swap='R_Eye'),
        1: dict(name='R_Eye', id=1, color=[255, 128, 0], type='upper', swap='L_Eye'),
        2: dict(name='Nose', id=2, color=[51, 153, 255], type='upper', swap=''),
        3: dict(name='Neck', id=3, color=[51, 153, 255], type='upper', swap=''),
        4: dict(name='Root of tail', id=4, color=[51, 153, 255], type='lower', swap=''),
        5: dict(name='L_Shoulder', id=5, color=[51, 153, 255], type='upper', swap='R_Shoulder'),
        6: dict(name='L_Elbow', id=6, color=[51, 153, 255], type='upper', swap='R_Elbow'),
        7: dict(name='L_F_Paw', id=7, color=[0, 255, 0], type='upper', swap='R_F_Paw'),
        8: dict(name='R_Shoulder', id=8, color=[0, 255, 0], type='upper', swap='L_Shoulder'),
        9: dict(name='R_Elbow', id=9, color=[255, 128, 0], type='upper', swap='L_Elbow'),
        10: dict(name='R_F_Paw', id=10, color=[0, 255, 0], type='lower', swap='L_F_Paw'),
        11: dict(name='L_Hip', id=11, color=[255, 128, 0], type='lower', swap='R_Hip'),
        12: dict(name='L_Knee', id=12, color=[255, 128, 0], type='lower', swap='R_Knee'),
        13: dict(name='L_B_Paw', id=13, color=[0, 255, 0], type='lower', swap='R_B_Paw'),
        14: dict(name='R_Hip', id=14, color=[0, 255, 0], type='lower', swap='L_Hip'),
        15: dict(name='R_Knee', id=15, color=[0, 255, 0], type='lower', swap='L_Knee'),
        16: dict(name='R_B_Paw', id=16, color=[0, 255, 0], type='lower', swap='L_B_Paw'),
    },
    skeleton_info={
        0: dict(link=('L_Eye', 'R_Eye'), id=0, color=[0, 0, 255]),
        1: dict(link=('L_Eye', 'Nose'), id=1, color=[0, 0, 255]),
        2: dict(link=('R_Eye', 'Nose'), id=2, color=[0, 0, 255]),
        3: dict(link=('Nose', 'Neck'), id=3, color=[0, 255, 0]),
        4: dict(link=('Neck', 'Root of tail'), id=4, color=[0, 255, 0]),
        5: dict(link=('Neck', 'L_Shoulder'), id=5, color=[0, 255, 255]),
        6: dict(link=('L_Shoulder', 'L_Elbow'), id=6, color=[0, 255, 255]),
        7: dict(link=('L_Elbow', 'L_F_Paw'), id=6, color=[0, 255, 255]),
        8: dict(link=('Neck', 'R_Shoulder'), id=7, color=[6, 156, 250]),
        9: dict(link=('R_Shoulder', 'R_Elbow'), id=8, color=[6, 156, 250]),
        10: dict(link=('R_Elbow', 'R_F_Paw'), id=9, color=[6, 156, 250]),
        11: dict(link=('Root of tail', 'L_Hip'), id=10, color=[0, 255, 255]),
        12: dict(link=('L_Hip', 'L_Knee'), id=11, color=[0, 255, 255]),
        13: dict(link=('L_Knee', 'L_B_Paw'), id=12, color=[0, 255, 255]),
        14: dict(link=('Root of tail', 'R_Hip'), id=13, color=[6, 156, 250]),
        15: dict(link=('R_Hip', 'R_Knee'), id=14, color=[6, 156, 250]),
        16: dict(link=('R_Knee', 'R_B_Paw'), id=15, color=[6, 156, 250]),
    },
    joint_weights=[
        1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5,
        1.5
    ],
    sigmas=[
        0.025, 0.025, 0.026, 0.035, 0.035, 0.079, 0.072, 0.062, 0.079, 0.072,
        0.062, 0.107, 0.087, 0.089, 0.107, 0.087, 0.089
    ])

ViTPose/easy_ViTPose/easy_ViTPose/configs/_base_/datasets/ap10k_info.py
ADDED
@@ -0,0 +1,142 @@
ap10k_info = dict(...)

(The body of this file duplicates _base_/datasets/ap10k.py above line for
line; the only difference is that the top-level variable is named ap10k_info
instead of dataset_info.)

ViTPose/easy_ViTPose/easy_ViTPose/configs/_base_/datasets/atrw.py
ADDED
@@ -0,0 +1,144 @@
dataset_info = dict(
    dataset_name='atrw',
    paper_info=dict(
        author='Li, Shuyuan and Li, Jianguo and Tang, Hanlin '
        'and Qian, Rui and Lin, Weiyao',
        title='ATRW: A Benchmark for Amur Tiger '
        'Re-identification in the Wild',
        container='Proceedings of the 28th ACM '
        'International Conference on Multimedia',
        year='2020',
        homepage='https://cvwc2019.github.io/challenge.html',
    ),
    keypoint_info={
        0: dict(name='left_ear', id=0, color=[51, 153, 255], type='upper', swap='right_ear'),
        1: dict(name='right_ear', id=1, color=[51, 153, 255], type='upper', swap='left_ear'),
        2: dict(name='nose', id=2, color=[51, 153, 255], type='upper', swap=''),
        3: dict(name='right_shoulder', id=3, color=[255, 128, 0], type='upper', swap='left_shoulder'),
        4: dict(name='right_front_paw', id=4, color=[255, 128, 0], type='upper', swap='left_front_paw'),
        5: dict(name='left_shoulder', id=5, color=[0, 255, 0], type='upper', swap='right_shoulder'),
        6: dict(name='left_front_paw', id=6, color=[0, 255, 0], type='upper', swap='right_front_paw'),
        7: dict(name='right_hip', id=7, color=[255, 128, 0], type='lower', swap='left_hip'),
        8: dict(name='right_knee', id=8, color=[255, 128, 0], type='lower', swap='left_knee'),
        9: dict(name='right_back_paw', id=9, color=[255, 128, 0], type='lower', swap='left_back_paw'),
        10: dict(name='left_hip', id=10, color=[0, 255, 0], type='lower', swap='right_hip'),
        11: dict(name='left_knee', id=11, color=[0, 255, 0], type='lower', swap='right_knee'),
        12: dict(name='left_back_paw', id=12, color=[0, 255, 0], type='lower', swap='right_back_paw'),
        13: dict(name='tail', id=13, color=[51, 153, 255], type='lower', swap=''),
        14: dict(name='center', id=14, color=[51, 153, 255], type='lower', swap=''),
    },
    skeleton_info={
        0: dict(link=('left_ear', 'nose'), id=0, color=[51, 153, 255]),
        1: dict(link=('right_ear', 'nose'), id=1, color=[51, 153, 255]),
        2: dict(link=('nose', 'center'), id=2, color=[51, 153, 255]),
        3: dict(link=('left_shoulder', 'left_front_paw'), id=3, color=[0, 255, 0]),
        4: dict(link=('left_shoulder', 'center'), id=4, color=[0, 255, 0]),
        5: dict(link=('right_shoulder', 'right_front_paw'), id=5, color=[255, 128, 0]),
        6: dict(link=('right_shoulder', 'center'), id=6, color=[255, 128, 0]),
        7: dict(link=('tail', 'center'), id=7, color=[51, 153, 255]),
        8: dict(link=('right_back_paw', 'right_knee'), id=8, color=[255, 128, 0]),
        9: dict(link=('right_knee', 'right_hip'), id=9, color=[255, 128, 0]),
        10: dict(link=('right_hip', 'tail'), id=10, color=[255, 128, 0]),
        11: dict(link=('left_back_paw', 'left_knee'), id=11, color=[0, 255, 0]),
        12: dict(link=('left_knee', 'left_hip'), id=12, color=[0, 255, 0]),
        13: dict(link=('left_hip', 'tail'), id=13, color=[0, 255, 0]),
    },
    joint_weights=[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
    sigmas=[
        0.0277, 0.0823, 0.0831, 0.0202, 0.0716, 0.0263, 0.0646, 0.0302, 0.0440,
        0.0316, 0.0333, 0.0547, 0.0263, 0.0683, 0.0539
    ])

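The atrw.py skeleton_info links are given by keypoint name; a visualiser needs them as index pairs plus a colour. A short sketch of that lookup (toy subset of the tables above, my own helper, not code from the upload):

# Resolving name-based skeleton links into (index, index, colour) edges.
keypoint_info = {
    0: dict(name='left_ear'),
    1: dict(name='right_ear'),
    2: dict(name='nose'),
    14: dict(name='center'),
}
skeleton_info = {
    0: dict(link=('left_ear', 'nose'), color=[51, 153, 255]),
    1: dict(link=('right_ear', 'nose'), color=[51, 153, 255]),
    2: dict(link=('nose', 'center'), color=[51, 153, 255]),
}
name_to_id = {v['name']: k for k, v in keypoint_info.items()}
edges = []
for entry in skeleton_info.values():
    a, b = entry['link']
    edges.append((name_to_id[a], name_to_id[b], tuple(entry['color'])))
print(edges)   # [(0, 2, (51, 153, 255)), (1, 2, ...), (2, 14, ...)]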