franzi2505 committed on
Commit
f965db0
·
1 Parent(s): 3ae0b30
README.md CHANGED
@@ -1,13 +1,211 @@
1
  ---
2
- title: Detection Metrics
3
- emoji: 👀
4
- colorFrom: red
5
- colorTo: red
 
6
  sdk: gradio
7
- sdk_version: 4.17.0
8
  app_file: app.py
9
  pinned: false
10
- license: agpl-3.0
11
  ---
12
 
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: Detection Metric
3
+ tags:
4
+ - evaluate
5
+ - metric
6
+ description: "Compute multiple object detection metrics at different bounding box area levels."
7
  sdk: gradio
8
+ sdk_version: 3.19.1
9
  app_file: app.py
10
  pinned: false
 
11
  ---
12
 
13
+ # Metric Card for Detection Metric
14
+
15
+ ## Metric Description
16
+ This metric computes object detection metrics (true/false positive and negative counts, precision, recall, F1, support). It can optionally report the metrics at different bounding-box size levels, giving more insight into performance on objects of different sizes. It is adapted from the pycocotools evaluation code.
17
+
18
+ ## How to Use
19
+ ```
20
+ >>> module = evaluate.load("./detection_metric.py")
21
+ # shape: (n_images, m_predicted_bboxes, xywh)
22
+ >>> predictions = [
23
+ [
24
+ [10, 15, 5, 9],
25
+ [45, 30, 10, 10]
26
+ ],[
27
+ [14, 25, 6, 6],
28
+ [10, 16, 6, 10]
29
+ ],
30
+ ]
31
+ # shape: (n_images, m_gt_bboxes, xywh)
32
+ >>> references = [
33
+ [[10, 16, 6, 10]],
34
+ [[30, 30, 5, 6]]
35
+ ]
36
+ >>> module.add_batch(
37
+ predictions=predictions,
38
+ references=references,
39
+ predictions_scores=[[0.5,0.1], [0.8, 0.2]]
40
+ )
41
+ >>> module.compute()
42
+ ```
43
+
44
+ ### Metric Settings
45
+ When loading the module via `module = evaluate.load("./detection_metric.py", **params)`, the following parameters can be specified (see the sketch after this list for an illustrative call):
46
+ - **area_ranges_tuples** *List[Tuple[str, List[int]]]*: Area range levels at which the metrics should be calculated. It is a list of tuples, where the first element of each tuple names the area range and the second element is a list specifying the lower and upper limit of the area range. Defaults to `[("all", [0, 1e5 ** 2])]`.
47
+ - **bbox_format** *Literal["xyxy", "xywh", "cxcywh"]*: Bounding box format of predictions and ground truth. Defaults to `"xywh"`.
48
+ - **iou_threshold** *Optional[float]*: IoU threshold at which the metrics are calculated. The IoU threshold defines the minimum overlap between a ground truth and a predicted bounding box for the prediction to be considered correct. Defaults to `1e-10`.
49
+ - **class_agnostic** *bool*: Whether to compute the metrics globally, ignoring class labels. Defaults to `True`. Non-class-agnostic metrics are currently not supported.
50
+
51
+
52
+ ### Input Values
53
+ Add predictions and ground truths to the metric with the function `module.add_batch(predictions, references)` (optionally also passing `predictions_scores`, `predictions_labels`, and `references_labels`) using the following parameters:
54
+ - **predictions** *List[List[List[int]]]*: predicted bounding boxes in shape `n x m x 4`, with `n` being the number of evaluated images, `m` the number of predicted bounding boxes for the n-th image, and the four coordinates specifying the bounding box (by default: x, y, width, height).
55
+ - **references** *List[List[List[int]]]*: ground truth bounding boxes in shape `n x l x 4` with `l` being the number of ground truth bounding boxes for the n-th image.
56
+
57
+
58
+ ### Output Values
59
+ The metric outputs a dictionary that contains one sub-dictionary per specified area range, keyed by the range's name (see the sketch after this list for how the derived scores relate to the counts).
60
+ Each sub-dictionary holds performance metrics at the specific area range level:
61
+ - **range**: corresponding area range
62
+ - **iouThr**: IOU-threshold used in calculating the metric
63
+ - **maxDets**: maximum number of detections in calculating the metrics
64
+ - **tp**: number of true positive predictions
65
+ - **fp**: number of false positive predictions
66
+ - **fn**: number of false negative predictions
67
+ - **duplicates**: number of duplicated bounding box predictions
68
+ - **precision**: ratio between true positive predictions and positive predictions (tp/(tp+fp))
69
+ - **recall**: ratio between true positive predictions and actual ground truths (tp/(tp+fn))
70
+ - **f1**: trades off precision and recall (2*(precision*recall)/(precision+recall))
71
+ - **support**: number of ground truth bounding boxes that are considered in the metric
72
+ - **fpi**: number of images with predictions but no ground truths
73
+ - **nImgs**: number of total images considered in calculating the metric
74
+
75
+
76
+ ### Examples
77
+ #### Example 1
78
+ Basic usage example. Add predictions and references via the `module.add_batch(predictions, references)` function, then compute the metrics across the predictions and ground truths of all images via `module.compute()`.
79
+ ```
80
+ >>> module = evaluate.load("./detection_metric.py", iou_threshold=0.9)
81
+
82
+ >>> predictions = [
83
+ [
84
+ [10, 15, 20, 25],
85
+ [45, 30, 10, 10]
86
+ ],[
87
+ [14, 25, 6, 6],
88
+ [10, 16, 6, 10]
89
+ ]
90
+ ]
91
+
92
+ >>> references = [
93
+ [[10, 15, 20, 20]],
94
+ [[30, 30, 5, 6]]
95
+ ]
96
+
97
+ >>> module.add_batch(predictions=predictions, references=references, predictions_scores=[[0.5,0.3],[0.8, 0.1]])
98
+ >>> result = module.compute()
99
+ >>> print(result)
100
+ {'all': {
101
+ 'range': [0, 10000000000.0],
102
+ 'iouThr': '0.00',
103
+ 'maxDets': 100,
104
+ 'tp': 1,
105
+ 'fp': 3,
106
+ 'fn': 1,
107
+ 'duplicates': 0,
108
+ 'precision': 0.25,
109
+ 'recall': 0.5,
110
+ 'f1': 0.3333333333333333,
111
+ 'support': 2,
112
+ 'fpi': 0,
113
+ 'nImgs': 2
114
+ }
115
+ }
116
+ ```
117
+ #### Example 2
118
+ We can specify different area-range levels at which to compute the metrics. Note that the references contain an empty list for the first image because it has no ground truth bounding boxes; the empty list still has to be included so that false positive predictions are mapped to the correct image.
119
+ ```
120
+ >>> area_ranges_tuples = [
121
+ ("all", [0, 1e5 ** 2]),
122
+ ("small", [0 ** 2, 6 ** 2]),
123
+ ("medium", [6 ** 2, 12 ** 2]),
124
+ ("large", [12 ** 2, 1e5 ** 2])
125
+ ]
126
+
127
+ >>> module = evaluate.load("./detection_metric.py", area_ranges_tuples=area_ranges_tuples)
128
+
129
+ >>> predictions = [
130
+ [
131
+ [10, 15, 5, 5],
132
+ [45, 30, 10, 10]
133
+ ],[
134
+ [50, 50, 6, 10]
135
+ ],
136
+ ]
137
+
138
+ >>> references = [
139
+ [],
140
+ [[10, 15, 5, 5]]
141
+ ]
142
+
143
+ >>> module.add_batch(predictions=predictions, references=references)
144
+ >>> result = module.compute()
145
+ >>> print(result)
146
+ {'all':
147
+ {'range': [0, 10000000000.0],
148
+ 'iouThr': '0.00',
149
+ 'maxDets': 100,
150
+ 'tp': 0,
151
+ 'fp': 3,
152
+ 'fn': 1,
153
+ 'duplicates': 0,
154
+ 'precision': 0.0,
155
+ 'recall': 0.0,
156
+ 'f1': 0,
157
+ 'support': 1,
158
+ 'fpi': 1,
159
+ 'nImgs': 2
160
+ },
161
+ 'small': {
162
+ 'range': [0, 36],
163
+ 'iouThr': '0.00',
164
+ 'maxDets': 100,
165
+ 'tp': 0,
166
+ 'fp': 1,
167
+ 'fn': 1,
168
+ 'duplicates': 0,
169
+ 'precision': 0.0,
170
+ 'recall': 0.0,
171
+ 'f1': 0,
172
+ 'support': 1,
173
+ 'fpi': 1,
174
+ 'nImgs': 2
175
+ },
176
+ 'medium': {
177
+ 'range': [36, 144],
178
+ 'iouThr': '0.00',
179
+ 'maxDets': 100,
180
+ 'tp': 0,
181
+ 'fp': 2,
182
+ 'fn': 0,
183
+ 'duplicates': 0,
184
+ 'precision': 0.0,
185
+ 'recall': 0,
186
+ 'f1': 0,
187
+ 'support': 0,
188
+ 'fpi': 2,
189
+ 'nImgs': 2
190
+ }, 'large': {
191
+ 'range': [144, 10000000000.0],
192
+ 'iouThr': '0.00',
193
+ 'maxDets': 100,
194
+ 'tp': -1,
195
+ 'fp': -1,
196
+ 'fn': -1,
197
+ 'duplicates': -1,
198
+ 'precision': -1,
199
+ 'recall': -1,
200
+ 'f1': -1,
201
+ 'support': 0,
202
+ 'fpi': 0,
203
+ 'nImgs': 2
204
+ }
205
+ }
206
+ ```
207
+
208
+ ## Further References
209
+ Calculating metrics is based on pycoco tools: https://github.com/cocodataset/cocoapi/tree/master/PythonAPI/pycocotools
210
+
211
+ Further info about metrics: https://www.analyticsvidhya.com/blog/2020/09/precision-recall-machine-learning/
app.py ADDED
@@ -0,0 +1,6 @@
1
+ import evaluate
2
+ from evaluate.utils import launch_gradio_widget
3
+
4
+
5
+ module = evaluate.load("./detection_metric.py",)
6
+ launch_gradio_widget(module)
detection_metric.py ADDED
@@ -0,0 +1,246 @@
1
+ # Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+ """Object detection metric (precision, recall, F1, support) based on a modified version of the pycocotools evaluation."""
15
+
16
+ from typing import List, Tuple, Optional, Literal
17
+
18
+ import evaluate
19
+ import datasets
20
+ import numpy as np
21
+
22
+ from modified_coco.pr_rec_f1 import PrecisionRecallF1Support
23
+
24
+
25
+ _CITATION = """\
26
+ @InProceedings{coco:2020,
27
+ title = {Microsoft {COCO:} Common Objects in Context},
28
+ authors={Tsung{-}Yi Lin and
29
+ Michael Maire and
30
+ Serge J. Belongie and
31
+ James Hays and
32
+ Pietro Perona and
33
+ Deva Ramanan and
34
+ Piotr Dollar and
35
+ C. Lawrence Zitnick},
36
+ booktitle = {Computer Vision - {ECCV} 2014 - 13th European Conference, Zurich,
37
+ Switzerland, September 6-12, 2014, Proceedings, Part {V}},
38
+ series = {Lecture Notes in Computer Science},
39
+ volume = {8693},
40
+ pages = {740--755},
41
+ publisher = {Springer},
42
+ year={2014}
43
+ }
44
+ """
45
+
46
+ _DESCRIPTION = """\
47
+ This evaluation metric is designed to provide object detection metrics at different object size levels.
48
+ It is based on a modified version of the commonly used COCO-evaluation metrics.
49
+ """
50
+
51
+
52
+ _KWARGS_DESCRIPTION = """
53
+ Calculates object detection metrics given predicted and ground truth bounding boxes for a batch of images.
+ Args:
+ predictions: list with one entry per image. Each entry is a list of predicted bounding
+ boxes, each given by the four coordinates that specify the box.
+ Coordinate format is as defined when instantiating the metric
+ (parameter: bbox_format, defaults to xywh).
+ references: list with one entry per image. Each entry is a list of ground truth bounding
+ boxes, each given by the four coordinates that specify the box.
+ Bounding box format should be the same as for the predictions.
62
+ Returns:
63
+ dict containing dicts for each specified area range with following items:
64
+ 'range': specified area range as [min_px_area, max_px_area]
65
+ 'iouThr': min. IOU-threshold of a prediction with a ground truth box
66
+ to be considered a correct prediction
67
+ 'maxDets': maximum number of detections
68
+ 'tp': number of true positive (correct) predictions
69
+ 'fp': number of false positive (incorrect) predictions
70
+ 'fn': number of false negative (missed) predictions
71
+ 'duplicates': number of duplicate predictions
72
+ 'precision': best possible score = 1, worst possible score = 0
73
+ large if few false positive predictions
74
+ formula: tp/(fp+tp)
75
+ 'recall' best possible score = 1, worst possible score = 0
76
+ large if few missed predictions
77
+ formula: tp/(tp+fn)
78
+ 'f1': best possible score = 1, worst possible score = 0
79
+ trades off precision and recall
80
+ formula: 2*(precision*recall)/(precision+recall)
81
+ 'support': number of ground truth bounding boxes considered in the evaluation,
82
+ 'fpi': number of images with no ground truth but false positive predictions,
83
+ 'nImgs': number of images considered in evaluation
84
+ Examples:
85
+ >>> module = evaluate.load("./detection_metric.py", iou_threshold=0.9)
86
+ >>> predictions = [
87
+ [
88
+ [10, 15, 20, 25],
89
+ [45, 30, 10, 10]
90
+ ],[
91
+ [14, 25, 6, 6],
92
+ [10, 16, 6, 10]
93
+ ]
94
+ ]
95
+ >>> references = [
96
+ [[10, 15, 20, 20]],
97
+ [[30, 30, 5, 6]]
98
+ ]
99
+ >>> module.add_batch(predictions=predictions, references=references, predictions_scores=[[0.5,0.3],[0.8, 0.1]])
100
+ >>> result = module.compute()
101
+ >>> print(result)
102
+ {'all': {
103
+ 'range': [0, 10000000000.0],
104
+ 'iouThr': '0.00',
105
+ 'maxDets': 100,
106
+ 'tp': 1,
107
+ 'fp': 3,
108
+ 'fn': 1,
109
+ 'duplicates': 0,
110
+ 'precision': 0.25,
111
+ 'recall': 0.5,
112
+ 'f1': 0.3333333333333333,
113
+ 'support': 2,
114
+ 'fpi': 0,
115
+ 'nImgs': 2
116
+ }
117
+ }
118
+ """
119
+
120
+
121
+ @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
122
+ class DetectionMetric(evaluate.Metric):
123
+ def __init__(
124
+ self,
125
+ area_ranges_tuples: List[Tuple[str, List[int]]] = [("all", [0, 1e5 ** 2])],
126
+ iou_threshold: float = 1e-10,
127
+ class_agnostic: bool = True,
128
+ bbox_format: str = "xywh",
129
+ iou_type: Literal["bbox", "segm"] = "bbox",
130
+ **kwargs
131
+ ):
132
+ super().__init__(**kwargs)
133
+ area_ranges = [v for _, v in area_ranges_tuples]
134
+ area_ranges_labels = [k for k, _ in area_ranges_tuples]
135
+
136
+ metric_params = dict(
137
+ iou_thresholds=[iou_threshold],
138
+ area_ranges=area_ranges,
139
+ area_ranges_labels=area_ranges_labels,
140
+ class_agnostic=class_agnostic,
141
+ iou_type=iou_type,
142
+ box_format=bbox_format
143
+ )
144
+ self.coco_metric = PrecisionRecallF1Support(**metric_params)
145
+
146
+ def _info(self):
147
+ return evaluate.MetricInfo(
148
+ # This is the description that will appear on the modules page.
149
+ module_type="metric",
150
+ description=_DESCRIPTION,
151
+ citation=_CITATION,
152
+ inputs_description=_KWARGS_DESCRIPTION,
153
+ # This defines the format of each prediction and reference
154
+ features=datasets.Features(
155
+ {
156
+ 'predictions': datasets.Sequence(feature=datasets.Sequence(datasets.Value("float"))),
157
+ 'references': datasets.Sequence(feature=datasets.Sequence(datasets.Value("float"))),
158
+ }
159
+ ),
160
+ # Additional links to the codebase or references
161
+ codebase_urls=["https://github.com/SEA-AI/metrics/tree/main",
162
+ "https://github.com/cocodataset/cocoapi/tree/master"]
163
+ )
164
+
165
+ def add_batch(
166
+ self,
167
+ predictions,
168
+ references,
169
+ predictions_labels: Optional[np.ndarray] = None,
170
+ predictions_scores: Optional[np.ndarray] = None,
171
+ references_labels: Optional[np.ndarray] = None
172
+ ):
173
+ """Add predictions and ground truths of a batch of images to update the metric.
174
+
175
+ Args:
176
+ predictions (List[List[List[int]]]): predicted bounding boxes, shape: (n_images, m_pred_boxes, 4)
177
+ references (List[List[List[int]]]): ground truth bounding boxes, shape: (n_images, l_gt_boxes, 4)
178
+ predictions_labels (Optional[np.ndarray], optional): Labels of predicted bounding boxes, shape: (n_images, m_pred_boxes).
179
+ Defaults to None.
180
+ predictions_scores (Optional[np.ndarray], optional): Scores of predicted bounding boxes, shape: (n_images, m_pred_boxes).
181
+ Defaults to None.
182
+ references_labels (Optional[np.ndarray], optional): Labels of ground truth bounding boxes, shape: (n_images, l_gt_boxes).
183
+ Defaults to None.
184
+ """
185
+ if predictions_labels is None:
186
+ predictions_labels = [None]*len(predictions)
187
+ if predictions_scores is None:
188
+ predictions_scores = [None]*len(predictions)
189
+ if references_labels is None:
190
+ references_labels = [None]*len(references)
191
+ for pred, ref, pred_score, pred_l, ref_l in zip(predictions,
192
+ references,
193
+ predictions_scores,
194
+ predictions_labels,
195
+ references_labels):
196
+ preds, targets = self.process_preds_references(pred, ref, pred_l, pred_score, ref_l)
197
+ self.coco_metric.update(preds, targets)
198
+ super(evaluate.Metric, self).add_batch(predictions=predictions, references=references)
199
+
200
+ def _compute(
201
+ self,
202
+ predictions,
203
+ references
204
+ ):
205
+ """Returns the scores"""
206
+ result = self.coco_metric.compute()["metrics"]
207
+ return result
208
+
209
+ @staticmethod
210
+ def process_preds_references(
211
+ predictions,
212
+ references,
213
+ predictions_labels: Optional[np.ndarray] = None,
214
+ predictions_scores: Optional[np.ndarray] = None,
215
+ references_labels: Optional[np.ndarray] = None
216
+ ):
217
+ if predictions_scores is None:
218
+ predictions_scores = np.ones(shape=len(predictions), dtype=np.float32)
219
+ else:
220
+ predictions_scores = np.array(predictions_scores, dtype=np.float32)
221
+ if predictions_labels is None:
222
+ if references_labels is not None:
223
+ print("Warning: Providing no prediction labels, but ground truth labels!")
224
+ predictions_labels = np.zeros(shape=len(predictions), dtype=np.int16)
225
+ else:
226
+ predictions_labels = np.array(predictions_labels)
227
+ if references_labels is None:
228
+ references_labels = np.zeros(shape=len(references), dtype=np.int16)
229
+ else:
230
+ references_labels = np.array(references_labels)
231
+
232
+ preds = [
233
+ dict(
234
+ boxes=np.array(predictions),
235
+ scores=predictions_scores,
236
+ labels=predictions_labels
237
+ )
238
+ ]
239
+ target = [
240
+ dict(
241
+ boxes=np.array(references),
242
+ labels=references_labels
243
+ )
244
+ ]
245
+
246
+ return preds, target
modified_coco/cocoeval.py ADDED
@@ -0,0 +1,693 @@
1
+ __author__ = 'tsungyi, [email protected]'
2
+
3
+ # This is a modified version of the original cocoeval.py
4
+ # In this version we are able to return the TP, FP, and FN values
5
+ # along with the other default metrics.
6
+
7
+ import numpy as np
8
+ import datetime
9
+ import time
10
+ from collections import defaultdict
11
+ from pycocotools import mask as maskUtils
12
+ import copy
13
+
14
+ class COCOeval:
15
+ # Interface for evaluating detection on the Microsoft COCO dataset.
16
+ #
17
+ # The usage for CocoEval is as follows:
18
+ # cocoGt=..., cocoDt=... # load dataset and results
19
+ # E = CocoEval(cocoGt,cocoDt); # initialize CocoEval object
20
+ # E.params.recThrs = ...; # set parameters as desired
21
+ # E.evaluate(); # run per image evaluation
22
+ # E.accumulate(); # accumulate per image results
23
+ # E.summarize(); # display summary metrics of results
24
+ # For example usage see evalDemo.m and http://mscoco.org/.
25
+ #
26
+ # The evaluation parameters are as follows (defaults in brackets):
27
+ # imgIds - [all] N img ids to use for evaluation
28
+ # catIds - [all] K cat ids to use for evaluation
29
+ # iouThrs - [.5:.05:.95] T=10 IoU thresholds for evaluation
30
+ # recThrs - [0:.01:1] R=101 recall thresholds for evaluation
31
+ # areaRng - [...] A=4 object area ranges for evaluation
32
+ # maxDets - [1 10 100] M=3 thresholds on max detections per image
33
+ # iouType - ['segm'] set iouType to 'segm', 'bbox' or 'keypoints'
34
+ # iouType replaced the now DEPRECATED useSegm parameter.
35
+ # useCats - [1] if true use category labels for evaluation
36
+ # Note: if useCats=0 category labels are ignored as in proposal scoring.
37
+ # Note: multiple areaRngs [Ax2] and maxDets [Mx1] can be specified.
38
+ #
39
+ # evaluate(): evaluates detections on every image and every category and
40
+ # concats the results into the "evalImgs" with fields:
41
+ # dtIds - [1xD] id for each of the D detections (dt)
42
+ # gtIds - [1xG] id for each of the G ground truths (gt)
43
+ # dtMatches - [TxD] matching gt id at each IoU or 0
44
+ # gtMatches - [TxG] matching dt id at each IoU or 0
45
+ # dtScores - [1xD] confidence of each dt
46
+ # gtIgnore - [1xG] ignore flag for each gt
47
+ # dtIgnore - [TxD] ignore flag for each dt at each IoU
48
+ #
49
+ # accumulate(): accumulates the per-image, per-category evaluation
50
+ # results in "evalImgs" into the dictionary "eval" with fields:
51
+ # params - parameters used for evaluation
52
+ # date - date evaluation was performed
53
+ # counts - [T,R,K,A,M] parameter dimensions (see above)
54
+ # precision - [TxRxKxAxM] precision for every evaluation setting
55
+ # recall - [TxKxAxM] max recall for every evaluation setting
56
+ # TP - [TxKxAxM] number of true positives for every eval setting [NEW]
57
+ # FP - [TxKxAxM] number of false positives for every eval setting [NEW]
58
+ # FN - [TxKxAxM] number of false negatives for every eval setting [NEW]
59
+ # Note: precision and recall==-1 for settings with no gt objects.
60
+ #
61
+ # See also coco, mask, pycocoDemo, pycocoEvalDemo
62
+ #
63
+ # Microsoft COCO Toolbox. version 2.0
64
+ # Data, paper, and tutorials available at: http://mscoco.org/
65
+ # Code written by Piotr Dollar and Tsung-Yi Lin, 2015.
66
+ # Licensed under the Simplified BSD License [see coco/license.txt]
67
+ def __init__(self, cocoGt=None, cocoDt=None, iouType='segm'):
68
+ '''
69
+ Initialize CocoEval using coco APIs for gt and dt
70
+ :param cocoGt: coco object with ground truth annotations
71
+ :param cocoDt: coco object with detection results
72
+ :return: None
73
+ '''
74
+ if not iouType:
75
+ print('iouType not specified. use default iouType segm')
76
+ self.cocoGt = cocoGt # ground truth COCO API
77
+ self.cocoDt = cocoDt # detections COCO API
78
+ self.evalImgs = defaultdict(list) # per-image per-category evaluation results [KxAxI] elements
79
+ self.eval = {} # accumulated evaluation results
80
+ self._gts = defaultdict(list) # gt for evaluation
81
+ self._dts = defaultdict(list) # dt for evaluation
82
+ self.params = Params(iouType=iouType) # parameters
83
+ self._paramsEval = {} # parameters for evaluation
84
+ self.stats = [] # result summarization
85
+ self.ious = {} # ious between all gts and dts
86
+ if not cocoGt is None:
87
+ self.params.imgIds = sorted(cocoGt.getImgIds())
88
+ self.params.catIds = sorted(cocoGt.getCatIds())
89
+
90
+
91
+ def _prepare(self):
92
+ '''
93
+ Prepare ._gts and ._dts for evaluation based on params
94
+ :return: None
95
+ '''
96
+ def _toMask(anns, coco):
97
+ # modify ann['segmentation'] by reference
98
+ for ann in anns:
99
+ rle = coco.annToRLE(ann)
100
+ ann['segmentation'] = rle
101
+ p = self.params
102
+ if p.useCats:
103
+ gts=self.cocoGt.loadAnns(self.cocoGt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
104
+ dts=self.cocoDt.loadAnns(self.cocoDt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
105
+ else:
106
+ gts=self.cocoGt.loadAnns(self.cocoGt.getAnnIds(imgIds=p.imgIds))
107
+ dts=self.cocoDt.loadAnns(self.cocoDt.getAnnIds(imgIds=p.imgIds))
108
+
109
+ # convert ground truth to mask if iouType == 'segm'
110
+ if p.iouType == 'segm':
111
+ _toMask(gts, self.cocoGt)
112
+ _toMask(dts, self.cocoDt)
113
+ # set ignore flag
114
+ for gt in gts:
115
+ gt['ignore'] = gt['ignore'] if 'ignore' in gt else 0
116
+ gt['ignore'] = 'iscrowd' in gt and gt['iscrowd']
117
+ if p.iouType == 'keypoints':
118
+ gt['ignore'] = (gt['num_keypoints'] == 0) or gt['ignore']
119
+ self._gts = defaultdict(list) # gt for evaluation
120
+ self._dts = defaultdict(list) # dt for evaluation
121
+ for gt in gts:
122
+ self._gts[gt['image_id'], gt['category_id']].append(gt)
123
+ for dt in dts:
124
+ self._dts[dt['image_id'], dt['category_id']].append(dt)
125
+ self.evalImgs = defaultdict(list) # per-image per-category evaluation results
126
+ self.eval = {} # accumulated evaluation results
127
+
128
+ def evaluate(self):
129
+ '''
130
+ Run per image evaluation on given images and store results (a list of dict) in self.evalImgs
131
+ :return: None
132
+ '''
133
+ tic = time.time()
134
+ print('Running per image evaluation...')
135
+ p = self.params
136
+ # add backward compatibility if useSegm is specified in params
137
+ if not p.useSegm is None:
138
+ p.iouType = 'segm' if p.useSegm == 1 else 'bbox'
139
+ print('useSegm (deprecated) is not None. Running {} evaluation'.format(p.iouType))
140
+ print('Evaluate annotation type *{}*'.format(p.iouType))
141
+ p.imgIds = list(np.unique(p.imgIds))
142
+ if p.useCats:
143
+ p.catIds = list(np.unique(p.catIds))
144
+ p.maxDets = sorted(p.maxDets)
145
+ self.params=p
146
+
147
+ self._prepare()
148
+ # loop through images, area range, max detection number
149
+ catIds = p.catIds if p.useCats else [-1]
150
+
151
+ if p.iouType == 'segm' or p.iouType == 'bbox':
152
+ computeIoU = self.computeIoU
153
+ elif p.iouType == 'keypoints':
154
+ computeIoU = self.computeOks
155
+ self.ious = {(imgId, catId): computeIoU(imgId, catId) \
156
+ for imgId in p.imgIds
157
+ for catId in catIds}
158
+
159
+ evaluateImg = self.evaluateImg
160
+ maxDet = p.maxDets[-1]
161
+ self.evalImgs = [evaluateImg(imgId, catId, areaRng, maxDet)
162
+ for catId in catIds
163
+ for areaRng in p.areaRng
164
+ for imgId in p.imgIds
165
+ ]
166
+ self._paramsEval = copy.deepcopy(self.params)
167
+ toc = time.time()
168
+ print('DONE (t={:0.2f}s).'.format(toc-tic))
169
+
170
+ def computeIoU(self, imgId, catId):
171
+ p = self.params
172
+ if p.useCats:
173
+ gt = self._gts[imgId,catId]
174
+ dt = self._dts[imgId,catId]
175
+ else:
176
+ gt = [_ for cId in p.catIds for _ in self._gts[imgId,cId]]
177
+ dt = [_ for cId in p.catIds for _ in self._dts[imgId,cId]]
178
+ if len(gt) == 0 and len(dt) ==0:
179
+ return []
180
+ inds = np.argsort([-d['score'] for d in dt], kind='mergesort')
181
+ dt = [dt[i] for i in inds]
182
+ if len(dt) > p.maxDets[-1]:
183
+ dt=dt[0:p.maxDets[-1]]
184
+
185
+ if p.iouType == 'segm':
186
+ g = [g['segmentation'] for g in gt]
187
+ d = [d['segmentation'] for d in dt]
188
+ elif p.iouType == 'bbox':
189
+ g = [g['bbox'] for g in gt]
190
+ d = [d['bbox'] for d in dt]
191
+ else:
192
+ raise Exception('unknown iouType for iou computation')
193
+
194
+ # compute iou between each dt and gt region
195
+ iscrowd = [int(o['iscrowd']) for o in gt]
196
+ ious = maskUtils.iou(d,g,iscrowd)
197
+ return ious
198
+
199
+ def computeOks(self, imgId, catId):
200
+ p = self.params
201
+ # dimension here should be Nxm
202
+ gts = self._gts[imgId, catId]
203
+ dts = self._dts[imgId, catId]
204
+ inds = np.argsort([-d['score'] for d in dts], kind='mergesort')
205
+ dts = [dts[i] for i in inds]
206
+ if len(dts) > p.maxDets[-1]:
207
+ dts = dts[0:p.maxDets[-1]]
208
+ # if len(gts) == 0 and len(dts) == 0:
209
+ if len(gts) == 0 or len(dts) == 0:
210
+ return []
211
+ ious = np.zeros((len(dts), len(gts)))
212
+ sigmas = p.kpt_oks_sigmas
213
+ vars = (sigmas * 2)**2
214
+ k = len(sigmas)
215
+ # compute oks between each detection and ground truth object
216
+ for j, gt in enumerate(gts):
217
+ # create bounds for ignore regions(double the gt bbox)
218
+ g = np.array(gt['keypoints'])
219
+ xg = g[0::3]; yg = g[1::3]; vg = g[2::3]
220
+ k1 = np.count_nonzero(vg > 0)
221
+ bb = gt['bbox']
222
+ x0 = bb[0] - bb[2]; x1 = bb[0] + bb[2] * 2
223
+ y0 = bb[1] - bb[3]; y1 = bb[1] + bb[3] * 2
224
+ for i, dt in enumerate(dts):
225
+ d = np.array(dt['keypoints'])
226
+ xd = d[0::3]; yd = d[1::3]
227
+ if k1>0:
228
+ # measure the per-keypoint distance if keypoints visible
229
+ dx = xd - xg
230
+ dy = yd - yg
231
+ else:
232
+ # measure minimum distance to keypoints in (x0,y0) & (x1,y1)
233
+ z = np.zeros((k))
234
+ dx = np.max((z, x0-xd),axis=0)+np.max((z, xd-x1),axis=0)
235
+ dy = np.max((z, y0-yd),axis=0)+np.max((z, yd-y1),axis=0)
236
+ e = (dx**2 + dy**2) / vars / (gt['area']+np.spacing(1)) / 2
237
+ if k1 > 0:
238
+ e=e[vg > 0]
239
+ ious[i, j] = np.sum(np.exp(-e)) / e.shape[0]
240
+ return ious
241
+
242
+ def is_bbox1_inside_bbox2(self, bbox1, bbox2):
243
+ '''
244
+ Check if bbox1 is inside bbox2. Bbox is in the format [x, y, w, h]
245
+ Returns:
246
+ - True if bbox1 is inside bbox2, False otherwise
247
+ - How much bbox1 is inside bbox2 (number between 0 and 1)
248
+ '''
249
+ x1_1, y1_1, w1_1, h1_1 = bbox1
250
+ x1_2, y1_2, w1_2, h1_2 = bbox2
251
+
252
+ # Convert xywh to (x, y, x2, y2) format
253
+ x2_1, y2_1 = x1_1 + w1_1, y1_1 + h1_1
254
+ x2_2, y2_2 = x1_2 + w1_2, y1_2 + h1_2
255
+
256
+ # Calculate the coordinates of the intersection rectangle
257
+ x_left, y_top = max(x1_1, x1_2), max(y1_1, y1_2)
258
+ x_right, y_bottom = min(x2_1, x2_2), min(y2_1, y2_2)
259
+ print(f"{x_left=}, {x_right=}, {y_top=}, {y_bottom=}")
260
+ if x_right < x_left or y_bottom < y_top:
261
+ return False, 0
262
+
263
+ intersection_area = (x_right - x_left) * (y_bottom - y_top)
264
+ print(f"{intersection_area=}")
265
+ return True, intersection_area / (w1_1 * h1_1)
266
+
267
+ def evaluateImg(self, imgId, catId, aRng, maxDet):
268
+ '''
269
+ perform evaluation for single category and image
270
+ :return: dict (single image results)
271
+ '''
272
+ p = self.params
273
+ if p.useCats:
274
+ gt = self._gts[imgId,catId]
275
+ dt = self._dts[imgId,catId]
276
+ else:
277
+ gt = [_ for cId in p.catIds for _ in self._gts[imgId,cId]]
278
+ dt = [_ for cId in p.catIds for _ in self._dts[imgId,cId]]
279
+ if len(gt) == 0 and len(dt) ==0:
280
+ return None
281
+
282
+ for g in gt:
283
+ if g['ignore'] or (g['area']<aRng[0] or g['area']>aRng[1]):
284
+ g['_ignore'] = 1
285
+ else:
286
+ g['_ignore'] = 0
287
+
288
+ # sort dt highest score first, sort gt ignore last
289
+ gtind = np.argsort([g['_ignore'] for g in gt], kind='mergesort')
290
+ gt = [gt[i] for i in gtind]
291
+ dtind = np.argsort([-d['score'] for d in dt], kind='mergesort')
292
+ dt = [dt[i] for i in dtind[0:maxDet]]
293
+ iscrowd = [int(o['iscrowd']) for o in gt]
294
+ # load computed ious
295
+ ious = self.ious[imgId, catId][:, gtind] if len(self.ious[imgId, catId]) > 0 else self.ious[imgId, catId]
296
+
297
+ T = len(p.iouThrs)
298
+ G = len(gt)
299
+ D = len(dt)
300
+ gtm = np.zeros((T,G))
301
+ dtm = np.zeros((T,D))
302
+ gtIg = np.array([g['_ignore'] for g in gt])
303
+ dtIg = np.zeros((T,D))
304
+ dtDup = np.zeros((T,D))
305
+
306
+ if not len(ious)==0:
307
+ for tind, t in enumerate(p.iouThrs):
308
+ for dind, d in enumerate(dt):
309
+ # information about best match so far (m=-1 -> unmatched)
310
+ iou = min([t,1-1e-10])
311
+ m = -1
312
+ for gind, g in enumerate(gt):
313
+ # if this gt already matched, iou>iouThr, and not a crowd
314
+ # store detection as duplicate
315
+ if gtm[tind,gind]>0 and ious[dind,gind]>t and not iscrowd[gind]:
316
+ dtDup[tind, dind] = d['id']
317
+ # if this gt already matched, and not a crowd, continue
318
+ if gtm[tind,gind]>0 and not iscrowd[gind]:
319
+ continue
320
+ # if dt matched to reg gt, and on ignore gt, stop
321
+ if m > -1 and gtIg[m]==0 and gtIg[gind]==1:
322
+ break
323
+ # continue to next gt unless better match made
324
+ if ious[dind,gind] < iou:
325
+ continue
326
+ # if match successful and best so far, store appropriately
327
+ iou=ious[dind,gind]
328
+ m=gind
329
+ # if match made store id of match for both dt and gt
330
+ if m ==-1:
331
+ continue
332
+ dtIg[tind,dind] = gtIg[m]
333
+ dtm[tind,dind] = gt[m]['id']
334
+ gtm[tind,m] = d['id']
335
+ # set unmatched detections outside of area range to ignore
336
+ a = np.array([d['area']<aRng[0] or d['area']>aRng[1] for d in dt]).reshape((1, len(dt)))
337
+ dtIg = np.logical_or(dtIg, np.logical_and(dtm==0, np.repeat(a,T,0)))
338
+ # only consider duplicates if dets are inside the area range
339
+ dtDup = np.logical_and(dtDup, np.logical_and(dtm==0, np.logical_not(np.repeat(a,T,0))))
340
+ # false positive img (fpi) when all gt are ignored and there remain detections
341
+ fpi = (gtIg.sum() == G) and np.any(dtIg == 0)
342
+
343
+ # store results for given image and category
344
+ return {
345
+ 'image_id': imgId,
346
+ 'category_id': catId,
347
+ 'aRng': aRng,
348
+ 'maxDet': maxDet,
349
+ 'dtIds': [d['id'] for d in dt],
350
+ 'gtIds': [g['id'] for g in gt],
351
+ 'dtMatches': dtm,
352
+ 'gtMatches': gtm,
353
+ 'dtScores': [d['score'] for d in dt],
354
+ 'gtIgnore': gtIg,
355
+ 'dtIgnore': dtIg,
356
+ 'dtDuplicates': dtDup,
357
+ 'fpi': fpi,
358
+ }
359
+
360
+ def accumulate(self, p = None):
361
+ '''
362
+ Accumulate per image evaluation results and store the result in self.eval
363
+ :param p: input params for evaluation
364
+ :return: None
365
+ '''
366
+ print('Accumulating evaluation results...')
367
+ tic = time.time()
368
+ if not self.evalImgs:
369
+ print('Please run evaluate() first')
370
+ # allows input customized parameters
371
+ if p is None:
372
+ p = self.params
373
+ p.catIds = p.catIds if p.useCats == 1 else [-1]
374
+ T = len(p.iouThrs)
375
+ R = len(p.recThrs)
376
+ K = len(p.catIds) if p.useCats else 1
377
+ A = len(p.areaRng)
378
+ M = len(p.maxDets)
379
+ precision = -np.ones((T,R,K,A,M)) # -1 for the precision of absent categories
380
+ recall = -np.ones((T,K,A,M))
381
+ scores = -np.ones((T,R,K,A,M))
382
+ TP = -np.ones((T,K,A,M))
383
+ FP = -np.ones((T,K,A,M))
384
+ FN = -np.ones((T,K,A,M))
385
+ duplicates = -np.ones((T,K,A,M))
386
+ FPI = -np.ones((T,K,A,M))
387
+
388
+ # matrix of arrays
389
+ TPC = np.empty((T,K,A,M), dtype=object)
390
+ FPC = np.empty((T,K,A,M), dtype=object)
391
+ sorted_conf = np.empty((K,A,M), dtype=object)
392
+
393
+ # create dictionary for future indexing
394
+ _pe = self._paramsEval
395
+ catIds = _pe.catIds if _pe.useCats else [-1]
396
+ setK = set(catIds)
397
+ setA = set(map(tuple, _pe.areaRng))
398
+ setM = set(_pe.maxDets)
399
+ setI = set(_pe.imgIds)
400
+ # get inds to evaluate
401
+ k_list = [n for n, k in enumerate(p.catIds) if k in setK]
402
+ m_list = [m for n, m in enumerate(p.maxDets) if m in setM]
403
+ a_list = [n for n, a in enumerate(map(lambda x: tuple(x), p.areaRng)) if a in setA]
404
+ i_list = [n for n, i in enumerate(p.imgIds) if i in setI]
405
+ I0 = len(_pe.imgIds)
406
+ A0 = len(_pe.areaRng)
407
+ # retrieve E at each category, area range, and max number of detections
408
+ for k, k0 in enumerate(k_list):
409
+ Nk = k0*A0*I0
410
+ for a, a0 in enumerate(a_list):
411
+ Na = a0*I0
412
+ for m, maxDet in enumerate(m_list):
413
+ E = [self.evalImgs[Nk + Na + i] for i in i_list]
414
+ E = [e for e in E if not e is None]
415
+ if len(E) == 0:
416
+ continue
417
+ dtScores = np.concatenate([e['dtScores'][0:maxDet] for e in E])
418
+
419
+ # different sorting method generates slightly different results.
420
+ # mergesort is used to be consistent as Matlab implementation.
421
+ inds = np.argsort(-dtScores, kind='mergesort')
422
+ dtScoresSorted = dtScores[inds]
423
+ sorted_conf[k,a,m] = dtScoresSorted.copy()
424
+
425
+ dtm = np.concatenate([e['dtMatches'][:,0:maxDet] for e in E], axis=1)[:,inds]
426
+ dtIg = np.concatenate([e['dtIgnore'][:,0:maxDet] for e in E], axis=1)[:,inds]
427
+ dtDups = np.concatenate([e['dtDuplicates'][:,0:maxDet] for e in E], axis=1)[:,inds]
428
+ gtIg = np.concatenate([e['gtIgnore'] for e in E])
429
+ npig = np.count_nonzero(gtIg==0) # number of not ignored gt objects
430
+ fpi = np.array([e['fpi'] for e in E]) # false positive image (no gt objects)
431
+ # if npig == 0:
432
+ # print("No ground truth objects, continuing...")
433
+ # continue
434
+ tps = np.logical_and( dtm, np.logical_not(dtIg) )
435
+ fps = np.logical_and(np.logical_not(dtm), np.logical_not(dtIg) )
436
+
437
+ tp_sum = np.cumsum(tps, axis=1).astype(dtype=float)
438
+ fp_sum = np.cumsum(fps, axis=1).astype(dtype=float)
439
+ fpi_sum = np.cumsum(fpi).astype(dtype=int)
440
+ for t, (tp, fp) in enumerate(zip(tp_sum, fp_sum)):
441
+ tp = np.array(tp)
442
+ fp = np.array(fp)
443
+ fn = npig - tp # difference between gt and tp
444
+ nd = len(tp)
445
+ rc = tp / npig if npig else [0]
446
+ pr = tp / (fp+tp+np.spacing(1))
447
+ q = np.zeros((R,))
448
+ ss = np.zeros((R,)) #
449
+
450
+ if nd:
451
+ recall[t,k,a,m] = rc[-1]
452
+ else:
453
+ recall[t,k,a,m] = 0
454
+
455
+ TP[t,k,a,m] = tp[-1] if nd else 0
456
+ FP[t,k,a,m] = fp[-1] if nd else 0
457
+ FN[t,k,a,m] = fn[-1] if nd else npig
458
+ duplicates[t,k,a,m] = np.sum(dtDups[t, :])
459
+ FPI[t,k,a,m] = fpi_sum[-1]
460
+ TPC[t,k,a,m] = tp.copy()
461
+ FPC[t,k,a,m] = fp.copy()
462
+
463
+ # numpy is slow without cython optimization for accessing elements
464
+ # use python array gets significant speed improvement
465
+ pr = pr.tolist(); q = q.tolist()
466
+
467
+ for i in range(nd-1, 0, -1):
468
+ if pr[i] > pr[i-1]:
469
+ pr[i-1] = pr[i]
470
+
471
+ inds = np.searchsorted(rc, p.recThrs, side='left')
472
+ try:
473
+ for ri, pi in enumerate(inds):
474
+ q[ri] = pr[pi]
475
+ ss[ri] = dtScoresSorted[pi]
476
+ except:
477
+ pass
478
+ precision[t,:,k,a,m] = np.array(q)
479
+ scores[t,:,k,a,m] = np.array(ss)
480
+ self.eval = {
481
+ 'params': p,
482
+ 'counts': [T, R, K, A, M],
483
+ 'date': datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
484
+ 'precision': precision,
485
+ 'recall': recall,
486
+ 'scores': scores,
487
+ 'TP': TP,
488
+ 'FP': FP,
489
+ 'FN': FN,
490
+ 'duplicates': duplicates,
491
+ 'support': TP + FN,
492
+ 'FPI': FPI,
493
+ 'TPC': TPC,
494
+ 'FPC': FPC,
495
+ 'sorted_conf': sorted_conf,
496
+ }
497
+ toc = time.time()
498
+ print('DONE (t={:0.2f}s).'.format( toc-tic))
499
+
500
+ def summarize(self):
501
+ results = {}
502
+ max_dets = self.params.maxDets[-1]
503
+ min_iou = self.params.iouThrs[0]
504
+
505
+ results['params'] = self.params
506
+ results['eval'] = self.eval
507
+ results['metrics'] = {}
508
+
509
+ # for area_lbl in self.params.areaRngLbl:
510
+ # results.append(self._summarize('ap', iouThr=min_iou,
511
+ # areaRng=area_lbl, maxDets=max_dets))
512
+
513
+ # for area_lbl in self.params.areaRngLbl:
514
+ # results.append(self._summarize('ar', iouThr=min_iou,
515
+ # areaRng=area_lbl, maxDets=max_dets))
516
+
517
+ metrics_str = f"{'tp':>6}, {'fp':>6}, {'fn':>6}, {'dup':>6}, "
518
+ metrics_str += f"{'pr':>5.2}, {'rec':>5.2}, {'f1':>5.2}, {'supp':>6}"
519
+ metrics_str += f", {'fpi':>6}, {'nImgs':>6}"
520
+ print('{:>51} {}'.format('METRIC', metrics_str))
521
+ for area_lbl in self.params.areaRngLbl:
522
+ results['metrics'][area_lbl] = self._summarize(
523
+ 'pr_rec_f1',
524
+ iouThr=min_iou,
525
+ areaRng=area_lbl,
526
+ maxDets=max_dets
527
+ )
528
+
529
+ return results
530
+
531
+ def _summarize(self, metric_type='ap', iouThr=None, areaRng='all', maxDets=100):
532
+ """
533
+ Helper function to print and obtain metrics of types:
534
+ - ap: average precision
535
+ - ar: average recall
536
+ - cf: tp, fp, fn, precision, recall, f1
537
+ values from COCOeval object
538
+ """
539
+ def _summarize_ap_ar(ap=1, iouThr=None, areaRng='all', maxDets=100):
540
+ iStr = ' {:<18} {} @[ IoU={:<9} | area={:>6s} | maxDets={:>3d} ] = {:0.3f}'
541
+ titleStr = 'Average Precision' if ap == 1 else 'Average Recall'
542
+ typeStr = '(AP)' if ap == 1 else '(AR)'
543
+ iouStr = '{:0.2f}:{:0.2f}'.format(p.iouThrs[0], p.iouThrs[-1]) \
544
+ if iouThr is None else '{:0.2f}'.format(iouThr)
545
+
546
+ aind = [i for i, aRng in enumerate(
547
+ p.areaRngLbl) if aRng == areaRng]
548
+ mind = [i for i, mDet in enumerate(p.maxDets) if mDet == maxDets]
549
+
550
+ if ap == 1:
551
+ # dimension of precision: [TxRxKxAxM]
552
+ s = self.eval['precision']
553
+ # IoU
554
+ if iouThr is not None:
555
+ t = np.where(iouThr == p.iouThrs)[0]
556
+ s = s[t]
557
+ s = s[:, :, :, aind, mind]
558
+ else:
559
+ # dimension of recall: [TxKxAxM]
560
+ s = self.eval['recall']
561
+ if iouThr is not None:
562
+ t = np.where(iouThr == p.iouThrs)[0]
563
+ s = s[t]
564
+ s = s[:, :, aind, mind]
565
+ if len(s[s > -1]) == 0:
566
+ mean_s = -1
567
+ else:
568
+ mean_s = np.mean(s[s > -1])
569
+ print(iStr.format(titleStr, typeStr, iouStr, areaRng, maxDets, mean_s))
570
+ return mean_s
571
+
572
+ def _summarize_pr_rec_f1(iouThr=None, areaRng='all', maxDets=100):
573
+ aind = [i for i, aRng in enumerate(p.areaRngLbl) if aRng == areaRng]
574
+ mind = [i for i, mDet in enumerate(p.maxDets) if mDet == maxDets]
575
+
576
+ # dimension of TP, FP, FN [TxKxAxM]
577
+ tp = self.eval['TP']
578
+ fp = self.eval['FP']
579
+ fn = self.eval['FN']
580
+ dup = self.eval['duplicates']
581
+ fpi = self.eval['FPI']
582
+ nImgs = len(p.imgIds)
583
+
584
+ # filter by IoU
585
+ if iouThr is not None:
586
+ t = np.where(iouThr == p.iouThrs)[0]
587
+ tp, fp, fn = tp[t], fp[t], fn[t]
588
+ dup = dup[t]
589
+ fpi = fpi[t]
590
+
591
+ # filter by area and maxDets
592
+ tp = tp[:, :, aind, mind].squeeze()
593
+ fp = fp[:, :, aind, mind].squeeze()
594
+ fn = fn[:, :, aind, mind].squeeze()
595
+ dup = dup[:, :, aind, mind].squeeze()
596
+ fpi = fpi[:, :, aind, mind].squeeze()
597
+
598
+ # handle case where tp, fp, fn and dup are empty (no gt and no dt)
599
+ if all([not np.any(m) for m in [tp, fp, fn, dup, fpi]]):
600
+ tp, fp, fn, dup, fpi =[-1] * 5
601
+ else:
602
+ tp, fp, fn, dup, fpi = [e.item() for e in [tp, fp, fn, dup, fpi]]
603
+
604
+ # compute precision, recall, f1
605
+ if tp == -1 and fp == -1 and fn == -1:
606
+ pr, rec, f1 = -1, -1, -1
607
+ support, fpi = 0, 0
608
+ else:
609
+ pr = 0 if tp + fp == 0 else tp / (tp + fp)
610
+ rec = 0 if tp + fn == 0 else tp / (tp + fn)
611
+ f1 = 0 if pr + rec == 0 else 2 * pr * rec / (pr + rec)
612
+ support = tp + fn
613
+ # print(f"{tp=}, {fp=}, {fn=}, {dup=}, {pr=}, {rec=}, {f1=}, {support=}, {fpi=}")
614
+
615
+ iStr = '@[ IoU={:<9} | area={:>9s} | maxDets={:>3d} ] = {}'
616
+ iouStr = '{:0.2f}:{:0.2f}'.format(p.iouThrs[0], p.iouThrs[-1]) \
617
+ if iouThr is None else '{:0.2f}'.format(iouThr)
618
+ metrics_str = f"{tp:>6.0f}, {fp:>6.0f}, {fn:>6.0f}, {dup:>6.0f}, "
619
+ metrics_str += f"{pr:>5.2f}, {rec:>5.2f}, {f1:>5.2f}, {support:>6.0f}, "
620
+ metrics_str += f"{fpi:>6.0f}, {nImgs:>6.0f}"
621
+ print(iStr.format(iouStr, areaRng, maxDets, metrics_str))
622
+
623
+ return {
624
+ 'range': p.areaRng[aind[0]],
625
+ 'iouThr': iouStr,
626
+ 'maxDets': maxDets,
627
+ 'tp': int(tp),
628
+ 'fp': int(fp),
629
+ 'fn': int(fn),
630
+ 'duplicates': int(dup),
631
+ 'precision': pr,
632
+ 'recall': rec,
633
+ 'f1': f1,
634
+ 'support': int(support),
635
+ 'fpi': int(fpi),
636
+ 'nImgs': nImgs,
637
+ }
638
+
639
+ p = self.params
640
+ if metric_type in ['ap', 'ar']:
641
+ ap = 1 if metric_type == 'ap' else 0
642
+ return _summarize_ap_ar(ap, iouThr=iouThr, areaRng=areaRng, maxDets=maxDets)
643
+
644
+ # return tp, fp, fn, pr, rec, f1, support, fpi, nImgs
645
+ return _summarize_pr_rec_f1(iouThr=iouThr, areaRng=areaRng, maxDets=maxDets)
646
+
647
+ def __str__(self):
648
+ self.summarize()
649
+
650
+ class Params:
651
+ '''
652
+ Params for coco evaluation api
653
+ '''
654
+ def setDetParams(self):
655
+ self.imgIds = []
656
+ self.catIds = []
657
+ # np.arange causes trouble. the data point on arange is slightly larger than the true value
658
+ self.iouThrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
659
+ self.recThrs = np.linspace(.0, 1.00, int(np.round((1.00 - .0) / .01)) + 1, endpoint=True)
660
+ self.maxDets = [1, 10, 100]
661
+ self.areaRng = [[0 ** 2, 1e5 ** 2], [0 ** 2, 32 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]
662
+ self.areaRngLbl = ['all', 'small', 'medium', 'large']
663
+ self.useCats = 1
664
+
665
+ def setKpParams(self):
666
+ self.imgIds = []
667
+ self.catIds = []
668
+ # np.arange causes trouble. the data point on arange is slightly larger than the true value
669
+ self.iouThrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
670
+ self.recThrs = np.linspace(.0, 1.00, int(np.round((1.00 - .0) / .01)) + 1, endpoint=True)
671
+ self.maxDets = [20]
672
+ self.areaRng = [[0 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]
673
+ self.areaRngLbl = ['all', 'medium', 'large']
674
+ self.useCats = 1
675
+ self.kpt_oks_sigmas = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62,.62, 1.07, 1.07, .87, .87, .89, .89])/10.0
676
+
677
+ def __init__(self, iouType='segm'):
678
+ if iouType == 'segm' or iouType == 'bbox':
679
+ self.setDetParams()
680
+ elif iouType == 'keypoints':
681
+ self.setKpParams()
682
+ else:
683
+ raise Exception('iouType not supported')
684
+ self.iouType = iouType
685
+ # useSegm is deprecated
686
+ self.useSegm = None
687
+
688
+ def __repr__(self) -> str:
689
+ return str(self.__dict__)
690
+
691
+ def __iter__(self):
692
+ return iter(self.__dict__.items())
693
+
modified_coco/pr_rec_f1.py ADDED
@@ -0,0 +1,620 @@
1
+ # Copyright The PyTorch Lightning team.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ # NOTE: This metric is based on torchmetrics.detection.mean_ap and
16
+ # then modified to support the evaluation of precision, recall, f1 and support
17
+ # for object detection. It can also be used to evaluate the mean average precision
18
+ # but some modifications are needed. Additionally, numpy is used instead of torch
19
+
20
+ import contextlib
21
+ import io
22
+ import json
23
+ from typing import Any, Callable, Dict, List, Optional, Tuple, Union
24
+ from typing_extensions import Literal
25
+ import numpy as np
26
+ from modified_coco.utils import _fix_empty_arrays, _input_validator, box_convert
27
+
28
+ try:
29
+ import pycocotools.mask as mask_utils
30
+ from pycocotools.coco import COCO
31
+ # from pycocotools.cocoeval import COCOeval
32
+ from modified_coco.cocoeval import COCOeval # use our own version of COCOeval
33
+ except ImportError:
34
+ raise ModuleNotFoundError(
+ "`MAP` metric requires that `pycocotools` is installed."
+ " Please install with `pip install pycocotools`"
37
+ )
38
+
39
+ class PrecisionRecallF1Support:
40
+ r"""Compute the Precision, Recall, F1 and Support scores for object detection.
41
+
42
+ - Precision = :math:`\frac{TP}{TP + FP}`
43
+ - Recall = :math:`\frac{TP}{TP + FN}`
44
+ - F1 = :math:`\frac{2 * Precision * Recall}{Precision + Recall}`
45
+ - Support = :math:`TP + FN`
46
+
47
+ As input to ``forward`` and ``update`` the metric accepts the following input:
48
+
49
+ - ``preds`` (:class:`~List`): A list consisting of dictionaries each containing the key-values
50
+ (each dictionary corresponds to a single image). Parameters that should be provided per dict:
51
+ - boxes: (:class:`~np.ndarray`) of shape ``(num_boxes, 4)`` containing ``num_boxes``
52
+ detection boxes of the format specified in the constructor. By default, this method expects
53
+ ``(xmin, ymin, xmax, ymax)`` in absolute image coordinates.
54
+ - scores: :class:`~np.ndarray` of shape ``(num_boxes)`` containing detection scores
55
+ for the boxes.
56
+ - labels: :class:`~np.ndarray` of shape ``(num_boxes)`` containing 0-indexed detection
57
+ classes for the boxes.
58
+ - masks: :class:`~torch.bool` of shape ``(num_boxes, image_height, image_width)`` containing
59
+ boolean masks. Only required when `iou_type="segm"`.
60
+
61
+ - ``target`` (:class:`~List`) A list consisting of dictionaries each containing the key-values
62
+ (each dictionary corresponds to a single image). Parameters that should be provided per dict:
63
+ - boxes: :class:`~np.ndarray` of shape ``(num_boxes, 4)`` containing ``num_boxes``
64
+ ground truth boxes of the format specified in the constructor. By default, this method
65
+ expects ``(xmin, ymin, xmax, ymax)`` in absolute image coordinates.
66
+ - labels: :class:`~np.ndarray` of shape ``(num_boxes)`` containing 0-indexed ground
67
+ truth classes for the boxes.
68
+ - masks: :class:`~torch.bool` of shape ``(num_boxes, image_height, image_width)``
69
+ containing boolean masks. Only required when `iou_type="segm"`.
70
+ - iscrowd: :class:`~np.ndarray` of shape ``(num_boxes)`` containing 0/1 values
71
+ indicating whether the bounding box/masks indicate a crowd of objects. Value is optional,
72
+ and if not provided it will automatically be set to 0.
73
+ - area: :class:`~np.ndarray` of shape ``(num_boxes)`` containing the area of the
74
+ object. Value if optional, and if not provided will be automatically calculated based
75
+ on the bounding box/masks provided. Only affects when 'area_ranges' is provided.
76
+
77
+ As output of ``forward`` and ``compute`` the metric returns the following output:
78
+
79
+ - ``results``: A dictionary containing the following key-values:
80
+
81
+ - ``params``: COCOeval parameters object
82
+ - ``eval``: output of COCOeval.accumuate()
83
+ - ``metrics``: A dictionary containing the following key-values for each area range:
84
+ - ``area_range``: str containing the area range
85
+ - ``iouThr``: str containing the IoU threshold
86
+ - ``maxDets``: int containing the maximum number of detections
87
+ - ``tp``: int containing the number of true positives
88
+ - ``fp``: int containing the number of false positives
89
+ - ``fn``: int containing the number of false negatives
90
+ - ``precision``: float containing the precision
91
+ - ``recall``: float containing the recall
92
+ - ``f1``: float containing the f1 score
93
+ - ``support``: int containing the support (tp + fn)
94
+
95
+ .. note::
96
+ This metric utilizes the official `pycocotools` implementation as its backend. This means that the metric
97
+ requires you to have `pycocotools` installed. In addition we require `torchvision` version 0.8.0 or newer.
98
+ Please install with ``pip install torchmetrics[detection]``.
99
+
100
+ Args:
101
+ box_format:
102
+ Input format of given boxes. Supported formats are ``[xyxy, xywh, cxcywh]``.
103
+ iou_type:
104
+ Type of input (either masks or bounding-boxes) used for computing IOU.
105
+ Supported IOU types are ``["bbox", "segm"]``. If using ``"segm"``, masks should be provided in input.
106
+ iou_thresholds:
107
+ IoU thresholds for evaluation. If set to ``None`` it corresponds to the stepped range ``[0.5,...,0.95]``
108
+ with step ``0.05``. Else provide a list of floats.
109
+ rec_thresholds:
110
+ Recall thresholds for evaluation. If set to ``None`` it corresponds to the stepped range ``[0,...,1]``
111
+ with step ``0.01``. Else provide a list of floats.
112
+ max_detection_thresholds:
113
+ Thresholds on max detections per image. If set to `None` will use thresholds ``[100]``.
114
+ Else, please provide a list of ints.
115
+ area_ranges:
116
+ Area ranges for evaluation. If set to ``None`` it corresponds to the ranges ``[[0^2, 1e5^2]]``.
117
+ Else, please provide a list of lists of length 2.
118
+ area_ranges_labels:
119
+ Labels for the area ranges. If set to ``None`` it corresponds to the labels ``["all"]``.
120
+ Else, please provide a list of strings of the same length as ``area_ranges``.
121
+ class_agnostic:
122
+ If ``True`` will compute metrics globally. If ``False`` will compute metrics per class.
123
+ Default: ``True`` (per class metrics are not supported yet)
124
+ debug:
125
+ If ``True`` will print the COCOEval summary to stdout.
126
+ kwargs: Additional keyword arguments, see :ref:`Metric kwargs` for more info.
127
+
128
+ Raises:
129
+ ValueError:
130
+ If ``box_format`` is not one of ``"xyxy"``, ``"xywh"`` or ``"cxcywh"``
131
+ ValueError:
132
+ If ``iou_type`` is not one of ``"bbox"`` or ``"segm"``
133
+ ValueError:
134
+ If ``iou_thresholds`` is not None or a list of floats
135
+ ValueError:
136
+ If ``rec_thresholds`` is not None or a list of floats
137
+ ValueError:
138
+ If ``max_detection_thresholds`` is not None or a list of ints
139
+ ValueError:
140
+ If ``area_ranges`` is not None or a list of lists of length 2
141
+ ValueError:
142
+ If ``area_ranges_labels`` is not None or a list of strings
143
+
144
+ Example:
145
+ >>> import numpy as np
146
+ >>> from metrics.detection import MeanAveragePrecision
147
+ >>> preds = [
148
+ ... dict(
149
+ ... boxes=np.array([[258.0, 41.0, 606.0, 285.0]]),
150
+ ... scores=np.array([0.536]),
151
+ ... labels=np.array([0]),
152
+ ... )
153
+ ... ]
154
+ >>> target = [
155
+ ... dict(
156
+ ... boxes=np.array([[214.0, 41.0, 562.0, 285.0]]),
157
+ ... labels=np.array([0]),
158
+ ... )
159
+ ... ]
160
+ >>> metric = PrecisionRecallF1Support()
161
+ >>> metric.update(preds, target)
162
+ >>> print(metric.compute())
163
+ {'params': <metrics.detection.cocoeval.Params at 0x16dc99150>,
164
+ 'eval': ... output of COCOeval.accumulate(),
165
+ 'metrics': {'all': {'range': [0, 10000000000.0],
166
+ 'iouThr': '0.50',
167
+ 'maxDets': 100,
168
+ 'tp': 1,
169
+ 'fp': 0,
170
+ 'fn': 0,
171
+ 'precision': 1.0,
172
+ 'recall': 1.0,
173
+ 'f1': 1.0,
174
+ 'support': 1}}}
175
+ """
176
+ is_differentiable: bool = False
177
+ higher_is_better: Optional[bool] = True
178
+ full_state_update: bool = True
179
+ plot_lower_bound: float = 0.0
180
+ plot_upper_bound: float = 1.0
181
+
182
+ detections: List[np.ndarray]
183
+ detection_scores: List[np.ndarray]
184
+ detection_labels: List[np.ndarray]
185
+ groundtruths: List[np.ndarray]
186
+ groundtruth_labels: List[np.ndarray]
187
+ groundtruth_crowds: List[np.ndarray]
188
+ groundtruth_area: List[np.ndarray]
189
+
190
+ def __init__(
191
+ self,
192
+ box_format: str = "xyxy",
193
+ iou_type: Literal["bbox", "segm"] = "bbox",
194
+ iou_thresholds: Optional[List[float]] = None,
195
+ rec_thresholds: Optional[List[float]] = None,
196
+ max_detection_thresholds: Optional[List[int]] = None,
197
+ area_ranges: Optional[List[List[int]]] = None,
198
+ area_ranges_labels: Optional[List[str]] = None,
199
+ class_agnostic: bool = True,
200
+ debug: bool = False,
201
+ **kwargs: Any,
202
+ ) -> None:
203
+
204
+ allowed_box_formats = ("xyxy", "xywh", "cxcywh")
205
+ if box_format not in allowed_box_formats:
206
+ raise ValueError(
207
+ f"Expected argument `box_format` to be one of {allowed_box_formats} but got {box_format}")
208
+ self.box_format = box_format
209
+
210
+ allowed_iou_types = ("segm", "bbox")
211
+ if iou_type not in allowed_iou_types:
212
+ raise ValueError(
213
+ f"Expected argument `iou_type` to be one of {allowed_iou_types} but got {iou_type}")
214
+ self.iou_type = iou_type
215
+
216
+ if iou_thresholds is not None and not isinstance(iou_thresholds, list):
217
+ raise ValueError(
218
+ f"Expected argument `iou_thresholds` to either be `None` or a list of floats but got {iou_thresholds}"
219
+ )
220
+ self.iou_thresholds = iou_thresholds or np.linspace(
221
+ 0.5, 0.95, round((0.95 - 0.5) / 0.05) + 1).tolist()
222
+
223
+ if rec_thresholds is not None and not isinstance(rec_thresholds, list):
224
+ raise ValueError(
225
+ f"Expected argument `rec_thresholds` to either be `None` or a list of floats but got {rec_thresholds}"
226
+ )
227
+ self.rec_thresholds = rec_thresholds or np.linspace(
228
+ 0.0, 1.00, round(1.00 / 0.01) + 1).tolist()
229
+
230
+ if max_detection_thresholds is not None and not isinstance(max_detection_thresholds, list):
231
+ raise ValueError(
232
+ f"Expected argument `max_detection_thresholds` to either be `None` or a list of ints"
233
+ f" but got {max_detection_thresholds}"
234
+ )
235
+ max_det_thr = np.sort(np.array(
236
+ max_detection_thresholds or [100], dtype=np.uint))
237
+ self.max_detection_thresholds = max_det_thr.tolist()
238
+
239
+ # check area ranges
240
+ if area_ranges is not None:
241
+ if not isinstance(area_ranges, list):
242
+ raise ValueError(
243
+ f"Expected argument `area_ranges` to either be `None` or a list of lists but got {area_ranges}"
244
+ )
245
+ for area_range in area_ranges:
246
+ if not isinstance(area_range, list) or len(area_range) != 2:
247
+ raise ValueError(
248
+ f"Expected argument `area_ranges` to be a list of lists of length 2 but got {area_ranges}"
249
+ )
250
+ self.area_ranges = area_ranges if area_ranges is not None else [
251
+ [0**2, 1e5**2]]
252
+
253
+ if area_ranges_labels is not None:
254
+ if area_ranges is None:
255
+ raise ValueError(
256
+ "Expected argument `area_ranges_labels` to be `None` if `area_ranges` is not provided"
257
+ )
258
+ if not isinstance(area_ranges_labels, list):
259
+ raise ValueError(
260
+ f"Expected argument `area_ranges_labels` to either be `None` or a list of strings"
261
+ f" but got {area_ranges_labels}"
262
+ )
263
+ if len(area_ranges_labels) != len(area_ranges):
264
+ raise ValueError(
265
+ f"Expected argument `area_ranges_labels` to be a list of length {len(area_ranges)}"
266
+ f" but got {area_ranges_labels}"
267
+ )
268
+ self.area_ranges_labels = area_ranges_labels if area_ranges_labels is not None else [
269
+ "all"]
270
+
271
+ # if not isinstance(class_metrics, bool):
272
+ # raise ValueError(
273
+ # "Expected argument `class_metrics` to be a boolean")
274
+ # self.class_metrics = class_metrics
275
+
276
+ if not isinstance(class_agnostic, bool):
277
+ raise ValueError(
278
+ "Expected argument `class_agnostic` to be a boolean")
279
+ self.class_agnostic = class_agnostic
280
+
281
+ if not isinstance(debug, bool):
282
+ raise ValueError("Expected argument `debug` to be a boolean")
283
+ self.debug = debug
284
+
285
+ self.detections = []
286
+ self.detection_scores = []
287
+ self.detection_labels = []
288
+ self.groundtruths = []
289
+ self.groundtruth_labels = []
290
+ self.groundtruth_crowds = []
291
+ self.groundtruth_area = []
292
+
293
+ # self.add_state("detections", default=[], dist_reduce_fx=None)
294
+ # self.add_state("detection_scores", default=[], dist_reduce_fx=None)
295
+ # self.add_state("detection_labels", default=[], dist_reduce_fx=None)
296
+ # self.add_state("groundtruths", default=[], dist_reduce_fx=None)
297
+ # self.add_state("groundtruth_labels", default=[], dist_reduce_fx=None)
298
+ # self.add_state("groundtruth_crowds", default=[], dist_reduce_fx=None)
299
+ # self.add_state("groundtruth_area", default=[], dist_reduce_fx=None)
300
+
301
+ def update(self, preds: List[Dict[str, np.ndarray]], target: List[Dict[str, np.ndarray]]) -> None:
302
+ """Update metric state.
303
+
304
+ Raises:
305
+ ValueError:
306
+ If ``preds`` is not of type ``List[Dict[str, np.ndarray]]``
307
+ ValueError:
308
+ If ``target`` is not of type ``List[Dict[str, np.ndarray]]``
309
+ ValueError:
310
+ If ``preds`` and ``target`` are not of the same length
311
+ ValueError:
312
+ If any of ``preds.boxes``, ``preds.scores`` and ``preds.labels`` are not of the same length
313
+ ValueError:
314
+ If any of ``target.boxes`` and ``target.labels`` are not of the same length
315
+ ValueError:
316
+ If any box is not of type float and of length 4
317
+ ValueError:
318
+ If any class is not of type int and of length 1
319
+ ValueError:
320
+ If any score is not of type float and of length 1
321
+ """
322
+ _input_validator(preds, target, iou_type=self.iou_type)
323
+
324
+ for item in preds:
325
+ detections = self._get_safe_item_values(item)
326
+
327
+ self.detections.append(detections)
328
+ self.detection_labels.append(item["labels"])
329
+ self.detection_scores.append(item["scores"])
330
+
331
+ for item in target:
332
+ groundtruths = self._get_safe_item_values(item)
333
+ self.groundtruths.append(groundtruths)
334
+ self.groundtruth_labels.append(item["labels"])
335
+ self.groundtruth_crowds.append(
336
+ item.get("iscrowd", np.zeros_like(item["labels"])))
337
+ self.groundtruth_area.append(
338
+ item.get("area", np.zeros_like(item["labels"])))
339
+
340
+ def compute(self) -> dict:
341
+ """Computes the metric."""
342
+ coco_target, coco_preds = COCO(), COCO()
343
+
344
+ coco_target.dataset = self._get_coco_format(
345
+ self.groundtruths, self.groundtruth_labels, crowds=self.groundtruth_crowds, area=self.groundtruth_area
346
+ )
347
+ coco_preds.dataset = self._get_coco_format(
348
+ self.detections, self.detection_labels, scores=self.detection_scores)
349
+
350
+ with contextlib.redirect_stdout(io.StringIO()) as f:
351
+ coco_target.createIndex()
352
+ coco_preds.createIndex()
353
+
354
+ coco_eval = COCOeval(coco_target, coco_preds,
355
+ iouType=self.iou_type)
356
+ coco_eval.params.iouThrs = np.array(
357
+ self.iou_thresholds, dtype=np.float64)
358
+ coco_eval.params.recThrs = np.array(
359
+ self.rec_thresholds, dtype=np.float64)
360
+ coco_eval.params.maxDets = self.max_detection_thresholds
361
+ coco_eval.params.areaRng = self.area_ranges
362
+ coco_eval.params.areaRngLbl = self.area_ranges_labels
363
+ coco_eval.params.useCats = 0 if self.class_agnostic else 1
364
+
365
+ coco_eval.evaluate()
366
+ coco_eval.accumulate()
367
+
368
+ if self.debug:
369
+ print(f.getvalue())
370
+
371
+ metrics = coco_eval.summarize()
372
+ return metrics
373
+
374
+ @staticmethod
375
+ def coco_to_np(
376
+ coco_preds: str,
377
+ coco_target: str,
378
+ iou_type: Literal["bbox", "segm"] = "bbox",
379
+ ) -> Tuple[List[Dict[str, np.ndarray]], List[Dict[str, np.ndarray]]]:
380
+ """Utility function for converting .json coco format files to the input format of this metric.
381
+
382
+ The function accepts a file for the predictions and a file for the target in coco format and converts them to
383
+ a list of dictionaries containing the boxes, labels and scores in the input format of this metric.
384
+
385
+ Args:
386
+ coco_preds: Path to the json file containing the predictions in coco format
387
+ coco_target: Path to the json file containing the targets in coco format
388
+ iou_type: Type of input, either `bbox` for bounding boxes or `segm` for segmentation masks
389
+
390
+ Returns:
391
+ preds: List of dictionaries containing the predictions in the input format of this metric
392
+ target: List of dictionaries containing the targets in the input format of this metric
393
+
394
+ Example:
395
+ >>> # File formats are defined at https://cocodataset.org/#format-data
396
+ >>> # Example files can be found at
397
+ >>> # https://github.com/cocodataset/cocoapi/tree/master/results
398
+ >>> from metrics.detection import PrecisionRecallF1Support
399
+ >>> preds, target = PrecisionRecallF1Support.coco_to_np(
400
+ ... "instances_val2014_fakebbox100_results.json.json",
401
+ ... "val2014_fake_eval_res.txt.json"
402
+ ... iou_type="bbox"
403
+ ... ) # doctest: +SKIP
404
+
405
+ """
406
+ with contextlib.redirect_stdout(io.StringIO()):
407
+ gt = COCO(coco_target)
408
+ dt = gt.loadRes(coco_preds)
409
+
410
+ gt_dataset = gt.dataset["annotations"]
411
+ dt_dataset = dt.dataset["annotations"]
412
+
413
+ target = {}
414
+ for t in gt_dataset:
415
+ if t["image_id"] not in target:
416
+ target[t["image_id"]] = {
417
+ "boxes" if iou_type == "bbox" else "masks": [],
418
+ "labels": [],
419
+ "iscrowd": [],
420
+ "area": [],
421
+ }
422
+ if iou_type == "bbox":
423
+ target[t["image_id"]]["boxes"].append(t["bbox"])
424
+ else:
425
+ target[t["image_id"]]["masks"].append(gt.annToMask(t))
426
+ target[t["image_id"]]["labels"].append(t["category_id"])
427
+ target[t["image_id"]]["iscrowd"].append(t["iscrowd"])
428
+ target[t["image_id"]]["area"].append(t["area"])
429
+
430
+ preds = {}
431
+ for p in dt_dataset:
432
+ if p["image_id"] not in preds:
433
+ preds[p["image_id"]] = {
434
+ "boxes" if iou_type == "bbox" else "masks": [], "scores": [], "labels": []}
435
+ if iou_type == "bbox":
436
+ preds[p["image_id"]]["boxes"].append(p["bbox"])
437
+ else:
438
+ preds[p["image_id"]]["masks"].append(gt.annToMask(p))
439
+ preds[p["image_id"]]["scores"].append(p["score"])
440
+ preds[p["image_id"]]["labels"].append(p["category_id"])
441
+ for k in target: # add empty predictions for images without predictions
442
+ if k not in preds:
443
+ preds[k] = {"boxes" if iou_type ==
444
+ "bbox" else "masks": [], "scores": [], "labels": []}
445
+
446
+ batched_preds, batched_target = [], []
447
+ for key in target:
448
+ name = "boxes" if iou_type == "bbox" else "masks"
449
+ batched_preds.append(
450
+ {
451
+ name: np.array(
452
+ np.array(preds[key]["boxes"]), dtype=np.float32)
453
+ if iou_type == "bbox"
454
+ else np.array(np.array(preds[key]["masks"]), dtype=np.uint8),
455
+ "scores": np.array(preds[key]["scores"], dtype=np.float32),
456
+ "labels": np.array(preds[key]["labels"], dtype=np.int32),
457
+ }
458
+ )
459
+ batched_target.append(
460
+ {
461
+ name: np.array(
462
+ target[key]["boxes"], dtype=np.float32)
463
+ if iou_type == "bbox"
464
+ else np.array(np.array(target[key]["masks"]), dtype=np.uint8),
465
+ "labels": np.array(target[key]["labels"], dtype=np.int32),
466
+ "iscrowd": np.array(target[key]["iscrowd"], dtype=np.int32),
467
+ "area": np.array(target[key]["area"], dtype=np.float32),
468
+ }
469
+ )
470
+
471
+ return batched_preds, batched_target
472
+
473
+ def np_to_coco(self, name: str = "np_map_input") -> None:
474
+ """Utility function for converting the input for this metric to coco format and saving it to a json file.
475
+
476
+ This function should be used after calling `.update(...)` on all data that should be written
477
+ to the file, as the input is then internally cached. The function then converts the information to coco format
478
+ and writes it to json files.
479
+
480
+ Args:
481
+ name: Name of the output file, which will be appended with "_preds.json" and "_target.json"
482
+
483
+ Example:
484
+ >>> import numpy as np
485
+ >>> from metrics.detection import PrecisionRecallF1Support
486
+ >>> preds = [
487
+ ... dict(
488
+ ... boxes=np.array([[258.0, 41.0, 606.0, 285.0]]),
489
+ ... scores=np.array([0.536]),
490
+ ... labels=np.array([0]),
491
+ ... )
492
+ ... ]
493
+ >>> target = [
494
+ ... dict(
495
+ ... boxes=np.array([[214.0, 41.0, 562.0, 285.0]]),
496
+ ... labels=np.array([0]),
497
+ ... )
498
+ ... ]
499
+ >>> metric = PrecisionRecallF1Support()
500
+ >>> metric.update(preds, target)
501
+ >>> metric.np_to_coco("np_map_input") # doctest: +SKIP
502
+
503
+ """
504
+ target_dataset = self._get_coco_format(
505
+ self.groundtruths, self.groundtruth_labels)
506
+ preds_dataset = self._get_coco_format(
507
+ self.detections, self.detection_labels, self.detection_scores)
508
+
509
+ preds_json = json.dumps(preds_dataset["annotations"], indent=4)
510
+ target_json = json.dumps(target_dataset, indent=4)
511
+
512
+ with open(f"{name}_preds.json", "w") as f:
513
+ f.write(preds_json)
514
+
515
+ with open(f"{name}_target.json", "w") as f:
516
+ f.write(target_json)
517
+
518
+ def _get_safe_item_values(self, item: Dict[str, Any]) -> Union[np.ndarray, Tuple]:
519
+ """Convert and return the boxes or masks from the item depending on the iou_type.
520
+
521
+ Args:
522
+ item: input dictionary containing the boxes or masks
523
+
524
+ Returns:
525
+ boxes or masks depending on the iou_type
526
+
527
+ """
528
+ if self.iou_type == "bbox":
529
+ boxes = _fix_empty_arrays(item["boxes"])
530
+ if boxes.size > 0:
531
+ boxes = box_convert(
532
+ boxes, in_fmt=self.box_format, out_fmt="xywh")
533
+ return boxes
534
+ if self.iou_type == "segm":
535
+ masks = []
536
+ for i in item["masks"]:
537
+ rle = mask_utils.encode(np.asfortranarray(i))
538
+ masks.append((tuple(rle["size"]), rle["counts"]))
539
+ return tuple(masks)
540
+ raise Exception(f"IOU type {self.iou_type} is not supported")
541
+
542
+ def _get_classes(self) -> List:
543
+ """Return a list of unique classes found in ground truth and detection data."""
544
+ all_labels = np.concatenate(
545
+ self.detection_labels + self.groundtruth_labels)
546
+ unique_classes = np.unique(all_labels)
547
+ return unique_classes.tolist()
548
+
549
+ def _get_coco_format(
550
+ self,
551
+ boxes: List[np.ndarray],
552
+ labels: List[np.ndarray],
553
+ scores: Optional[List[np.ndarray]] = None,
554
+ crowds: Optional[List[np.ndarray]] = None,
555
+ area: Optional[List[np.ndarray]] = None,
556
+ ) -> Dict:
557
+ """Transforms and returns all cached targets or predictions in COCO format.
558
+
559
+ Format is defined at https://cocodataset.org/#format-data
560
+ """
561
+ images = []
562
+ annotations = []
563
+ annotation_id = 1 # has to start with 1, otherwise COCOEval results are wrong
564
+
565
+ for image_id, (image_boxes, image_labels) in enumerate(zip(boxes, labels)):
566
+ if self.iou_type == "segm" and len(image_boxes) == 0:
567
+ continue
568
+
569
+ if self.iou_type == "bbox":
570
+ image_boxes = image_boxes.tolist()
571
+ image_labels = image_labels.tolist()
572
+
573
+ images.append({"id": image_id})
574
+ if self.iou_type == "segm":
575
+ images[-1]["height"], images[-1]["width"] = image_boxes[0][0][0], image_boxes[0][0][1]
576
+
577
+ for k, (image_box, image_label) in enumerate(zip(image_boxes, image_labels)):
578
+ if self.iou_type == "bbox" and len(image_box) != 4:
579
+ raise ValueError(
580
+ f"Invalid input box of sample {image_id}, element {k} (expected 4 values, got {len(image_box)})"
581
+ )
582
+
583
+ if not isinstance(image_label, int):
584
+ raise ValueError(
585
+ f"Invalid input class of sample {image_id}, element {k}"
586
+ f" (expected value of type integer, got type {type(image_label)})"
587
+ )
588
+
589
+ stat = image_box if self.iou_type == "bbox" else {
590
+ "size": image_box[0], "counts": image_box[1]}
591
+
592
+ if area is not None and area[image_id][k].tolist() > 0:
593
+ area_stat = area[image_id][k].tolist()
594
+ else:
595
+ area_stat = image_box[2] * \
596
+ image_box[3] if self.iou_type == "bbox" else mask_utils.area(
597
+ stat)
598
+
599
+ annotation = {
600
+ "id": annotation_id,
601
+ "image_id": image_id,
602
+ "bbox" if self.iou_type == "bbox" else "segmentation": stat,
603
+ "area": area_stat,
604
+ "category_id": image_label,
605
+ "iscrowd": crowds[image_id][k].tolist() if crowds is not None else 0,
606
+ }
607
+
608
+ if scores is not None:
609
+ score = scores[image_id][k].tolist()
610
+ if not isinstance(score, float):
611
+ raise ValueError(
612
+ f"Invalid input score of sample {image_id}, element {k}"
613
+ f" (expected value of type float, got type {type(score)})"
614
+ )
615
+ annotation["score"] = score
616
+ annotations.append(annotation)
617
+ annotation_id += 1
618
+
619
+ classes = [{"id": i, "name": str(i)} for i in self._get_classes()]
620
+ return {"images": images, "annotations": annotations, "categories": classes}
modified_coco/utils.py ADDED
@@ -0,0 +1,220 @@
1
+ import numpy as np
2
+
3
+ def box_denormalize(boxes: np.ndarray, img_w: int, img_h: int) -> np.ndarray:
4
+ """
5
+ Denormalizes boxes from [0, 1] to [0, img_w] and [0, img_h].
6
+ Args:
7
+ boxes (ndarray[N, 4]): boxes which will be denormalized.
8
+ img_w (int): Width of image.
9
+ img_h (int): Height of image.
10
+
11
+ Returns:
12
+ ndarray[N, 4]: Denormalized boxes.
13
+ """
14
+ if boxes.size == 0:
15
+ return boxes
16
+
17
+ # check if boxes are normalized
18
+ if np.any(boxes > 1.0):
19
+ return boxes
20
+
21
+ boxes[:, 0::2] *= img_w
22
+ boxes[:, 1::2] *= img_h
23
+ return boxes
24
+
25
+
26
+ def box_convert(boxes: np.ndarray, in_fmt: str, out_fmt: str) -> np.ndarray:
27
+ """
28
+ Converts boxes from given in_fmt to out_fmt.
29
+ Supported in_fmt and out_fmt are:
30
+
31
+ 'xyxy': boxes are represented via corners, x1, y1 being top left and x2, y2 being bottom right.
32
+ This is the format that torchvision utilities expect.
33
+
34
+ 'xywh' : boxes are represented via corner, width and height, x1, y1 being top left, w, h being width and height.
35
+
36
+ 'cxcywh' : boxes are represented via centre, width and height, cx, cy being center of box, w, h
37
+ being width and height.
38
+
39
+ Args:
40
+ boxes (ndarray[N, 4]): boxes which will be converted.
41
+ in_fmt (str): Input format of given boxes. Supported formats are ['xyxy', 'xywh', 'cxcywh'].
42
+ out_fmt (str): Output format of given boxes. Supported formats are ['xyxy', 'xywh', 'cxcywh']
43
+
44
+ Returns:
45
+ ndarray[N, 4]: Boxes in the converted format.
46
+ """
47
+ if boxes.size == 0:
48
+ return boxes
49
+
50
+ allowed_fmts = ("xyxy", "xywh", "cxcywh")
51
+ if in_fmt not in allowed_fmts or out_fmt not in allowed_fmts:
52
+ raise ValueError(
53
+ "Unsupported Bounding Box Conversions for given in_fmt and out_fmt")
54
+
55
+ if in_fmt == out_fmt:
56
+ return boxes.copy()
57
+
58
+ if in_fmt != "xyxy" and out_fmt != "xyxy":
59
+ # convert to xyxy and change in_fmt to xyxy
60
+ if in_fmt == "xywh":
61
+ boxes = _box_xywh_to_xyxy(boxes)
62
+ elif in_fmt == "cxcywh":
63
+ boxes = _box_cxcywh_to_xyxy(boxes)
64
+ in_fmt = "xyxy"
65
+
66
+ if in_fmt == "xyxy":
67
+ if out_fmt == "xywh":
68
+ boxes = _box_xyxy_to_xywh(boxes)
69
+ elif out_fmt == "cxcywh":
70
+ boxes = _box_xyxy_to_cxcywh(boxes)
71
+ elif out_fmt == "xyxy":
72
+ if in_fmt == "xywh":
73
+ boxes = _box_xywh_to_xyxy(boxes)
74
+ elif in_fmt == "cxcywh":
75
+ boxes = _box_cxcywh_to_xyxy(boxes)
76
+ return boxes
77
+
78
+
79
+ def _box_xywh_to_xyxy(boxes):
80
+ """
81
+ Converts bounding boxes from (x, y, w, h) format to (x1, y1, x2, y2) format.
82
+ (x, y) refers to top left of bounding box.
83
+ (w, h) refers to width and height of box.
84
+ Args:
85
+ boxes (ndarray[N, 4]): boxes in (x, y, w, h) which will be converted.
86
+
87
+ Returns:
88
+ boxes (ndarray[N, 4]): boxes in (x1, y1, x2, y2) format.
89
+ """
90
+ x, y, w, h = np.split(boxes, 4, axis=-1)
91
+ x1 = x
92
+ y1 = y
93
+ x2 = x + w
94
+ y2 = y + h
95
+ converted_boxes = np.concatenate([x1, y1, x2, y2], axis=-1)
96
+ return converted_boxes
97
+
98
+
99
+ def _box_cxcywh_to_xyxy(boxes):
100
+ """
101
+ Converts bounding boxes from (cx, cy, w, h) format to (x1, y1, x2, y2) format.
102
+ (cx, cy) refers to center of bounding box
103
+ (w, h) are width and height of bounding box
104
+ Args:
105
+ boxes (ndarray[N, 4]): boxes in (cx, cy, w, h) format which will be converted.
106
+
107
+ Returns:
108
+ boxes (ndarray[N, 4]): boxes in (x1, y1, x2, y2) format.
109
+ """
110
+ cx, cy, w, h = np.split(boxes, 4, axis=-1)
111
+ x1 = cx - 0.5 * w
112
+ y1 = cy - 0.5 * h
113
+ x2 = cx + 0.5 * w
114
+ y2 = cy + 0.5 * h
115
+ converted_boxes = np.concatenate([x1, y1, x2, y2], axis=-1)
116
+ return converted_boxes
117
+
118
+
119
+ def _box_xyxy_to_xywh(boxes):
120
+ """
121
+ Converts bounding boxes from (x1, y1, x2, y2) format to (x, y, w, h) format.
122
+ (x1, y1) refer to top left of bounding box
123
+ (x2, y2) refer to bottom right of bounding box
124
+ Args:
125
+ boxes (ndarray[N, 4]): boxes in (x1, y1, x2, y2) which will be converted.
126
+
127
+ Returns:
128
+ boxes (ndarray[N, 4]): boxes in (x, y, w, h) format.
129
+ """
130
+ x1, y1, x2, y2 = np.split(boxes, 4, axis=-1)
131
+ w = x2 - x1
132
+ h = y2 - y1
133
+ converted_boxes = np.concatenate([x1, y1, w, h], axis=-1)
134
+ return converted_boxes
135
+
136
+
137
+ def _box_xyxy_to_cxcywh(boxes):
138
+ """
139
+ Converts bounding boxes from (x1, y1, x2, y2) format to (cx, cy, w, h) format.
140
+ (x1, y1) refer to top left of bounding box
141
+ (x2, y2) refer to bottom right of bounding box
142
+ Args:
143
+ boxes (ndarray[N, 4]): boxes in (x1, y1, x2, y2) format which will be converted.
144
+
145
+ Returns:
146
+ boxes (ndarray[N, 4]): boxes in (cx, cy, w, h) format.
147
+ """
148
+ x1, y1, x2, y2 = np.split(boxes, 4, axis=-1)
149
+ cx = (x1 + x2) / 2
150
+ cy = (y1 + y2) / 2
151
+ w = x2 - x1
152
+ h = y2 - y1
153
+ converted_boxes = np.concatenate([cx, cy, w, h], axis=-1)
154
+ return converted_boxes
155
+
156
+ def _fix_empty_arrays(boxes: np.ndarray) -> np.ndarray:
157
+ """Empty tensors can cause problems, this methods corrects them."""
158
+ if boxes.size == 0 and boxes.ndim == 1:
159
+ return np.expand_dims(boxes, axis=0)
160
+ return boxes
161
+
162
+ def _input_validator(preds, targets, iou_type="bbox"):
163
+ """Ensure the correct input format of `preds` and `targets`."""
164
+ if iou_type == "bbox":
165
+ item_val_name = "boxes"
166
+ elif iou_type == "segm":
167
+ item_val_name = "masks"
168
+ else:
169
+ raise Exception(f"IOU type {iou_type} is not supported")
170
+
171
+ if not isinstance(preds, (list, tuple)):
172
+ raise ValueError(
173
+ f"Expected argument `preds` to be of type list or tuple, but got {type(preds)}")
174
+ if not isinstance(targets, (list, tuple)):
175
+ raise ValueError(
176
+ f"Expected argument `targets` to be of type list or tuple, but got {type(targets)}")
177
+ if len(preds) != len(targets):
178
+ raise ValueError(
179
+ f"Expected argument `preds` and `targets` to have the same length, but got {len(preds)} and {len(targets)}"
180
+ )
181
+
182
+ for k in [item_val_name, "scores", "labels"]:
183
+ if any(k not in p for p in preds):
184
+ raise ValueError(
185
+ f"Expected all dicts in `preds` to contain the `{k}` key")
186
+
187
+ for k in [item_val_name, "labels"]:
188
+ if any(k not in p for p in targets):
189
+ raise ValueError(
190
+ f"Expected all dicts in `targets` to contain the `{k}` key")
191
+
192
+ if any(type(pred[item_val_name]) is not np.ndarray for pred in preds):
193
+ raise ValueError(
194
+ f"Expected all {item_val_name} in `preds` to be of type ndarray")
195
+ if any(type(pred["scores"]) is not np.ndarray for pred in preds):
196
+ raise ValueError(
197
+ "Expected all scores in `preds` to be of type ndarray")
198
+ if any(type(pred["labels"]) is not np.ndarray for pred in preds):
199
+ raise ValueError(
200
+ "Expected all labels in `preds` to be of type ndarray")
201
+ if any(type(target[item_val_name]) is not np.ndarray for target in targets):
202
+ raise ValueError(
203
+ f"Expected all {item_val_name} in `targets` to be of type ndarray")
204
+ if any(type(target["labels"]) is not np.ndarray for target in targets):
205
+ raise ValueError(
206
+ "Expected all labels in `targets` to be of type ndarray")
207
+
208
+ for i, item in enumerate(targets):
209
+ if item[item_val_name].shape[0] != item["labels"].shape[0]:
210
+ raise ValueError(
211
+ f"Input {item_val_name} and labels of sample {i} in targets have a"
212
+ f" different length (expected {item[item_val_name].shape[0]} labels, got {item['labels'].shape[0]})"
213
+ )
214
+ for i, item in enumerate(preds):
215
+ if not (item[item_val_name].shape[0] == item["labels"].shape[0] == item["scores"].shape[0]):
216
+ raise ValueError(
217
+ f"Input {item_val_name}, labels and scores of sample {i} in predictions have a"
218
+ f" different length (expected {item[item_val_name].shape[0]} labels and scores,"
219
+ f" got {item['labels'].shape[0]} labels and {item['scores'].shape[0]})"
220
+ )
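As a quick reference for the helpers above, here is a small sketch of `box_convert`; the `modified_coco.utils` import path is an assumption based on the file layout, and the box values are illustrative:

```
import numpy as np
from modified_coco.utils import box_convert  # import path assumed from the repository layout

boxes_xywh = np.array([[10.0, 15.0, 5.0, 9.0]])  # x, y, width, height

print(box_convert(boxes_xywh, in_fmt="xywh", out_fmt="xyxy"))
# [[10. 15. 15. 24.]]    -> corners x1, y1, x2, y2
print(box_convert(boxes_xywh, in_fmt="xywh", out_fmt="cxcywh"))
# [[12.5 19.5  5.   9. ]] -> centre x, centre y, width, height
```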
requirements.txt ADDED
@@ -0,0 +1,3 @@
1
+ git+https://github.com/huggingface/evaluate@main
2
+ numpy==1.24.3
3
+ pycocotools==2.0.6
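With these dependencies installed, one possible end-to-end flow is to convert existing COCO-format json files with `coco_to_np` and feed the result to the metric. This is a sketch only; the import path and file paths are placeholders, and `box_format="xywh"` is chosen because boxes in COCO json files are stored as x, y, width, height:

```
from metrics.detection import PrecisionRecallF1Support  # import path assumed from the docstring examples

# Placeholder paths to COCO-format prediction and ground-truth files.
preds, target = PrecisionRecallF1Support.coco_to_np(
    "path/to/coco_predictions.json",
    "path/to/coco_ground_truth.json",
    iou_type="bbox",
)

metric = PrecisionRecallF1Support(box_format="xywh")  # COCO json boxes are already in xywh format
metric.update(preds, target)
print(metric.compute()["metrics"])
```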