Spaces:

xiang-wuu
/

yolov5

Runtime error

App Files Files Community

Yann Defretin

glenn-jocher commited on Mar 15, 2021

Commit

38ff499

unverified ·

1 Parent(s): 6f718ce

Update autosplit() with annotated_only option (#2466)

Browse files

* Be able to create dataset from annotated images only

Add the ability to create a dataset/splits only with images that have an annotation file, i.e a .txt file, associated to it. As we talked about this, the absence of a txt file could mean two things:

* either the image wasn't yet labelled by someone,
* either there is no object to detect.

When it's easy to create small datasets, when you have to create datasets with thousands of images (and more coming), it's hard to track where you at and you don't want to wait to have all of them annotated before starting to train. Which means some images would lack txt files and annotations, resulting in label inconsistency as you say in #2313. By adding the annotated_only argument to the function, people could create, if they want to, datasets/splits only with images that were labelled, for sure.

* Cleanup and update print()

Co-authored-by: Glenn Jocher <[email protected]>

Files changed (1) hide show

utils/datasets.py +11 -7

utils/datasets.py CHANGED Viewed

@@ -1032,20 +1032,24 @@ def extract_boxes(path='../coco128/'):  # from utils.datasets import *; extract_
                     b[[1, 3]] = np.clip(b[[1, 3]], 0, h)
                     assert cv2.imwrite(str(f), im[b[1]:b[3], b[0]:b[2]]), f'box failure in {f}'
-def autosplit(path='../coco128', weights=(0.9, 0.1, 0.0)):  # from utils.datasets import *; autosplit('../coco128')
     """ Autosplit a dataset into train/val/test splits and save path/autosplit_*.txt files
-    # Arguments
-        path:       Path to images directory
-        weights:    Train, val, test weights (list)
     """
     path = Path(path)  # images dir
-    files = list(path.rglob('*.*'))
     n = len(files)  # number of files
     indices = random.choices([0, 1, 2], weights=weights, k=n)  # assign each image to a split
     txt = ['autosplit_train.txt', 'autosplit_val.txt', 'autosplit_test.txt']  # 3 txt files
     [(path / x).unlink() for x in txt if (path / x).exists()]  # remove existing
     for i, img in tqdm(zip(indices, files), total=n):
-        if img.suffix[1:] in img_formats:
             with open(path / txt[i], 'a') as f:
                 f.write(str(img) + '\n')  # add image to txt file

                     b[[1, 3]] = np.clip(b[[1, 3]], 0, h)
                     assert cv2.imwrite(str(f), im[b[1]:b[3], b[0]:b[2]]), f'box failure in {f}'
+def autosplit(path='../coco128', weights=(0.9, 0.1, 0.0), annotated_only=False):
     """ Autosplit a dataset into train/val/test splits and save path/autosplit_*.txt files
+    Usage: from utils.datasets import *; autosplit('../coco128')
+    Arguments
+        path:           Path to images directory
+        weights:        Train, val, test weights (list)
+        annotated_only: Only use images with an annotated txt file
     """
     path = Path(path)  # images dir
+    files = sum([list(path.rglob(f"*.{img_ext}")) for img_ext in img_formats], [])  # image files only
     n = len(files)  # number of files
     indices = random.choices([0, 1, 2], weights=weights, k=n)  # assign each image to a split
     txt = ['autosplit_train.txt', 'autosplit_val.txt', 'autosplit_test.txt']  # 3 txt files
     [(path / x).unlink() for x in txt if (path / x).exists()]  # remove existing
+    print(f'Autosplitting images from {path}' + ', using *.txt labeled images only' * annotated_only)
     for i, img in tqdm(zip(indices, files), total=n):
+        if not annotated_only or Path(img2label_paths([str(img)])[0]).exists():  # check label
             with open(path / txt[i], 'a') as f:
                 f.write(str(img) + '\n')  # add image to txt file