
pylabel-project / pylabel

Python library for computer vision labeling tasks. The core functionality is to translate bounding box annotations between different formats, for example from COCO to YOLO.

License: MIT License

Python 99.85% Shell 0.15%
computer-vision annotation-tool yolov5 coco bounding-boxes dataset object-detection

pylabel's Introduction

PyLabel


PyLabel is a Python package to help you prepare image datasets for computer vision models, including PyTorch and YOLOv5. It can translate bounding box annotations between different formats (for example, COCO to YOLO), and it includes an AI-assisted labeling tool that runs in a Jupyter notebook.

  • Translate: Convert annotation formats with a single line of code:
    importer.ImportCoco(path_to_annotations).export.ExportToYoloV5()

  • Analyze: PyLabel stores annotations in a pandas dataframe so you can easily perform analysis on image datasets.
  • Split: Divide image datasets into train, test, and val splits with stratification to get a consistent class distribution.
  • Label: PyLabel also includes an image labeling tool that runs in a Jupyter notebook and can annotate images manually or perform automatic labeling using a pre-trained model.
  • Visualize: Render images from your dataset with bounding boxes overlaid so you can confirm the accuracy of the annotations.
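Here is a minimal end-to-end sketch of that workflow; the paths and dataset name are placeholders, not files shipped with PyLabel:

    from pylabel import importer

    # Import COCO annotations into the PyLabel dataframe schema
    dataset = importer.ImportCoco("annotations/coco.json", path_to_images="images/", name="my_dataset")

    # Inspect the annotations as a pandas dataframe
    print(dataset.df.head())
    print(dataset.analyze.class_counts)

    # Split into train/val/test and export to YOLOv5 format
    dataset.splitter.GroupShuffleSplit(train_pct=0.8, val_pct=0.1, test_pct=0.1)
    dataset.export.ExportToYoloV5(output_path="labels/", use_splits=True)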

Tutorial Notebooks

See PyLabel in action in these sample Jupyter notebooks:

Find more docs at https://pylabel.readthedocs.io.

About PyLabel

PyLabel was developed by Jeremy Fraenkel, Alex Heaton, and Derek Topper as the Capstone project for the Master of Information and Data Science (MIDS) at the UC Berkeley School of Information. If you have any questions or feedback, please create an issue. Please let us know how we can make PyLabel more useful.

pylabel's People

Contributors

alexheat, charitarthchugh, chrishrbatjambit, chrisrapson, danacity, derektopper, dnth, dpservis, geezacoleman, jeremyfraenkel, landegt, pizixie, rajasvu, thovden, ytzeng1


pylabel's Issues

ValueError: cannot convert float NaN to integer

Occurs when calling visualize on annotations whose bounding boxes have NaN values:

from IPython.display import Image, display
display(ds.visualize.ShowBoundingBoxes(5))

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/Users/robin/Documents/GitHub/oil-storage-tank/pylabel.ipynb Cell 7' in <cell line: 2>()
      1 from IPython.display import Image, display
----> 2 display(ds.visualize.ShowBoundingBoxes(5))

File ~/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pylabel/visualize.py:34, in Visualize.ShowBoundingBoxes(self, img_id, img_filename)
     32 for index, row in df_single_img_annots.iterrows():
     33     labels.append(row['cat_name'])
---> 34     bboxes.append([int(row['ann_bbox_xmin']),int(row['ann_bbox_ymin']),int(row['ann_bbox_xmax']),int(row['ann_bbox_ymax'])])
     36 img_with_boxes = bbv.draw_multiple_rectangles(img, bboxes)
     37 img_with_boxes = bbv.add_multiple_labels(img_with_boxes, labels, bboxes)

ValueError: cannot convert float NaN to integer

I guess this occurs because I have images with zero annotations.
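A possible workaround until the library handles this, assuming the bounding-box column names shown in the traceback: drop the rows whose bbox values are NaN before visualizing (note this also removes the annotation-less images from the dataframe).

    # Hypothetical workaround: keep only rows with complete bounding boxes
    bbox_cols = ["ann_bbox_xmin", "ann_bbox_ymin", "ann_bbox_xmax", "ann_bbox_ymax"]
    ds.df = ds.df.dropna(subset=bbox_cols)
    display(ds.visualize.ShowBoundingBoxes(5))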

ImportCoco Fails if path_to_images is not specified

This line fails because path_to_images is not specified, even though it is supposed to be an optional parameter:

dataset = importer.ImportCoco(path_to_annotations, name="BCCD_coco")

Error message

---> 43     images["img_folder"] = _GetValueOrBlank(images["img_folder"], path_to_images)
     44     astype_dict = {'img_width': 'int64','img_height': 'int64','img_depth': 'int64'}
     45     astype_keys = list(astype_dict.keys())

/usr/local/lib/python3.7/dist-packages/pylabel/importer.py in _GetValueOrBlank(element, user_input)
     19     """
     20     if user_input == None:
---> 21         return element.text
     22     else:
     23         return user_input

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   5139             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5140                 return self[name]
-> 5141             return object.__getattribute__(self, name)
   5142 
   5143     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'text'

The fix will be to remove the use of the _GetValueOrBlank function in importer.py
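In the meantime, a workaround is to pass path_to_images explicitly (even an empty string avoids the None branch inside _GetValueOrBlank); this is a sketch, not the committed fix:

    # Workaround sketch: any non-None path_to_images skips the element.text branch
    dataset = importer.ImportCoco(path_to_annotations, path_to_images="", name="BCCD_coco")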

Issue if image has no bounding boxes

Hi,
I was converting from YOLO to VOC, and it seems that if an image doesn't have any bounding boxes, an error occurs. This is the error: ValueError: invalid literal for int() with base 10: '. It arises in the exporter.py script.

I am getting an error when using dataset.analyze.classes and the ExportToYoloV5

Discussed in https://github.com/pylabel-project/pylabel/discussions/22

Originally posted by edehino February 10, 2022
Here's the error:

  File "C:\Users\anaconda3\lib\site-packages\pylabel\analyze.py", line 26, in classes
    categories = dict(zip(filtered_df.cat_name, filtered_df.cat_id.astype("int")))
  File "C:\Users\anaconda3\lib\site-packages\pandas\core\generic.py", line 5815, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "C:\Users\anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 418, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "C:\Users\anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 327, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\Users\anaconda3\lib\site-packages\pandas\core\internals\blocks.py", line 591, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "C:\Users\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py", line 1309, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "C:\Users\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py", line 1257, in astype_array
    values = astype_nansafe(values, dtype, copy=copy)
  File "C:\Users\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py", line 1174, in astype_nansafe
    return lib.astype_intsafe(arr, dtype)
  File "pandas\_libs\lib.pyx", line 679, in pandas._libs.lib.astype_intsafe
ValueError: invalid literal for int() with base 10: '3.0'

I checked that the pandas package is installed, and it is. Can you help me with this?
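The "invalid literal for int() with base 10: '3.0'" message suggests cat_id is stored as a string like "3.0", which int() cannot parse directly. A possible workaround, assuming the cat_id column shown in the traceback and not an official fix, is to normalize that column before calling analyze.classes:

    # Hypothetical workaround: go through float first so strings like "3.0" can be
    # converted; rows without annotations (NaN cat_id) would need dropping or filling.
    dataset.df["cat_id"] = dataset.df["cat_id"].astype(float).astype(int).astype(str)
    print(dataset.analyze.classes)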

ValueError: not enough values to unpack (expected 5, got 0).

I think this happens with images that don't have annotations, meaning their respective .txt files are empty.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_24072/4001401144.py in <module>
      9 print("Here")
     10 dataset1 = importer.ImportYoloV5(path=path_to_annotations, path_to_images=path_to_images, cat_names=yoloclasses,
---> 11     img_ext="jpeg", name="view, cut, plan")
     12 print("Here 2")
     13 

~\Anaconda3\envs\yolov5\lib\site-packages\pylabel\importer.py in ImportYoloV5(path, img_ext, cat_names, path_to_images, name)
    300                     width_norm,
    301                     height_norm,
--> 302                 ) = line.split()
    303                 row["img_folder"] = path_to_images
    304                 row["img_filename"] = filename.name.replace("txt", img_ext)

ValueError: not enough values to unpack (expected 5, got 0)
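A guard like the following in the importer's parsing loop would let empty label files import as images without annotations; this is only a sketch of where a fix could go (annotation_lines and the unpacked names other than width_norm/height_norm are placeholders for the importer's local variables):

    # Sketch: skip blank lines from empty .txt files instead of unpacking them
    for line in annotation_lines:
        if not line.strip():
            continue
        (cat_id, x_center_norm, y_center_norm, width_norm, height_norm) = line.split()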

IndexError: index 8 is out of bounds for axis 0 with size 7

Code:

#Specify path to the coco.json file
path_to_annotations = "data/coco.json"
#Specify the path to the images (if they are in a different folder than the annotations)
path_to_images = "data/image_patches/"

#Import the dataset into the pylabel schema 
dataset = importer.ImportCoco(path_to_annotations, path_to_images=path_to_images, name="tanks_coco")
dataset.df.head(5)

Error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/Users/robin/Documents/GitHub/oil-storage-tank/pre-processing.ipynb Cell 7' in <cell line: 7>()
      4 path_to_images = "data/image_patches/"
      6 #Import the dataset into the pylabel schema
----> 7 dataset = importer.ImportCoco(path_to_annotations, path_to_images=path_to_images, name="tanks_coco")
      8 dataset.df.head(5)

File ~/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pylabel/importer.py:94, in ImportCoco(path, path_to_images, name)
     89 df.ann_category_id = df.ann_category_id.astype(str)
     91 df[
     92     ["ann_bbox_xmin", "ann_bbox_ymin", "ann_bbox_width", "ann_bbox_height"]
     93 ] = pd.DataFrame(df.ann_bbox.tolist(), index=df.index)
---> 94 df.insert(8, "ann_bbox_xmax", df["ann_bbox_xmin"] + df["ann_bbox_width"])
     95 df.insert(10, "ann_bbox_ymax", df["ann_bbox_ymin"] + df["ann_bbox_height"])
     97 # debug print(df.info())
     99 # Join the annotions with the information about the image to add the image columns to the dataframe

File ~/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/frame.py:4439, in DataFrame.insert(self, loc, column, value, allow_duplicates)
   4436     raise TypeError("loc must be int")
   4438 value = self._sanitize_column(value)
-> 4439 self._mgr.insert(loc, column, value)

File ~/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/internals/managers.py:1230, in BlockManager.insert(self, loc, item, value)
   1220 """
   1221 Insert item at selected position.
   1222
   (...)
   1227 value : np.ndarray or ExtensionArray
   1228 """
   1229 # insert to the axis; this could possibly raise a TypeError
-> 1230 new_axis = self.items.insert(loc, item)
   1232 if value.ndim == 2:
   1233     value = value.T

File ~/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py:6602, in Index.insert(self, loc, item)
   6595 if arr.dtype != object or not isinstance(
   6596     item, (tuple, np.datetime64, np.timedelta64)
   6597 ):
   6598     # with object-dtype we need to worry about numpy incorrectly casting
   6599     # dt64/td64 to integer, also about treating tuples as sequences
   6600     # special-casing dt64/td64 https://github.com/numpy/numpy/issues/12550
   6601     casted = arr.dtype.type(item)
-> 6602     new_values = np.insert(arr, loc, casted)
   6604 else:
   6605     # No overload variant of "insert" matches argument types
   6606     # "ndarray[Any, Any]", "int", "None"  [call-overload]
   6607     new_values = np.insert(arr, loc, None)  # type: ignore[call-overload]

File <__array_function__ internals>:180, in insert(*args, **kwargs)

File ~/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/numpy/lib/function_base.py:5280, in insert(arr, obj, values, axis)
   5278 index = indices.item()
   5279 if index < -N or index > N:
-> 5280     raise IndexError(f"index {obj} is out of bounds for axis {axis} "
   5281                      f"with size {N}")
   5282 if (index < 0):
   5283     index += N

IndexError: index 8 is out of bounds for axis 0 with size 7

Error during exporting of split dataset (with null class values).

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\Anaconda3\envs\yolov5\lib\site-packages\pandas\core\ops\array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
    142     try:
--> 143         result = expressions.evaluate(op, left, right)
    144     except TypeError:

~\Anaconda3\envs\yolov5\lib\site-packages\pandas\core\computation\expressions.py in evaluate(op, a, b, use_numexpr)
    232         if use_numexpr:
--> 233             return _evaluate(op, op_str, a, b)  # type: ignore
    234     return _evaluate_standard(op, op_str, a, b)

~\Anaconda3\envs\yolov5\lib\site-packages\pandas\core\computation\expressions.py in _evaluate_standard(op, op_str, a, b)
     67     with np.errstate(all="ignore"):
---> 68         return op(a, b)
     69 

TypeError: can't multiply sequence by non-int of type 'float'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_59916/1620987889.py in <module>
----> 1 dataset.export.ExportToYoloV5(output_path='model_training/labels',yaml_file='dataset.yaml', use_splits=True)

~\Anaconda3\envs\yolov5\lib\site-packages\pylabel\exporter.py in ExportToYoloV5(self, output_path, yaml_file, copy_images, use_splits, cat_id_index)
    486 
    487         yolo_dataset["center_x_scaled"] = (
--> 488             yolo_dataset["ann_bbox_xmin"] + (yolo_dataset["ann_bbox_width"] * 0.5)
    489         ) / yolo_dataset["img_width"]
    490         yolo_dataset["center_y_scaled"] = (

~\Anaconda3\envs\yolov5\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
     63         other = item_from_zerodim(other)
     64 
---> 65         return method(self, other)
     66 
     67     return new_method

~\Anaconda3\envs\yolov5\lib\site-packages\pandas\core\ops\__init__.py in wrapper(left, right)
    341         lvalues = extract_array(left, extract_numpy=True)
    342         rvalues = extract_array(right, extract_numpy=True)
--> 343         result = arithmetic_op(lvalues, rvalues, op)
    344 
    345         return left._construct_result(result, name=res_name)

~\Anaconda3\envs\yolov5\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
    188     else:
    189         with np.errstate(all="ignore"):
--> 190             res_values = na_arithmetic_op(lvalues, rvalues, op)
    191 
    192     return res_values

~\Anaconda3\envs\yolov5\lib\site-packages\pandas\core\ops\array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
    148             #  will handle complex numbers incorrectly, see GH#32047
    149             raise
--> 150         result = masked_arith_op(left, right, op)
    151 
    152     if is_cmp and (is_scalar(result) or result is NotImplemented):

~\Anaconda3\envs\yolov5\lib\site-packages\pandas\core\ops\array_ops.py in masked_arith_op(x, y, op)
    110         if mask.any():
    111             with np.errstate(all="ignore"):
--> 112                 result[mask] = op(xrav[mask], y)
    113 
    114     result, _ = maybe_upcast_putmask(result, ~mask, np.nan)

TypeError: can't multiply sequence by non-int of type 'float'
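The multiply error suggests ann_bbox_width holds non-numeric values (empty strings or NaN) for the images that have no boxes. One way to work around it before exporting, assuming you are willing to drop those rows; a sketch, not the library's fix:

    import numpy as np

    # Sketch: treat empty-string bbox values as missing and drop those rows
    # before exporting (images without boxes then produce no label file).
    df = dataset.df.replace(r"^\s*$", np.nan, regex=True)
    dataset.df = df.dropna(subset=["ann_bbox_xmin", "ann_bbox_width"])
    dataset.export.ExportToYoloV5(output_path='model_training/labels', yaml_file='dataset.yaml', use_splits=True)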

Verbosity or progress bar

I'm using StratifiedGroupShuffleSplit on a YOLO dataset of 5k+ images and 130+ classes, with about 25 objects in each image. It's been a while and it's still not done. I don't know if it's just slow or if it's stuck. I wish there were some verbosity or a progress bar so I knew it was working.

dataset.splitter.StratifiedGroupShuffleSplit(train_pct=.8, val_pct=.0, test_pct=.2, batch_size=1)

Crash when converting an image from COCO to YOLO that has no annotation in it

Images with no annotations still get a row in the annotation dataframe, but the values in that row are NaN, which leads to an error on line 589 (screenshot omitted).

I fixed this by adding an if statement around the with open() call (screenshot omitted).

This also makes sure that no label text file is created for unannotated images, which seems desirable.
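Roughly, the guard looks like this (a sketch based on the description above, not the exact committed code; annot_txt_file and yolo_lines stand in for the exporter's local variables):

    # Only write a label file when the image has at least one non-NaN box
    if not df_single_img_annots["ann_bbox_xmin"].isna().all():
        with open(annot_txt_file, "w") as f:
            f.write(yolo_lines)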

Also, is there any way to visualize segmentations instead of just bounding boxes?

Issue using both segmentation=True and cat_id_index=0

Hello, I'm having an issue with the export function when using both segmentation=True and cat_id_index=0. It works fine when I remove cat_id_index=0, but when both are set I get an error.

dataset.export.ExportToYoloV5(output_path=yolo_labels_path, use_splits=True, cat_id_index=0, copy_images=True, segmentation=True)

Error:

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/pylabel/exporter.py:596, in Export.ExportToYoloV5(self, output_path, yaml_file, copy_images, use_splits, cat_id_index, segmentation)
    593 for index, l in enumerate(segmentation_array):
    594     # The first number in the array is the x value so divide by the width
    595     if index % 2 == 0:
--> 596         row += " " + (
    597             str(
    598                 segmentation_array[index]
    599                 / df_single_img_annots.iloc[i].img_width
    600             )
    601         )
    602     else:
    603         # The first number in the array is the x value so divide by the height
    604         row += " " + (
    605             str(
    606                 segmentation_array[index]
    607                 / df_single_img_annots.iloc[i].img_height
    608             )
    609         )

UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('int64'), dtype('<U19')) -> None

Category_id exported as string instead of int

Hi,

Thank you for creating this nice utility!

I encountered an issue with ExportToCoco: the category_id in the exported JSON file is stored as a string instead of an integer, which caused trouble when registering datasets in detectron2. My quick fix was to cast all category_id values to int.
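For reference, the quick fix was along these lines, assuming the exporter reads category ids from the dataframe's cat_id column (a sketch, not an official patch):

    # Cast category ids to int so ExportToCoco writes JSON numbers, not strings
    dataset.df["cat_id"] = dataset.df["cat_id"].astype(float).astype(int)
    dataset.export.ExportToCoco(cat_id_index=1)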

Yolo export fails if image files have multiple extensions

The Yolo format infers the text annotation file from the image file name. PyLabel creates the file name for the annotation file in the following way:

annot_txt_file = img_filename.split(".")[0] + ".txt"

This will cause a mismatch between the filenames of the images and annotation files, causing Yolo to silently ignore the annotations (and a few hours of debugging on my side).

The fix is simple, and I will submit a PR on this shortly.
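For reference, one way to do it is to replace only the final suffix with pathlib; the actual PR may differ:

    from pathlib import Path

    # "frame.0001.jpg" -> "frame.0001.txt" instead of the "frame.txt"
    # produced by img_filename.split(".")[0]
    annot_txt_file = str(Path(img_filename).with_suffix(".txt"))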

add in tqdm for progress tracking

Thanks for developing a very handy and easy-to-use tool.

I was recently trying to convert a dataset of ~5500 images from YOLO to VOC and it took about 15 minutes on my computer. It seemed to work fine in the end, but some form of progress indication (i.e. tqdm) would be helpful to confirm it is working correctly.

yolo class label is unsorted

I have a VOC dataset. When I converted from VOC to YOLO, I found that the class labels are unsorted in the YAML file.
VOC labels (screenshot omitted)
YOLO labels (screenshot omitted)

Can someone tell me why?

_ReindexCatIds is broken

https://github.com/thovden/pylabel/blob/c17d2c644f1dcb85ef3f9835c8771b96b969a673/pylabel/shared.py#L39

The _ReindexCatIds function is broken, mainly because it operates on a copy of a DataFrame.

The following code does not work in-place by default:

    df = df.replace(r"^\s*$", np.nan, regex=True)

Hence, the rest of the function operates on a copy of the DataFrame, but the intention is to do in-place replacements of df['cat_id'].

I also suspect that the intention of the following code is to coerce df['cat_id'] to a numeric type:

pd.to_numeric(df["cat_id"])

However, these values are not written back to the original dataframe.

I'm going to clean up this function a little and submit a PR.
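A minimal sketch of the in-place version described above (hypothetical; the actual PR may differ):

    import numpy as np
    import pandas as pd

    # Assign the results back so the replacements actually modify df["cat_id"];
    # neither replace() nor to_numeric() works in place.
    df["cat_id"] = df["cat_id"].replace(r"^\s*$", np.nan, regex=True)
    df["cat_id"] = pd.to_numeric(df["cat_id"])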

"No objects to concatenate" conversion yolo to coco

I'm doing a YOLO-to-COCO conversion with a YAML file; everything works fine until I call dataset.export.ExportToCoco(cat_id_index=0).
I tried with cat_id_index=1 as well.
The full error is as follows:

Traceback (most recent call last):
  File "/home/usuaris/imatge/ilias.khayat/TFG/yolo22coco.py", line 45, in <module>
    dataset.export.ExportToCoco(cat_id_index=1)
  File "/home/usuaris/imatge/ilias.khayat/TFG/pylabel/pylabel/exporter.py", line 744, in ExportToCoco
    mergedI = pd.concat(df_outputI, ignore_index=True)
  File "/home/usuaris/imatge/ilias.khayat/venv/detectron2-/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/usuaris/imatge/ilias.khayat/venv/detectron2-/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 347, in concat
    op = _Concatenator(
  File "/home/usuaris/imatge/ilias.khayat/venv/detectron2-/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 404, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
srun: error: gpic10: task 0: Exited with exit code 1

I'd appreciate any suggestions.

!_src.empty() in function 'cvtColor'


from IPython.display import Image, display
display(dataset.visualize.ShowBoundingBoxes(100))

---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
/Users/robin/Documents/GitHub/oil-storage-tank/pylabel.ipynb Cell 3' in <cell line: 2>()
      1 from IPython.display import Image, display
----> 2 display(dataset.visualize.ShowBoundingBoxes(100))

File ~/Documents/GitHub/oil-storage-tank/venv/lib/python3.9/site-packages/pylabel/visualize.py:27, in Visualize.ShowBoundingBoxes(self, img_id, img_filename)
     25 full_image_path = str(Path(ds.path_to_annotations, df_single_img_annots.iloc[0].img_folder, df_single_img_annots.iloc[0].img_filename))
     26 img = cv2.imread(str(full_image_path))
---> 27 img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
     29 labels = []
     30 bboxes = []

error: OpenCV(4.5.5) /Users/runner/work/opencv-python/opencv-python/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'

I suspected the img_path column was empty, but updating it doesn't resolve the issue.

ExportToCoco() saves category_id as string instead of integer

I exported my VOC labels to COCO and found that the id and category_id values in the .json file are inconsistent.

In the .json file we have "id": 0:

"categories": [
        {
            "id": 0,
            "name": "LP",
            "supercategory": null
        }
    ]

But in the annotations I find "category_id": "0"

{
            "image_id": 208,
            "id": 112,
            "segmented": "0",
            "bbox": [
                759.0,
                475.0,
                27.0,
                26.0
            ],
            "area": 702.0,
            "segmentation": null,
            "iscrowd": 0.0,
            "pose": "Unspecified",
            "truncated": "0",
            "category_id": "0",
            "difficult": "0"
        }

I had to manually change "category_id": "0" to "category_id": 0 to use the annotations and train a model. Is this a bug?

How to split YOLO datasets into train/val only, not train/val/test

I tried this code:

 ds.splitter.GroupShuffleSplit(train_pct=0.7,val_pct=0.2,test_pct=0.0)

get

ValueError: train_size=0.0 should be either positive and smaller than the number of samples 376 or a float in the (0, 1) range

Why can't the test_pct parameter be zero?

not converting coco to yolo

Hi,
The code used

from pylabel import importer
path_to_annotations = "my_data/modified.json"
path_to_images = "/my_data/myVOC/JPEGImages/"
dataset = importer.ImportCoco(path=path_to_annotations, path_to_images=path_to_images, name="strawberry")
dataset.export.ExportToYoloV5(dataset)

The error msg says

Traceback (most recent call last):
  File "/home/ash/Ash/repo/object_detection/convert_coco2yolo/coco2yolo.py", line 5, in <module>
    dataset.export.ExportToYoloV5(dataset)
  File "/home/ash/anaconda3/envs/condaEnv3.8/lib/python3.8/site-packages/pylabel/exporter.py", line 434, in ExportToYoloV5
    path = PurePath(output_path)
  File "/home/ash/anaconda3/envs/condaEnv3.8/lib/python3.8/pathlib.py", line 651, in __new__
    return cls._from_parts(args)
  File "/home/ash/anaconda3/envs/condaEnv3.8/lib/python3.8/pathlib.py", line 683, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/home/ash/anaconda3/envs/condaEnv3.8/lib/python3.8/pathlib.py", line 667, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not Dataset
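The traceback shows the Dataset object being passed where ExportToYoloV5 expects its output_path string, so the call itself looks like the culprit. Calling the method without re-passing the dataset (or with an explicit output path) should avoid the TypeError; the path below is a placeholder:

    # ExportToYoloV5 is a method of dataset.export; no need to pass the dataset again
    dataset.export.ExportToYoloV5(output_path="yolo_labels")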

Cvat compatibility and bug fixes

Reported by landegt
#1 (comment)

I tried using this package to convert my COCO annotations of a custom dataset, which was annotated in CVAT, to YOLOv5 labels. I ran into some issues and have fixed them, and I hope the fixes may be of use to others as well.

The issues are also explained in the commit messages.

  1. importer.py: the "img_folder" tag was required to be in the COCO annotations, but it is not a mandatory tag; it is often left blank or not included at all, and when it is missing a KeyError occurred. There was an attempt to handle this with the call _GetValueOrBlank(images["img_folder"], path_to_images), but that still requires an "img_folder" key in images even if an alternative path was given by the user.
    Fixed by checking whether this tag/column exists and creating it empty if it does not.
  2. Same issue as 1, only with img_depth: it is not required in the COCO annotation format and is not written when exporting from CVAT.
  3. exporter.py, ExportToYoloV5():
    - The default value yaml_file=None leads to "TypeError: expected str, ...". Fixed by setting the default to the recommended yaml_file='dataset.yaml'.
    - Not using splits raised "UnboundLocalError: local variable 'split_dir' referenced before assignment". Fixed by setting split_dir = '' when no split is wanted.
  4. analyze.py
    Bug: i in cat_names was assumed to be a str and caused "AttributeError: 'float' object has no attribute 'strip'".
    Fix: cast with str(i) before calling strip.

keep images without annotation

Hi, thanks for open-sourcing this great tool! It's quite helpful. Do you know if there is a workaround that I can use to keep the image information for images that do not have annotations? For example, if an image does not have any annotations, ideally it would still be kept in the "images" field when exported to the COCO format.

Cannot import PASCAL VOC dataset

The dataset is in PASCAL VOC format, as exported by CVAT. When importing, I receive the traceback

Traceback (most recent call last):
  File "path\to\convert_annotations.py", line 5, in <module>
    voc_data = importer.ImportVOC(voc_folder)
  File "path\to\.venv\lib\site-packages\pylabel\importer.py", line 207, in ImportVOC
    row["ann_pose"] = _GetValueOrBlank(o.find("pose"))
  File "path\to\.venv\lib\site-packages\pylabel\importer.py", line 27, in _GetValueOrBlank
    return element.text
AttributeError: 'NoneType' object has no attribute 'text'

Here is an example of an annotation xml exported by CVAT:

<?xml version="1.0"?>
<annotation>
  <folder>myfolder</folder>
  <filename>myfile.png</filename>
  <source>
    <database>Unknown</database>
    <annotation>Unknown</annotation>
    <image>Unknown</image>
  </source>
  <size>
    <width>5328</width>
    <height>4608</height>
    <depth/>
  </size>
  <segmented>0</segmented>
  <object>
    <name>my_object</name>
    <truncated>0</truncated>
    <occluded>0</occluded>
    <difficult>0</difficult>
    <bndbox>
      <xmin>127.06</xmin>
      <ymin>632.19</ymin>
      <xmax>161.12</xmax>
      <ymax>666.86</ymax>
    </bndbox>
    <attributes>
      <attribute>
        <name>rotation</name>
        <value>0.0</value>
      </attribute>
    </attributes>
  </object>
</annotation>
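A defensive sketch of _GetValueOrBlank that tolerates missing elements such as <pose> (an assumption about the intended behavior, not the maintainers' fix):

    def _GetValueOrBlank(element, user_input=None):
        # Prefer an explicit user value; otherwise fall back to the element's
        # text, or an empty string when find() returned None.
        if user_input is not None:
            return user_input
        if element is None or element.text is None:
            return ""
        return element.text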

StratifiedGroupShuffleSplit results in Empty DataFrame

The code

dataset = importer.ImportYoloV5(label_dir, cat_names=['Man', 'Woman', 'Button', 'Cover', 'Preview'])

print(f"Number of images: {dataset.analyze.num_images}")
print(f"Number of classes: {dataset.analyze.num_classes}")
print(f"Classes:{dataset.analyze.classes}")
print(f"Class counts:\n{dataset.analyze.class_counts}")

dataset.splitter.StratifiedGroupShuffleSplit(train_pct=.7, val_pct=.15, test_pct=.15)
print(dataset.analyze.ShowClassSplits())

Console output

Importing YOLO files...: 100%|█████████████████████████████████████████████████████████████████████████████████████| 994/994 [00:02<00:00, 467.40it/s]
Number of images: 452
Number of classes: 5
Classes:['Man', 'Woman', 'Button', 'Cover', 'Preview']
Class counts:
cat_name
Button     314
Man        199
Woman      165
Cover      117
Preview     81
Name: count, dtype: int64
Empty DataFrame
Columns: [all, train, test, val]
Index: []

Dataset splitting & negative samples annotation type importer does not work correctly

I have followed the provided notebook for dataset splitting with one notable change: instead of downloading the dataset, I imported my own using importer.Import... . I have tested it in a couple of ways:
a. Importing COCO and exporting YOLO
b. Importing YOLO and exporting YOLO
In each case, I have tested both types of splitters.

  1. The stratified split does not seem to work without some value for the validation split.
  2. Neither splitter correctly splits the data in terms of pure percentage-wise division; the smaller the val (or val & test) portion, the poorer the split was. For example, an 80/10/10 split with the stratified splitter resulted in 92/4/4 samples per split.
    Is that because it prioritizes an equal class split (I have two classes) over the number of samples?

As for negative samples in COCO format: the importer throws an error if I try to load images with empty annotations, meaning no objects of interest are in those images.
It works for empty YOLO .txt label files, though.

Example notebook error

Running import2.ipynb results in an error:

dataset.df = dataset.test(dataset.df)

AttributeError: 'Dataset' object has no attribute 'test'

Multiple image extension for YOLOv5 dataset

My YOLOv5 dataset contains image files with multiple extensions, such as jpg, jpeg, and png. What is the best way to handle these cases?

dataset = importer.ImportYoloV5(path=path_to_annotations, path_to_images=path_to_images, cat_names=yoloclasses,
    img_ext="jpg", name="coco128")

With img_ext we can only specify one extension format.

Export annotations to the format used by the Azure Custom Vision service

On Stack Overflow someone is asking for help with using their current annotations as inputs to the Azure Custom Vision service: https://stackoverflow.com/questions/69496834/is-there-a-way-to-upload-images-with-annotations-labeled-images-to-custom-visi

I have hundreds of labeled images and do not want to redo that work in the custom vision labeling tool. Is there a way to upload labeled images to custom vision? Or to Azure ML or Azure ML Studio? Does any vision services in Azure provide for uploading annotated images? Thanks

It should be pretty simple to add an output function to export annotations to whatever format is used by the Azure Custom Vision service.

If anyone would be interested in this, leave a comment.

AttributeError: 'DataFrame' object has no attribute 'append'

This is occurring in the splitter functions. pandas's DataFrame.append has been deprecated and removed, and that is likely the cause of this issue.

    from pylabel.importer import ImportVOC

    dataset = ImportVOC(
        "./data/train/Annotations", path_to_images="./data/train/JPEGImages"
    )
    dataset.splitter.GroupShuffleSplit(train_pct=0.6, test_pct=0.2, val_pct=0.2)
 File "/home/cc/.cache/pypoetry/virtualenvs/dac2023-gpu-fastestdet-4X0QWZQw-py3.8/lib/python3.8/site-packages/pylabel/splitter.py", line 36, in GroupShuffleSplit
    self.dataset.df = df_train.append(df_test)
  File "/home/cc/.cache/pypoetry/virtualenvs/dac2023-gpu-fastestdet-4X0QWZQw-py3.8/lib/python3.8/site-packages/pandas/core/generic.py", line 5989, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'append'
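The likely fix (I have not checked the maintainers' patch) is to switch splitter.py to pd.concat, the documented replacement for the removed DataFrame.append:

    import pandas as pd

    # pandas 2.0 removed DataFrame.append; pd.concat builds the same combined frame
    self.dataset.df = pd.concat([df_train, df_test])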

ExportToCoco() Error

I am getting an error when I try to export to coco format.

---------------------------------------------------------------------------

OverflowError                             Traceback (most recent call last)

<ipython-input-88-447fc677ce31> in <module>()
----> 1 dataset.export.ExportToCoco(cat_id_index=1)

5 frames

/usr/local/lib/python3.7/dist-packages/pylabel/exporter.py in ExportToCoco(self, output_path, cat_id_index)
    692         mergedC = pd.concat(df_outputC, ignore_index=True)
    693 
--> 694         resultI = mergedI[0].to_json(orient="split")
    695         resultA = mergedA[0].to_json(orient="split")
    696         resultC = mergedC[0].to_json(orient="split")

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in to_json(self, path_or_buf, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines, compression, index, indent)
   2306             compression=compression,
   2307             index=index,
-> 2308             indent=indent,
   2309         )
   2310 

/usr/local/lib/python3.7/dist-packages/pandas/io/json/_json.py in to_json(path_or_buf, obj, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines, compression, index, indent)
     82         default_handler=default_handler,
     83         index=index,
---> 84         indent=indent,
     85     ).write()
     86 

/usr/local/lib/python3.7/dist-packages/pandas/io/json/_json.py in write(self)
    142             self.date_format == "iso",
    143             self.default_handler,
--> 144             self.indent,
    145         )
    146 

/usr/local/lib/python3.7/dist-packages/pandas/io/json/_json.py in _write(self, obj, orient, double_precision, ensure_ascii, date_unit, iso_dates, default_handler, indent)
    196             iso_dates,
    197             default_handler,
--> 198             indent,
    199         )
    200 

/usr/local/lib/python3.7/dist-packages/pandas/io/json/_json.py in _write(self, obj, orient, double_precision, ensure_ascii, date_unit, iso_dates, default_handler, indent)
    164             iso_dates=iso_dates,
    165             default_handler=default_handler,
--> 166             indent=indent,
    167         )
    168 

OverflowError: Maximum recursion level reached

How to display confidence key in json?

Hi, thank you for this solution, it has helped me a lot. I want to add a confidence value for every bounding box. When I import txt files from YOLOv5 that include confidence, like "0 0.1 0.3 0.5 0.6 0.83" where the last figure is the confidence, the script gives an error about unpacking extra parameters.

Does PyLabel support polygon annotation?

Hello,

Does PyLabel convert COCO JSON to the YOLOv5 PyTorch txt format? I have annotated some images using makesense.ai with the following COCO JSON format:

{"info":{"description":"my-project-name"},"images":[{"id":1,"width":3072,"height":4096,"file_name":"IMG20221222113909.jpg"}],"annotations":[{"id":0,"iscrowd":0,"image_id":1,"segmentation":[[1328.987654320987,1200.987654320987,1284.7407407407402,1229.4320987654314,1249.9753086419746,1257.876543209876,1164.6419753086413,1295.8024691358019,1088.7901234567896,1298.9629629629624,1050.8641975308637,1295.8024691358019,1060.3456790123453,1352.6913580246908,1249.9753086419746,1333.7283950617277,1344.7901234567894,1270.518518518518]],"bbox":[1050.8641975308637,1200.987654320987,293.92592592592564,151.7037037037037],"area":17625.098613016293},{"id":1,"iscrowd":0,"image_id":1,"segmentation":[[1543.9012345679005,1191.5061728395056,1490.172839506172,1213.629629629629,1464.8888888888882,1248.3950617283945,1417.4814814814808,1298.9629629629624,1385.876543209876,1321.0864197530857,1370.0740740740735,1324.2469135802462,1379.555555555555,1406.4197530864192,1569.1851851851845,1384.2962962962956,1727.209876543209,1479.1111111111104,1761.9753086419744,1428.5432098765425,1676.6419753086411,1336.8888888888882,1594.4691358024684,1223.1111111111106]],"bbox":[1370.0740740740735,1191.5061728395056,391.90123456790093,287.60493827160485],"area":51686.638012498006}],"categories":[{"id":1,"name":"Hanger"}]}

Can I convert this to the YOLOv5 PyTorch txt format?

Thank you.

vscode compatibility

First of all, this is a very nice tool, especially the labeling aid! Unfortunately I can only get it working on Colab. If I try to use the labeling aid in VS Code, it does not work but also does not throw any errors; it just fails silently. I made sure that all Jupyter settings are configured correctly.

Update Readme examples

importer.ImportCoco(Annot).ExportToYoloV5()

does not work.

importer.ImportCoco(Annot).export.ExportToYoloV5()

should be listed instead to make it easier for new users to get started.

cat_id_index for yolov5 to coco format

I am using ImportYoloV5 to convert YOLO-format labels to COCO format.
My labels only have one class: yoloclasses = ("face").
But I am not sure about the setting of cat_id_index in yolov2coco.ipynb.
The default value is cat_id_index=1, but shouldn't the value start from 0?

Thanks

Only 1 label is imported as a result of importer.ImportYoloV5

When YOLO annotations are imported and a picture has more than one annotation (e.g., the 2021-07-03T06-33-53-frame_0000.jpeg file in the Squirrels and Nuts dataset used in the sample Jupyter notebook), the number of imported annotations is correct, but all of them repeat the last annotation from the corresponding .txt file.

Environment: Python 3.9, pylabel 0.1.34

Example:

content of 2021-07-03T06-33-53-frame_0000.txt file:

1 0.7305 0.8444 0.0505 0.1167
1 0.8003 0.8255 0.0547 0.0917
1 0.8784 0.7824 0.0401 0.0981

content of img_df corresponding to the annotation file: (screenshot omitted)

Solution: I analyzed the case, and the fix should be to replace line 374 in importer.py (the line d[row_id] = row) with d[row_id] = dict(row).
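For context on why dict(row) matters: when the same mutable row object is stored for every row_id, every entry of d references one object, so all rows end up showing the last annotation parsed. A tiny standalone illustration of the aliasing (not pylabel code):

    row, d = {}, {}
    for row_id, value in enumerate([0.7305, 0.8003, 0.8784]):
        row["x_center"] = value
        d[row_id] = row          # every key points at the SAME dict object
    print(d)  # all three entries show 0.8784, the last value assigned
    # Copying per iteration (d[row_id] = dict(row)) keeps each row's own values.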

YOLOv5 class index starts from 1

As per the official YOLOv5 repository, class labels are indexed from 0, but the example script coco2yolov5.ipynb generates label text files starting from 1.
Here is what I have:

Number of images: 1547
Number of classes: 5
Classes:['swimmer', 'boat', 'jetski', 'life_saving_appliances', 'buoy']
Class counts:
swimmer 6206
boat 2214
buoy 560
life_saving_appliances 330
jetski 320

the yaml file looks like this:

names:
- swimmer
- boat
- jetski
- life_saving_appliances
- buoy
nc: 5
path: ..
train: training/images
val: training/images

See the image below (screenshot omitted): it contains only 'swimmer' and 'boat', so classes '0' and '1'. But the annotation file looks like this:

85.txt

2 0.4557 0.0825 0.0138 0.0364
2 0.5967 0.5397 0.0146 0.0332
2 0.7602 0.4009 0.0260 0.0236
2 0.6354 0.3676 0.0171 0.0364
1 0.7069 0.3532 0.0122 0.0139
1 0.6837 0.3601 0.0114 0.0150

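If zero-indexed labels are the goal, passing cat_id_index=0 to the exporter (the same parameter that appears in other issues above) may be the intended knob; I have not verified this against the notebook:

    # Sketch: request zero-based class ids when exporting to YOLOv5
    dataset.export.ExportToYoloV5(output_path="training/labels",
                                  yaml_file="dataset.yaml",
                                  cat_id_index=0)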
