escvm / oidv4_toolkit

Download and visualize single or multiple classes from the huge Open Images v4 dataset

License: GNU General Public License v3.0

Python 100.00%

oidv4_toolkit's Issues

No action when running command

I tried to use the script but when I run any command besides "python main.py -h", nothing happens, no error, nothing. New line appears immediately after. Has anyone encountered something like this?
I installed all the requirements btw

visualize the labeled images does not work

!python3 OIDv4_ToolKit/main.py visualizer

does not work on Google colab.
Error: : cannot connect to X server

Multi-word class names

When label files are created, it would be nice if multi-word class names like "adhesive tape" and "brown bear" were put in quotes, or if the space were replaced with an underscore. Otherwise it's a little bit problematic to process those files.

PS.
The readme suggests using an underscore in the classes file, but such classes (e.g. adhesive_tape) aren't found. Yet it happily accepts natural names, so I don't know what's going on.

Please update the README file: the command there is written as download, not downloader

download multiclass images

I want to know how I can download images for many classes together, such as 1000 classes. How should I set the "classes" parameter?
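
Another issue on this page passes a text file to --classes, with one class name per line. Assuming that form is accepted, a long class list could be generated rather than typed. This is only a sketch: `write_classes_file` and the file name are hypothetical helpers, not part of the toolkit.

```python
from pathlib import Path


def write_classes_file(class_names, path):
    """Write one class name per line, skipping blanks and duplicates.

    Hypothetical helper: the resulting file is meant to be passed as
    `--classes <path>`, assuming the toolkit reads one class per line.
    """
    seen = set()
    lines = []
    for name in class_names:
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            lines.append(name)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

The file written this way would then be used as `python main.py downloader --classes classes_custom.txt --type_csv train`, following the command shown in that other issue.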

Annotation bounding box coordinates with invalid values?

I have downloaded images and label files for an image class, and I'm wondering if maybe I've done something to monkey with the bounding box coordinates in the label *.txt files.

For example, I have a label file (0a7df07bbac03159.txt) with the following contents:

Sword 14.72 7.68 669.44 767.360256

My understanding from the README is that the annotation bounding box coordinates should be within the normalized range [0, 1], but that's obviously not what I'm getting in my label files, as seen above.

Can anyone comment as to how I should interpret the bounding box values in the *.txt files, and/or what I've done wrong to get values that appear to be outside the expected range? Are the float values present in these files computed from the original/normalized bounding box values against the height/width of the corresponding image, and if I want to have the actual integer pixel numbers for the bounding box then I can just round these to the nearest integer? For example, the bounding box for the above would be (left_x=15, top_y=8, right_x=669, bottom_y=767)?

Thanks in advance for any comments or suggestions.
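
If the values in the label files are pixel coordinates (which the numbers above suggest, though it is not confirmed here), converting between pixel and normalized boxes is a division by the image size. A minimal sketch; both function names are hypothetical and the image size must come from the image itself:

```python
def to_normalized(left, top, right, bottom, width, height):
    """Scale a pixel box (left, top, right, bottom) into the [0, 1] range."""
    return (left / width, top / height, right / width, bottom / height)


def to_integer_pixels(left, top, right, bottom):
    """Round a float pixel box to the nearest integer pixel coordinates."""
    return tuple(int(round(v)) for v in (left, top, right, bottom))
```

For the Sword line above, `to_integer_pixels(14.72, 7.68, 669.44, 767.360256)` gives the same integers as proposed in the question.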

Add option to default prompts to "Yes"

First of all thanks for making this application available, it's quite helpful in my work creating datasets for inputs when training object detection models.

I would like to call this application from within a script. This is a bit tricky at the moment because I get various prompts asking me if I want to download missing files. This doesn't always occur (I'm not sure why) but when it does it requires keyboard input (I always choose yes). The prompt I'm seeing may be coming from here.

I would like to run this module's main.py script in a non-interactive mode where it always downloads any missing files without confirmation from the user. I don't see an option to disable confirmation prompts.

If someone can advise as to where to modify this code to allow for this then I am happy to make the changes myself and submit a PR once I've verified that it's working. My suggestion is to add a command line option such as --yes to disable confirmation, such as what's available on the command line for conda (as an example).

Thanks in advance for any assistance with this issue.
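
One way such an option could look, as a sketch only: a `--yes` flag parsed with argparse and a small confirmation helper that skips the prompt when the flag is set. The parser and the `confirm` helper are hypothetical and are not taken from the toolkit's code.

```python
import argparse


def build_parser():
    """Hypothetical parser showing only the proposed flag."""
    parser = argparse.ArgumentParser(description="Sketch of a non-interactive mode")
    parser.add_argument("-y", "--yes", action="store_true",
                        help="answer yes to every download prompt")
    return parser


def confirm(prompt, assume_yes, input_fn=input):
    """Return True without asking when assume_yes is set, else ask the user."""
    if assume_yes:
        return True
    answer = input_fn(f"{prompt} [Y/n] ").strip().lower()
    # An empty answer counts as yes, matching the [Y/n] convention
    return answer in ("", "y", "yes")
```

Passing `input_fn` explicitly keeps the helper testable without a terminal.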

Couldn't download images

Since OID has migrated from OIDv4 to OIDv5, the AWS download is not working, so the images are not getting downloaded. Please migrate this toolkit from OIDv4 to OIDv5 as soon as possible.

Problem with classes containing a space

When a class name contains a space, the download directory also contains a space.
So the "aws s3 cp" command fails because it is unable to parse the path.

Sample: class_name=Vehicle registration plate
[INFO] Downloading Vehicle registration plate.
----------Vehicle registration plate----------
[INFO] Downloading all images.
[INFO] Found 3944 online images for train.
[INFO] Download of 3876 images in train.
0%| | 0/3876 [00:00<?, ?it/s]
Unknown options: registration,plate
0%| | 1/3876 [00:01<1:16:07, 1.18s/it]
Unknown options: registration,plate

I created a small merge request that fixes this problem:
#2
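
The "Unknown options: registration,plate" message is what one would expect when an unquoted path with spaces is passed through a shell. A sketch of two general ways to avoid that, not necessarily what the merge request does: pass the arguments as a list, or quote each argument with `shlex.quote`. The bucket name below is a placeholder.

```python
import shlex


def build_copy_args(source_uri, dest_dir):
    """Argument list for `aws s3 cp`; safe to pass to subprocess.run without a shell."""
    return ["aws", "s3", "cp", source_uri, dest_dir]


def to_shell_string(args):
    """Quote every argument so the string is safe for os.system or shell=True."""
    return " ".join(shlex.quote(arg) for arg in args)


# Placeholder source URI; the real bucket path is not shown in this issue
args = build_copy_args("s3://example-bucket/train/abc.jpg",
                       "OID/Dataset/train/Vehicle registration plate")
print(to_shell_string(args))
```

With the list form, `subprocess.run(args)` never splits the directory name, so no quoting is needed at all.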

download() method crashes program for casting NoneType

Download method has misplaced indent in master

Misplaced indent causes the following

  File "main.py", line 38, in <module>
    bounding_boxes_images(args, DEFAULT_OID_DIR)
  File "/home/dkendall/Documents/senior_design/tools/OIDv4_ToolKit/modules/bounding_boxes.py", line 89, in bounding_boxes_images
    download(args, df_val, folder[i], dataset_dir, class_name, class_code, threads = int(args.n_threads))
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

downloading.py:88

						if not args.n_threads:
 							download(args, df_val, folder[i], dataset_dir, class_name, class_code)
					else:
						download(args, df_val, folder[i], dataset_dir, class_name, class_code, threads = int(args.n_threads))'''

Proposed change

    if not args.n_threads:
        download(args, df_val, folder[i], dataset_dir, class_name, class_code)
    else:
        download(args, df_val, folder[i], dataset_dir, class_name, class_code, threads = int(args.n_threads))

ValueError: not enough values to unpack (expected 2, got 0)

While downloading the images on windows machine it is throwing the below traceback:

Traceback (most recent call last):
  File "main.py", line 36, in <module>
    bounding_boxes_images(args, DEFAULT_OID_DIR)
  File "E:\python\experiment\OIDv4_ToolKit\modules\bounding_boxes.py", line 87, in bounding_boxes_images
    download(args, df_val, folder[i], dataset_dir, class_name, class_code)
  File "E:\python\experiment\OIDv4_ToolKit\modules\downloader.py", line 21, in download
    rows, columns = os.popen('stty size', 'r').read().split()
ValueError: not enough values to unpack (expected 2, got 0)
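
`stty` is not available on Windows, so the `os.popen('stty size')` call returns an empty string and the unpacking fails. A portable alternative from the standard library is `shutil.get_terminal_size`, which falls back to a default when no terminal can be queried. A minimal sketch; `terminal_columns` and `center_title` are hypothetical helpers:

```python
import shutil


def terminal_columns(default=80):
    """Return the terminal width, or `default` when it cannot be determined."""
    columns = shutil.get_terminal_size(fallback=(default, 24)).columns
    return columns if columns > 0 else default


def center_title(title, width):
    """Center a class name in a dashed banner, like the toolkit's output."""
    return title.center(width, "-")


print(center_title("Weapon", terminal_columns()))
```

The same fallback also covers runs where output is redirected or there is no console handle.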

[ERROR] | Missing the class-descriptions-boxable.csv file.

Hi,

I want to download data for the Man and Woman classes and am using this command:
python main.py downloader --classes Man Woman --multiclasses 1 --type_csv all

I get the errors "Missing the class-descriptions-boxable.csv file" and "Missing the train-annotations-bbox.csv file".

I assume train-annotations-bbox.csv is an important file and thus would be required. Any solution to this?

OSError: [WinError 6] The handle is invalid

Command: python main.py downloader --classes Shirt --type_csv all

StackTrace:

[DOWNLOAD] | File train-annotations-bbox.csv downloaded into OID\csv_folder\train-annotations-bbox.csv.
Traceback (most recent call last):
  File "D:\Projects\Product Tagging\tensorflow1\addons\OIDv4_ToolKit\modules\downloader.py", line 25, in download
    columns, rows = os.get_terminal_size(0)
OSError: [WinError 6] The handle is invalid

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 36, in <module>
    bounding_boxes_images(args, DEFAULT_OID_DIR)
  File "D:\Projects\Product Tagging\tensorflow1\addons\OIDv4_ToolKit\modules\bounding_boxes.py", line 87, in bounding_boxes_images
    download(args, df_val, folder[i], dataset_dir, class_name, class_code)
  File "D:\Projects\Product Tagging\tensorflow1\addons\OIDv4_ToolKit\modules\downloader.py", line 27, in download
    columns, rows = os.get_terminal_size(1)
OSError: [WinError 6] The handle is invalid

wrong txt format for classes with underscore

python main.py downloader --classes Human_face --type_csv 'validation' --multiclasses 1

The txt label files look like the following:
Human face 272.903578 214.58227200000002 448.06849 435.37612800000005

According to the documentation there should be an underscore between human and face.

However, in order to accommodate a more intuitive representation and give the maximum flexibility, every .txt annotation is made like:

name_of_the_class left top right bottom

This also causes the OID_to_yolo_gist to fail
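
Until the output format changes, a consumer of these files can still parse multi-word class names by splitting from the right, since the last four fields are always the coordinates. A minimal sketch with hypothetical helper names:

```python
def parse_label_line(line):
    """Split 'class name left top right bottom' into (name, box).

    The class name may contain spaces, so only the last four
    whitespace-separated fields are treated as coordinates.
    """
    parts = line.strip().rsplit(maxsplit=4)
    if len(parts) != 5:
        raise ValueError(f"expected a name and four numbers, got: {line!r}")
    name = parts[0].strip()
    left, top, right, bottom = (float(value) for value in parts[1:])
    return name, (left, top, right, bottom)


def underscore_name(name):
    """Replace runs of whitespace in a class name with single underscores."""
    return "_".join(name.split())
```

`underscore_name` can be applied when rewriting the files for tools that expect a single-token class name.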

how to stop it?

The tool is downloading, but I want to stop it. I have pressed Ctrl+C many times but I can't interrupt it.
What do I need to do?

Unable to download from OIDv5

Hi!
With the new version of OID it is impossible to download the images. Could this memory error be due to this incompatibility?

WIN10
PYTHON 3.7.4

[INFO] | Downloading Handgun.
Traceback (most recent call last):
  File "main.py", line 37, in <module>
    bounding_boxes_images(args, DEFAULT_OID_DIR)
  File "C:\Users\aless\Desktop\YOLOv3_GUN\Tool\OIDv4_ToolKit-master\modules\bounding_boxes.py", line 60, in bounding_boxes_images
    df_val = TTV(csv_dir, name_file, args.yes)
  File "C:\Users\aless\Desktop\YOLOv3_GUN\Tool\OIDv4_ToolKit-master\modules\csv_downloader.py", line 21, in TTV
    df_val = pd.read_csv(CSV)
  File "C:\Users\aless\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\io\parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\aless\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\io\parsers.py", line 463, in _read
    data = parser.read(nrows)
  File "C:\Users\aless\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\io\parsers.py", line 1154, in read
    ret = self._engine.read(nrows)
  File "C:\Users\aless\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\io\parsers.py", line 2059, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 881, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 924, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 2165, in pandas._libs.parsers._concatenate_chunks
  File "<__array_function__ internals>", line 6, in concatenate
MemoryError: Unable to allocate array with shape (14610229,) and data type object
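
The traceback shows the whole annotations CSV (about 14.6 million rows) being loaded at once, and the `Python37-32` path suggests a 32-bit interpreter, which may simply run out of address space. One way around that, sketched here with the standard `csv` module rather than the toolkit's pandas code, is to stream the file and keep only the rows for one label. The column name `LabelName` is assumed from the Open Images CSV layout.

```python
import csv


def iter_rows_for_label(csv_file, label_code):
    """Yield only the rows whose LabelName matches, one row at a time.

    `csv_file` is any open text file or file-like object; nothing
    beyond the current row is held in memory.
    """
    for row in csv.DictReader(csv_file):
        if row["LabelName"] == label_code:
            yield row
```

Typical use would be `with open(path, newline="") as f: rows = list(iter_rows_for_label(f, code))`, where `code` is the class code looked up in class-descriptions-boxable.csv.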

IndexError: index 0 is out of bounds for axis 0 with size 0

Glenns-iMac:OIDv4_ToolKit glennjocher$ python3 main.py downloader --classes knife kitchen_knife --type_csv train

                   ___   _____  ______            _    _    
                 .'   `.|_   _||_   _ `.         | |  | |   
                /  .-.  \ | |    | | `. \ _   __ | |__| |_  
                | |   | | | |    | |  | |[ \ [  ]|____   _| 
                \  `-'  /_| |_  _| |_.' / \ \/ /     _| |_  
                 `.___.'|_____||______.'   \__/     |_____|
        

             _____                    _                 _             
            (____ \                  | |               | |            
             _   \ \ ___  _ _ _ ____ | | ___   ____  _ | | ____  ____ 
            | |   | / _ \| | | |  _ \| |/ _ \ / _  |/ || |/ _  )/ ___)
            | |__/ / |_| | | | | | | | | |_| ( ( | ( (_| ( (/ /| |    
            |_____/ \___/ \____|_| |_|_|\___/ \_||_|\____|\____)_|    
                                                          
        
    [INFO] | Downloading knife.
   [ERROR] | Missing the class-descriptions-boxable.csv file.
[DOWNLOAD] | Do you want to download the missing file? [Y/n] Y
...145%, 0 MB, 9540 KB/s, 0 seconds passed
[DOWNLOAD] | File class-descriptions-boxable.csv downloaded into OID/csv_folder/class-descriptions-boxable.csv.
Traceback (most recent call last):
  File "main.py", line 37, in <module>
    bounding_boxes_images(args, DEFAULT_OID_DIR)
  File "/Users/glennjocher/PycharmProjects/OIDv4_ToolKit/modules/bounding_boxes.py", line 56, in bounding_boxes_images
    class_code = df_classes.loc[df_classes[1] == class_name].values[0][0]
IndexError: index 0 is out of bounds for axis 0 with size 0
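
The failing expression indexes `.values[0]` on the rows where column 1 equals the class name exactly, so an empty match raises this IndexError. The command above uses lowercase names, which may be why nothing matches. A sketch of a more forgiving lookup with a clear error message; `find_class_code` is hypothetical and the class codes in the usage line are made up:

```python
def find_class_code(rows, class_name):
    """Look up a class code from (code, name) rows, ignoring case and underscores.

    Raises KeyError with a readable message instead of an IndexError
    when the class name is not present.
    """
    wanted = " ".join(class_name.replace("_", " ").split()).casefold()
    for code, name in rows:
        if name.casefold() == wanted:
            return code
    raise KeyError(f"class {class_name!r} not found in class descriptions")


# Made-up codes, standing in for rows of class-descriptions-boxable.csv
example_rows = [("/m/code1", "Knife"), ("/m/code2", "Kitchen knife")]
print(find_class_code(example_rows, "kitchen_knife"))
```

A misspelled class name would then fail with the name in the message, which is easier to diagnose than the bare IndexError.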

instance segmentation

Thank you for your wonderful work. The Open Images Dataset V5 was just released, and it contains instance segmentation. Would you mind updating the package to support downloading instance masks?

Thank you again for your work.

csv_downloader: urlretrieve returns 403 forbidden when calling save().

I've added a print(FILE_URL) just to check the url we're trying to retrieve. At first I suspected that the missing User-Agent header might be the problem, but adding it didn't make any difference

    opener = urllib.request.build_opener()
    opener.addheaders = [('User-Agent','Mozilla/5.0')]
    urllib.request.install_opener(opener)

    urllib.request.urlretrieve(url, filename, reporthook)

Here's the error output when trying to download the missing CSV.

[DOWNLOAD] Do you want to download the missing file? [Y/n] y
https://storage.googleapis.com/openimages/2018_04/test\test-annotations-bbox.csv
Traceback (most recent call last):
  File "main.py", line 97, in <module>
    df_val = TTV(csv_dir, name_file)
  File "C:\wa\OIDv4_ToolKit\modules\csv_downloader.py", line 18, in TTV
    error_csv(name_file, csv_dir)
  File "C:\wa\OIDv4_ToolKit\modules\csv_downloader.py", line 43, in error_csv
    save(FILE_URL, FILE_PATH)
  File "C:\wa\OIDv4_ToolKit\modules\csv_downloader.py", line 57, in save
    urllib.request.urlretrieve(url, filename, reporthook)
  File "c:\users\march\appdata\local\programs\python\python37\Lib\urllib\request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "c:\users\march\appdata\local\programs\python\python37\Lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "c:\users\march\appdata\local\programs\python\python37\Lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "c:\users\march\appdata\local\programs\python\python37\Lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "c:\users\march\appdata\local\programs\python\python37\Lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "c:\users\march\appdata\local\programs\python\python37\Lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "c:\users\march\appdata\local\programs\python\python37\Lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
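
The printed URL contains a backslash (`test\test-annotations-bbox.csv`), which is what `os.path.join` produces on Windows. That the toolkit builds the URL this way is an assumption, and it is not confirmed that this alone causes the 403, but URLs should always be joined with forward slashes. A sketch with a hypothetical `join_url` helper:

```python
import ntpath  # the implementation behind os.path on Windows


def join_url(base, *parts):
    """Join URL segments with '/', regardless of the operating system."""
    segments = [base.rstrip("/")] + [part.strip("/") for part in parts]
    return "/".join(segments)


base = "https://storage.googleapis.com/openimages/2018_04/"
# What a Windows-style path join does to the same pieces
windows_style = ntpath.join(base + "test", "test-annotations-bbox.csv")
fixed = join_url(base, "test", "test-annotations-bbox.csv")
print(windows_style)
print(fixed)
```

Keeping `os.path.join` for local file paths and a separate helper for URLs avoids the two being mixed up.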

Create custom model for 230.000 images using darknet

I have 230.000 JPG car plate images for my graduation project, and I'm new to this topic.
Can I convert these images to YOLO format using this repo?

Crop out the bounding box

How can I get the part of the image which is only in the bounding box? So just the bounding box, not its surroundings.

Thank you!
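
Assuming the label files hold pixel coordinates in the order left, top, right, bottom, the crop region is those four values rounded and clamped to the image. A sketch of that step only; `crop_box` is a hypothetical helper and the image size has to be read from the image:

```python
def crop_box(left, top, right, bottom, width, height):
    """Round a float pixel box and clamp it to the image bounds.

    Returns (left, top, right, bottom) as integers.
    """
    x0 = max(0, min(width, int(round(left))))
    y0 = max(0, min(height, int(round(top))))
    x1 = max(0, min(width, int(round(right))))
    y1 = max(0, min(height, int(round(bottom))))
    if x1 <= x0 or y1 <= y0:
        raise ValueError("bounding box is empty after clamping")
    return (x0, y0, x1, y1)


# With Pillow installed (an assumption), the tuple can be passed directly:
#   Image.open(path).crop(crop_box(..., *image.size))
# With OpenCV/NumPy arrays the slice is img[y0:y1, x0:x1].
```
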

IndexError: index 0 is out of bounds for axis 0 with size 0

Hi, thanks for the work!

I am facing the following issue:

python main.py downloader --classes backpack --type_csv all

[INFO] Downloading backpack.
Traceback (most recent call last):
  File "main.py", line 77, in <module>
    class_code = df_classes[df_classes[1] == class_name].values[0][0]
IndexError: index 0 is out of bounds for axis 0 with size 0

Any help?

Thanks

Typo in README file in command line

Hi Vittorio. Thanks for this cool tool, very helpful. I just want to point out that you have a typo in your README file where you give the commands to download or visualize. Instead of the command downloader we should use download. The same goes for visualizer.

Support for different annotation file format, such as Pascal VOC

Hello, your tool is great and very useful.
I would like to use the downloaded dataset with TensorFlow, so I have to build a TFRecord file from it. However, before I start training, I think I am going to modify/add some labels via the labelImg tool, so I first need to build XML files in Pascal VOC format. It looks like it is not that hard to obtain them from your txt files, but I am wondering whether you would consider including this file format as an output of your toolkit (I am an ML newbie so I hope I am not asking something obvious...).
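
As a rough illustration of how small that conversion is, here is a sketch that builds a Pascal VOC style XML string from boxes given as (name, (left, top, right, bottom)) in pixels. It covers only the common fields (filename, size, object name, bndbox); `voc_annotation` is a hypothetical helper, and the image size has to be supplied by the caller.

```python
import xml.etree.ElementTree as ET


def voc_annotation(filename, width, height, boxes, depth=3):
    """Return a Pascal VOC style annotation as an XML string."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    for name, (left, top, right, bottom) in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        # VOC stores integer pixel coordinates
        for tag, value in (("xmin", left), ("ymin", top), ("xmax", right), ("ymax", bottom)):
            ET.SubElement(bndbox, tag).text = str(int(round(value)))
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to `<image name>.xml` next to each image is the layout labelImg usually works with.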

Image level download for positive labels only

Hi guys,
Love this library, it's super helpful and I've been using it to download images for my research project. I was downloading image-level label images related to alcohol and noticed that an image will get downloaded into the directory corresponding to a label even if the label is negative for that image, i.e. an image where Beer=1 and Wine=0 in train-annotations-human-imagelabels.csv will get downloaded into both the Beer and Wine directories. Is there a way to download images only into directories that correspond to positive labels?
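
Until the toolkit offers this, the image IDs could be pre-filtered from the CSV. The sketch below assumes the image-level file has the columns ImageID, LabelName and Confidence, with Confidence 1 meaning a positive label; `positive_image_ids` is a hypothetical helper and the label codes in any test data are made up.

```python
import csv


def positive_image_ids(csv_file, label_code):
    """Return the IDs of images where the given label is marked positive."""
    image_ids = []
    for row in csv.DictReader(csv_file):
        if row["LabelName"] == label_code and float(row["Confidence"]) == 1.0:
            image_ids.append(row["ImageID"])
    return image_ids
```
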

Register and upload to PyPI

I'd like to get the package installed into my Python virtual environment from PyPI via something along the lines of pip install oid_toolkit.

I may be able to cook this up myself and submit a PR once it's done, but if the developers in charge can give me any guidance on making that happen then please advise.

BTW (in case this helps) when I've done this before for projects of my own I've followed the relevant Real Python tutorial:

Get the setup.py file in shape
$ rm -rf dist
$ python setup.py sdist bdist_wheel
$ twine check dist/*
$ twine upload --repository-url https://test.pypi.org/legacy/ dist/*
$ twine upload dist/*

What format are the labels in?

What format are the labels in?
I'm seeing this:
Man 141.44 149.02449 636.16 684.36021
Man 414.08 117.68437 807.68 684.36021
Man 629.76 449.63126 1023.36 684.36021
Is it class x1 y1 x2 y2
or what...?

OIDv5?

Hello!

Any plans to release a version for OIDv5 anytime soon?

How long does it takes to download single class images?

I started downloading the 'Person' class train images about 14 hours ago and it is still only 75% completed. I just wonder how long it takes for others.
I'm not very familiar with data issues, so I'm not sure what I could do to speed it up.
Can I use a GPU to make the download faster? Or is this the normal speed for others too?
Btw thanks for the toolkit. It really helps me.

IndexError: index 0 is out of bounds for axis 0 with size 0

Traceback (most recent call last):
  File "main.py", line 36, in <module>
    bounding_boxes_images(args, DEFAULT_OID_DIR)
  File "/mountdir/OIDv4_ToolKit/modules/bounding_boxes.py", line 56, in bounding_boxes_images
    class_code = df_classes.loc[df_classes[1] == class_name].values[0][0]
IndexError: index 0 is out of bounds for axis 0 with size 0

fatal error: An error occurred (404) when calling the HeadObject

Following the guide in the ReadMe, after:

python3 main.py downloader_ill --sub m --classes Orange --type_csv train --limit 30

I get the error:

fatal error: An error occurred (404) when calling the HeadObject operation: Key "train/0d72ff3e2601d71c.jpg" does not exist

Remove pycache folder and .pyc files from repo

https://stackoverflow.com/questions/32110126/should-i-put-pyc-files-under-version-control

Missing n_threads argument causing a TypeError from download() function call

I have attempted to use this software for downloading a certain group of image classes ("Weapon").

I have used the following command:

python main.py --Dataset ~/data/openimages --classes Weapon --type_csv 'all' downloader

Once this starts working I was prompted to save missing files and then many messages indicating a missing aws command:

    [INFO] | Downloading Weapon.
   [ERROR] | Missing the class-descriptions-boxable.csv file.
[DOWNLOAD] | Do you want to download the missing file? [Y/n] Y
...145%, 0 MB, 1653 KB/s, 0 seconds passed
[DOWNLOAD] | File class-descriptions-boxable.csv downloaded into OID/csv_folder/class-descriptions-boxable.csv.
   [ERROR] | Missing the train-annotations-bbox.csv file.
[DOWNLOAD] | Do you want to download the missing file? [Y/n] Y
...100%, 1138 MB, 10685 KB/s, 109 seconds passed
[DOWNLOAD] | File train-annotations-bbox.csv downloaded into OID/csv_folder/train-annotations-bbox.csv.

-----------------------------------------------Weapon-----------------------------------------------
    [INFO] | Downloading all images.
    [INFO] | [INFO] Found 1646 online images for train.
    [INFO] | Download of 1646 images in train.
sh: 1: aws: not found
sh: 1: aws: not found
sh: 1: aws: not found
sh: 1: aws: not found
...

Finally I am seeing the following error:

    [INFO] | Done!
    [INFO] | Creating labels for Weapon of test.
    [INFO] | Labels creation completed.
Traceback (most recent call last):
  File "main.py", line 36, in <module>
    bounding_boxes_images(args, DEFAULT_OID_DIR)
  File "/home/james/git/OIDv4_ToolKit/modules/bounding_boxes.py", line 89, in bounding_boxes_images
    download(args, df_val, folder[i], dataset_dir, class_name, class_code, threads = int(args.n_threads))
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

Perhaps this is being caused by the n_threads argument not having a reasonable default argument? The help information shows that this value is 20 by default, maybe this should be verified?

In any event thanks for making this code available. Once I get it to work it seems that it will save me lots of time for collecting a sub-dataset from OpenImages.
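
One way to make the `int(None)` failure impossible is to let argparse supply both the default and the type. This is a sketch, not the toolkit's actual parser; the flag name comes from `args.n_threads` in the traceback and the default of 20 from the help text mentioned above.

```python
import argparse


def build_parser():
    """Hypothetical parser showing only the thread-count option."""
    parser = argparse.ArgumentParser(description="Sketch of a typed default")
    parser.add_argument("--n_threads", type=int, default=20,
                        help="number of download threads (default: 20)")
    return parser


args = build_parser().parse_args([])
print(args.n_threads)
```

With `type=int` and a default in place, the calling code can pass `threads=args.n_threads` directly and the `if not args.n_threads` branch is no longer needed.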

Usage with v3 classes

Is there any way to download and label image classes from the v3 version? I replaced the csv_folder content with the files from v3 and changed the OID URL to https://storage.googleapis.com/openimages/2017_11/, but when downloading Chimney images, for example, it outputs:
[INFO] Downloading Chimney.
----------Chimney----------
[INFO] Downloading train images.
[INFO] Found 0 online images for train.
[INFO] All images already downloaded.
[INFO] Creating labels for Chimney of train.
[INFO] Labels creation completed.

.gitignore doesn't appear to be valid

Rather than being a valid .gitignore, the file in place appears to be a page that was copied as HTML. Replace it with a valid .gitignore, perhaps from the GitHub template provided for Python.

Support for Filtering Image-Level Labels

Hi, thanks for the great toolkit. Is it possible to download images based on image-level labels (19,995 classes), rather than the 600 boxable classes only? I have downloaded the csv needed.

How can I download images with more than one class and get all classes bounding boxes, in the corresponding notation text file ?

For example, if there is an image of a Person riding a Bicycle, I would like to get 2 bounding box data of the Person and the Bicycle.
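
Other issues on this page use a `--multiclasses 1` option, which may already cover this. Independently of the toolkit, the grouping itself can be sketched from the annotations CSV: collect every box of the wanted classes under its image ID. The column names are assumed from the Open Images bounding-box CSV, the coordinates there are normalized, and `boxes_by_image` is a hypothetical helper.

```python
import csv
from collections import defaultdict


def boxes_by_image(csv_file, code_to_name):
    """Group boxes of the wanted classes by image ID.

    `code_to_name` maps class codes to readable names; rows for
    other classes are ignored.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        name = code_to_name.get(row["LabelName"])
        if name is None:
            continue
        box = tuple(float(row[key]) for key in ("XMin", "YMin", "XMax", "YMax"))
        grouped[row["ImageID"]].append((name, box))
    return dict(grouped)
```

Images whose list contains both "Person" and "Bicycle" entries are the ones asked about here.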

Error: could not connect to the endpoint URL

This toolkit looks very useful for extracting parts of Open Images!!
I work on Windows 7 (sorry), Python 3.6.
When I try to launch the basic command line you provide:
python main.py downloader --classes Apple Orange --type_csv validation

I first get a bunch of lines saying "File association not found for extension .py", and in between,
for each image it tries to pick, "fatal error: could not connect to the endpoint URL: "https://open-images...."
At the end of the process, of course, I get empty folders where the pictures should be.

I don't know about the file association message, but the second error seems to be because of my proxy. Is there an option to add the proxy on the command line? That would be very useful, like what we can see in other dataset downloaders: --proxy "http:XXXXXXXX:port"

Or where can I hardcode the proxy information in the Python files of the toolkit?

thanks
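
The endpoint error comes from the AWS CLI, which reads proxy settings from the HTTP_PROXY and HTTPS_PROXY environment variables. Assuming the toolkit launches `aws` as a child process, setting those variables before running main.py, or passing them to the child process, may be enough. A sketch with a hypothetical helper and a placeholder proxy address:

```python
import os


def env_with_proxy(proxy_url, base_env=None):
    """Copy an environment and add proxy variables for child processes."""
    env = dict(os.environ if base_env is None else base_env)
    env["HTTP_PROXY"] = proxy_url
    env["HTTPS_PROXY"] = proxy_url
    return env


# Placeholder address; replace with the real proxy host and port
proxy_env = env_with_proxy("http://proxy.example:8080", {"PATH": "/usr/bin"})
# A child process would then be started with: subprocess.run([...], env=proxy_env)
```
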

Getting Error While downloading multiple classes images

Thanks for great tool to download Open Images.
I am facing below error while downloading images of multiple classes.
Traceback (most recent call last):
  File "main.py", line 37, in <module>
    bounding_boxes_images(args, DEFAULT_OID_DIR)
  File "/Users/OpenImages/OID/OIDv4_ToolKit/modules/bounding_boxes.py", line 106, in bounding_boxes_images
    class_dict[class_name] = df_classes.loc[df_classes[1] == class_name].values[0][0]
IndexError: index 0 is out of bounds for axis 0 with size 0

I tried two methods: listing all classes as command-line arguments, and passing a text file that contains the classes as a command-line argument. Both give the same error.

I also searched previous issues from people who faced similar problems and tried the suggested methods, but no luck:
Issue 37
Issue 13

Commands I ran are:
1)python3 main.py downloader --classes Car Person Bicycle Taxi Truck Building Traffic_light Tree Traffic_sign Stop_sign Billboard Missle Motorcycle Van Tire Airplane Wheel Tank Stree_light Submarine --type_csv all --multiclasses 1 --limit 100

2)python3 main.py downloader --classes classes_custom.txt --type_csv all --multiclasses 1 --limit 100

contents in classes_custom.txt are:
Car
Person
Bicycle
Taxi
Truck
Building
Traffic light
Tree
Traffic sign
Stop sign
Billboard
Missle
Motorcycle
Van
Tire
Airplane
Wheel
Tank
Street light
Submarine

Can't download grape train set

I am able to download the grape test and validation sets using the command below.

python main.py downloader --classes Grape --type_csv validation
python main.py downloader --classes Grape --type_csv test

However, when I try to download the train set, the program hangs for a few seconds and then fails with a memory error in pandas.
python main.py downloader --classes Grape --type_csv train
Note: I'm running python 3 on windows

Any help is appreciated, thanks.

Unable to parse the bounding boxes from train-annotations-bbox.csv file

Possibility to download license of file?

Hello, I just tried your script and it works perfectly. I would like to know if the possibility to also download the license of each file could be added.

Thanks

Fatal error using downloader_ill

I will check this issue but it's not due to us; it seems that the image 0d72ff3e2601d71c.jpg is present on the csv file but it's not on the OIDv4 server.

Originally posted by @keldrom in #30 (comment)

Hello, I would like to know if this is solved?

"chipper" option to cut out image chips using bounding boxes

Sometimes one does not want the entire image, only the part with the class of interest in it. For example, when training an image classifier or a GAN.

A crude implementation is shown below:

new file, modules/chip.py

import os

import cv2


def chip(class_name, download_dir, label_dir, total_images, index):
    '''
    Cut every labelled bounding box out of one image and save each as a chip.

    class_name and total_images are unused; they are kept so that
    existing callers keep working.
    '''
    # Only consider image files, so the 'Label' folder is skipped safely
    images = sorted(f for f in os.listdir(download_dir) if f.endswith('.jpg'))
    if index >= len(images):
        return
    img_file = images[index]
    current_image_path = os.path.join(download_dir, img_file)
    img = cv2.imread(current_image_path)
    if img is None:
        print(f"[ERROR] Could not read {current_image_path}")
        return
    file_name = os.path.splitext(img_file)[0] + '.txt'
    file_path = os.path.join(label_dir, file_name)

    chips_folder = "chips/"
    os.makedirs(chips_folder, exist_ok=True)

    with open(file_path, 'r') as f:
        for idx, line in enumerate(f):
            # each row in a file is class_name, XMin, YMin, XMax, YMax;
            # the class name may contain spaces, so split from the right
            parts = line.strip().rsplit(maxsplit=4)
            if len(parts) != 5:
                continue
            xmin, ymin, xmax, ymax = (int(float(v)) for v in parts[1:])

            # image arrays are indexed [rows (y), columns (x)]
            roi = img[ymin:ymax, xmin:xmax]
            if roi.size == 0:
                continue
            chip_filename = os.path.splitext(img_file)[0] + "_chip" + str(idx) + ".jpg"
            chip_path = os.path.join(chips_folder, chip_filename)
            print(f"chip_path is {chip_path}")
            cv2.imwrite(chip_path, roi)

Added to bounding_boxes.py:

an import statement at the top...

from modules.chip import chip

...and this section:

    elif args.command == "chipper":
        for image_dir in ["train", "test", "validation"]:
            class_image_dir = os.path.join(dataset_dir, image_dir)
            for class_name in os.listdir(class_image_dir):

                download_dir = os.path.join(class_image_dir, class_name)
                label_dir = os.path.join(download_dir, 'Label')
                if not os.path.isdir(download_dir):
                    print("[ERROR] Images folder not found")
                    exit(1)
                if not os.path.isdir(label_dir):
                    print("[ERROR] Labels folder not found")
                    exit(1)

                # Every entry except the 'Label' folder is assumed to be an image
                total_images = len(os.listdir(download_dir)) - 1

                # A bounded loop replaces the original `while True`, which never exited
                for index in range(total_images):
                    chip(class_name, download_dir, label_dir, total_images, index)

Feature Request: Kitti Labels Format?

Not so much a bug or an error report, but more of a feature request: would it be possible to have the labels in the KITTI label format? A reference exists here:

https://github.com/NVIDIA/DIGITS/tree/master/digits/extensions/data/objectDetection#label-format
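
For 2D detection, a KITTI label line has 15 space-separated fields: type, truncated, occluded, alpha, the four box coordinates (left, top, right, bottom), then 3D dimensions, location and rotation. A sketch that fills the 3D fields with zeros, which is an assumption about what the consuming tool accepts; `to_kitti_line` is a hypothetical helper.

```python
def to_kitti_line(class_name, box):
    """Format one pixel box (left, top, right, bottom) as a KITTI label line."""
    left, top, right, bottom = box
    # The type must be a single token, so spaces become underscores
    kitti_type = "_".join(class_name.split())
    fields = [
        kitti_type,
        "0.00",                          # truncated (unknown, set to 0)
        "0",                             # occluded (unknown, set to 0)
        "0.00",                          # alpha (unused for 2D boxes)
        f"{left:.2f}", f"{top:.2f}", f"{right:.2f}", f"{bottom:.2f}",
        "0.00", "0.00", "0.00",          # 3D dimensions (unused)
        "0.00", "0.00", "0.00",          # 3D location (unused)
        "0.00",                          # rotation_y (unused)
    ]
    return " ".join(fields)
```
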

How to download a class with two words?

Dear all,

I need to download classes with two words, e.g.: Human mouth, Human head, etc.
I tried using a space or _ (underscore), with no luck.
Thank you very much in advance.

Warmest Regards,
Suryadi

How to get all classes

Only downloads 13 images

I tried to use the software, but it only downloads 13 images, even though I set the limit to 500

Label data not downloaded

I do not know whether this is intended behavior, but the Label folder comes out empty after running the command

python main.py downloader --classes Dolphin Whale --sub h --type_csv train

I am in need of these files so that I can convert to Pascal/VOC using this other tool.

(Just to be sure, I installed awscli and which aws outputs ~/.local/bin/aws, and I think that's intended.)

What could be failing?

I made YOLO annotations script

here
It creates YOLO-format .txt annotations from the downloaded images,
compatible with other visualizers, e.g. labelImg, boobs

*It now reads the .csv in small chunks so it won't crash my potato PC, and the if conditions are rearranged to improve speed.
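
The linked script is not reproduced on this page. For orientation only, the usual YOLO line is `class_id x_center y_center width height`, with all four values normalized by the image size. A sketch of that conversion from a pixel box, with a hypothetical helper name:

```python
def to_yolo_line(class_id, box, width, height):
    """Convert a pixel box (left, top, right, bottom) to a YOLO label line."""
    left, top, right, bottom = box
    x_center = (left + right) / 2 / width
    y_center = (top + bottom) / 2 / height
    box_width = (right - left) / width
    box_height = (bottom - top) / height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {box_width:.6f} {box_height:.6f}"
```

The class ID is the zero-based index of the class in whatever class list the training setup uses.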
