borutb-fri / fmld
A challenging, in the wild dataset for experimentation with face masks with 63,072 face images.
License: MIT License
Issue 1. I'm using the TensorFlow Object Detection API and I get an error with your dataset because of an unexpected class label in the data. The error happens when I run the TFRecord generation script as follows:
python Tensorflow/scripts/generate_tfrecord.py -x Tensorflow/workspace/images/FMLD_cleaned_dataset/test -l Tensorflow/workspace/annotations/label_map.pbtxt -o Tensorflow/workspace/annotations/test.record -c Tensorflow/workspace/annotations/csv_train.csv
File "Tensorflow/scripts/generate_tfrecord.py", line 101, in class_text_to_int
return label_map_dict[row_label]
KeyError: 'invalid_face'
I used my own Python script to make a 1-to-1 mapping from the XML label files in your provided "FMLD_annotations" folder onto the full combined WIDER and MAFA datasets containing all the images, since I don't have MATLAB on my own computer. The mapping matches the filenames of the XML files to the filenames of the JPG images.
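For reference, a filename-based mapping like the one described above could be sketched roughly as follows. This is not my original script; the function name `pair_annotations_with_images` and the folder arguments are made up for illustration, and it assumes that an XML file and its image share the same base name and that images use a lowercase `.jpg` extension.

```python
from pathlib import Path


def pair_annotations_with_images(annotation_dir, image_dir):
    """Pair each XML file with the JPG that has the same base name.

    Hypothetical helper: assumes matching file stems and a lowercase
    ".jpg" extension, which may not hold for every image source.
    """
    # Index all images by base name (filename without extension).
    images = {path.stem: path for path in Path(image_dir).rglob("*.jpg")}

    pairs, missing = [], []
    for xml_path in sorted(Path(annotation_dir).rglob("*.xml")):
        image_path = images.get(xml_path.stem)
        if image_path is None:
            missing.append(xml_path)  # annotation without a matching image
        else:
            pairs.append((xml_path, image_path))
    return pairs, missing
```

The `missing` list makes it easy to see annotations that found no image, which is worth checking before generating TFRecords.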
How many classes does the dataset contain? It seems to be more than these three, which I put into my labelmap.pbtxt in TensorFlow:
with mask (name: masked_face),
without mask (name: unmasked_face) and
with mask worn incorrectly (name: incorrectly_masked_face).
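One way to answer the class question empirically is to count every label that actually occurs in the annotation files. The sketch below assumes the XML files follow a Pascal VOC style layout with `<object><name>` elements, which I have not confirmed for every file; `count_labels` and `unexpected_labels` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_labels(annotation_dir):
    """Count every <object><name> value in the XML files under a folder.

    Assumes a Pascal VOC style layout; adjust the tag names if the
    annotation files are structured differently.
    """
    counts = Counter()
    for xml_path in sorted(Path(annotation_dir).rglob("*.xml")):
        root = ET.parse(xml_path).getroot()
        for obj in root.iter("object"):
            name = obj.findtext("name")
            if name is not None:
                counts[name.strip()] += 1
    return counts


def unexpected_labels(counts, expected):
    """Return the labels (with counts) that are not in the label map."""
    return {label: n for label, n in counts.items() if label not in expected}
```

Calling `unexpected_labels(count_labels("FMLD_annotations"), {"masked_face", "unmasked_face", "incorrectly_masked_face"})` would list labels such as `invalid_face` that are missing from the label map, so they can either be added to it or filtered out before generating the TFRecords.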
Issue 2. Are the bounding box coordinates in your dataset always whole-number floats? I saw that the coordinates are given as floats instead of the typical ints, and many of them look like 919.0. If they are all whole numbers, rounding (banker's or otherwise) shouldn't be necessary.
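A small check for this could look like the sketch below. It converts a coordinate to int only when the float is already a whole number and raises otherwise, so no rounding rule is ever applied silently. The helper name `to_int_coordinate` and the tolerance value are my own assumptions.

```python
def to_int_coordinate(value, tolerance=1e-9):
    """Convert a float coordinate such as 919.0 to an int.

    Raises ValueError if the value is not (almost) a whole number,
    instead of rounding it silently.
    """
    nearest = round(value)  # built-in round() uses round-half-to-even
    if abs(value - nearest) > tolerance:
        raise ValueError(f"coordinate {value!r} is not a whole number")
    return int(nearest)
```

Python's built-in `round()` does use banker's rounding (round half to even), but that only matters for values ending in exactly .5, which this helper rejects anyway.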
Issue 3. This is something I noticed when inspecting the WIDERFACE images that part of your dataset was sourced from. The WIDERFACE dataset contains some really obscene and bizarre image categories such as "car accident" and "streetfight". I just wanted to note that those could be less than ethical images for research. I deleted those categories of pictures from my dataset.
Issue 1: This was not necessarily the only bad data in the dataset, but it is what I found when I inspected the CSV file after a crash caused by illegal bounding boxes that extend outside the image dimensions.
File "C:\Users\lauri\anaconda3\envs\tf_kaggle\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 154, in Assert
raise errors.InvalidArgumentError(
tensorflow.python.framework.errors_impl.InvalidArgumentError: Expected 'tf.Tensor(False, shape=(), dtype=bool)' to be true. Summarized data: b'maximum box coordinate value is larger than 1.100000: '
1.1067415
The entries with suspect bbox coordinates that I found in my subset of your FMLD dataset were these in the CSV:
filename
test_00000749.jpg
test_00001127.jpg
test_00001586.jpg
test_00001607.jpg
test_00001626.jpg
test_00001626.jpg
test_00003030.jpg
test_00003055.jpg
test_00003672.jpg
test_00003750.jpg
test_00003774.jpg
test_00004007.jpg
test_00004179.jpg
filename
train_00000019.jpg
INVALID_COORDINATES_excelsheet_test_train.xlsx
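A check along the following lines can flag such rows before training. It is only a sketch: it assumes the CSV has the columns `filename`, `width`, `height`, `xmin`, `ymin`, `xmax` and `ymax`, which is the layout commonly produced by XML-to-CSV helper scripts, and `find_invalid_boxes` is a hypothetical name. The value 1.1067415 in the log above looks like a normalized coordinate (for example xmax divided by the image width), so a box reaching past the image edge would trigger it.

```python
import csv


def find_invalid_boxes(csv_path):
    """Return (filename, reason) for every row whose box is not inside the image.

    Assumed columns: filename, width, height, xmin, ymin, xmax, ymax.
    """
    problems = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            width, height = float(row["width"]), float(row["height"])
            xmin, ymin = float(row["xmin"]), float(row["ymin"])
            xmax, ymax = float(row["xmax"]), float(row["ymax"])

            if xmin >= xmax or ymin >= ymax:
                problems.append((row["filename"], "empty or inverted box"))
            elif xmin < 0 or ymin < 0 or xmax > width or ymax > height:
                problems.append((row["filename"], "box outside image"))
    return problems
```

A file with two bad boxes appears twice in the result, once per offending row.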
Issue 2: I found Hitler in the dataset. I think he was maybe hiding in WIDERFACE.
picture name
0_Parade_Parade_0_914.jpg
But I managed to delete him from my dataset as well :)