Comments (4)
Actually, it seems that some of the descriptions (e.g., 'basketball') are indeed related to distinct concepts while others (e.g., 'alfa romeo giulietta') seem to describe the same thing.
I agree. Let me check with the Google team whether this is a good time to do it.
Hi @chcomin,
thank you for the find. You're correct, there are two classes of errors:
- Different entities, same description. For example:
/m/020lf, /m/04rmv, mouse
In one case it's a computer mouse, and in the other case it's an animal. For this kind of collision, I would propose making the descriptions more verbose, e.g. "mouse" -> "computer mouse" and "mouse" -> "mouse (animal)". Feel free to make a pull request, and I will try to advocate for its acceptance.
- Same real entity, same description, different IDs (like "alfa romeo giulietta"). A short-term fix would be to modify the labels so that these entities also have the same images attached. Eventually a winner will have to be chosen, but I don't have enough information to give informed advice here.
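To make the first step concrete, here is a minimal sketch of how the colliding descriptions could be listed. It assumes the class descriptions are available as two-column CSV rows (label ID, description), as in the "mouse" example above; the function name and the third sample row are hypothetical and only there for illustration.

```python
import csv
import io
from collections import defaultdict


def find_duplicate_descriptions(rows):
    """Group label IDs by description; keep descriptions shared by several IDs."""
    ids_by_description = defaultdict(list)
    for label_id, description in rows:
        # Normalize case and whitespace so "Mouse " and "mouse" collide too.
        ids_by_description[description.strip().lower()].append(label_id)
    return {
        description: ids
        for description, ids in ids_by_description.items()
        if len(ids) > 1
    }


# In-memory stand-in for the class descriptions file.
# The first two rows come from the thread; the third ID is made up.
sample_csv = "/m/020lf,mouse\n/m/04rmv,mouse\n/m/hypothetical,basketball\n"
rows = list(csv.reader(io.StringIO(sample_csv)))

duplicates = find_duplicate_descriptions(rows)
print(duplicates)
```

This only finds the collisions. It cannot tell whether two IDs are different entities or the same one, so each group still needs a manual look at the images.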
Checking the test images here http://openimages.oldjpg.com/, I see that sometimes the duplicate classes are actually the same (e.g. egg), but other times they are not. For example "mouse", as already mentioned, but also "fish" (one is the animal and the other is food).
Please note that the 3 "alfa romeo giulietta" classes are:
- New Alfa Romeo Giulietta
- Old Alfa Romeo Giulietta
- A mix of the two
So resolving all the duplicates would be useful work, but we have to check every class; a simple merge could be wrong.
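One way to avoid a blind merge is to keep a small, manually reviewed table of per-ID renames and apply only those. The sketch below assumes the same (label ID, description) rows as in the thread; the helper names are hypothetical, and which of the two "mouse" IDs is the computer mouse is an assumption that would have to be confirmed against the images.

```python
from collections import Counter


def apply_overrides(rows, overrides):
    """Replace the description of every label ID listed in overrides."""
    return [
        (label_id, overrides.get(label_id, description))
        for label_id, description in rows
    ]


def remaining_collisions(rows):
    """Return descriptions that are still shared by more than one label ID."""
    counts = Counter(description for _, description in rows)
    return sorted(d for d, n in counts.items() if n > 1)


rows = [("/m/020lf", "mouse"), ("/m/04rmv", "mouse")]

# Assumption: the ID-to-meaning assignment below is illustrative only
# and must be verified by looking at the images for each ID.
overrides = {
    "/m/020lf": "computer mouse",
    "/m/04rmv": "mouse (animal)",
}

renamed = apply_overrides(rows, overrides)
print(renamed)
print(remaining_collisions(renamed))
```

Classes that turn out to be the same entity (such as "egg") would simply be left out of the override table until a decision on merging is made.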