Cortex Dataset
This list includes only large-scale, open-source datasets.
Video
- YouTube-8M: High-quality machine-generated annotations from a diverse vocabulary of 3,800+ visual entities, shipped with precomputed audio-visual features.
Translation
- WMT Translation Dataset: Machine translation datasets covering multiple language pairs.
Information Retrieval
Question Answering
- Stanford Question Answering Dataset (SQuAD). Question answering about Wikipedia articles.
- DeepMind Question Answering Corpus. Question answering about news articles from the Daily Mail.
- Amazon question/answer data. Question answering about Amazon products.
Speech Recognition
- TIMIT Acoustic-Phonetic Continuous Speech Corpus. Not free, but listed because of its wide use. Spoken American English and associated transcription.
- VoxForge. Project to build an open source database for speech recognition.
- LibriSpeech ASR corpus. Large collection of English audiobooks taken from LibriVox.
Audio Content Analysis
- Google AudioSet: A large-scale dataset of manually annotated audio events, consisting of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos.
- ExtraSensory Dataset: Behavioral context recognition, with over 300k examples (minutes) from 60 users in the wild.
- Lakh Pianoroll Dataset: A collection of 174,154 multi-track piano-rolls derived from the Lakh MIDI Dataset (LMD).
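To give a feel for the scale of the AudioSet figures quoted above, the short sketch below converts the clip count into hours of audio. It assumes every clip runs the full 10 seconds, so the result is an upper bound rather than an official figure; the function name `total_hours` is made up for this illustration.

```python
# Figures quoted in the AudioSet entry above.
NUM_CLIPS = 2_084_320
CLIP_SECONDS = 10  # assumption: every clip runs the full 10 seconds


def total_hours(num_clips: int, clip_seconds: int) -> float:
    """Return the combined duration of all clips, in hours."""
    return num_clips * clip_seconds / 3600


audioset_hours = total_hours(NUM_CLIPS, CLIP_SECONDS)
print(f"AudioSet upper bound: {audioset_hours:,.0f} hours")
```

Under that assumption the collection comes to roughly 5,790 hours of audio.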
Document Summarization
- Legal Case Reports Data Set. A collection of roughly 4,000 legal cases and their summaries.
- TIPSTER Text Summarization Evaluation Conference Corpus. A collection of nearly 200 documents and their summaries.
- The AQUAINT Corpus of English News Text. Not free, but widely used. A corpus of news articles.
Depth Estimation
- NYU Depth Dataset V2: Comprises video sequences from a variety of indoor scenes, recorded by both the RGB and depth cameras of the Microsoft Kinect.
Image Classification
- CIFAR 10 & CIFAR 100
- ImageNet: Dataset for large-scale image classification, used as a benchmark in much deep learning research.
- PASCAL VOC
- Street View House Numbers (SVHN) Dataset: Digits cropped from real-world images; a harder, MNIST-like dataset.
- MS COCO
- Visual Genome
- WebVision: Images crawled from the Flickr website and Google Images search. Large scale, with significant domain bias.
- AI Challenger: Image Attribute Dataset
- AI Challenger: Chinese Image Captioning
Fine-Grained Visual Classification
- Fine-Grained Visual Classification of Aircraft (FGVC-Aircraft)
- iNaturalist Challenge at FGVC 2017: Dataset of 5,089 species that are extremely difficult for humans to classify accurately because of their visual similarity.
- iMaterialist Challenge at FGVC 2017: Dataset designed for automatic product recognition among visually very similar objects.
Semantic Segmentation
- KITTI Data Set: Targets stereo, optical flow, visual odometry, 3D object detection and 3D tracking, with data captured from a standard station wagon equipped with two high-resolution color and grayscale video cameras.
Object Detection and Recognition
- MS-COCO: A dataset for recognition, segmentation and captioning problems. Its main purpose is to advance scene understanding. MS-COCO has the following features:
  - Segmentation
  - Recognition in context
  - Superpixel stuff segmentation
  - 330K images (>200K labeled)
  - 1.5 million object instances
  - 80 object categories
  - 91 stuff categories
  - 5 captions per image
  - 250,000 people with keypoints
- Cityscapes: Large-scale urban street scenes with semantic, instance-wise, dense pixel annotations of 30 classes.
- Google Open Images Dataset: A dataset of ~9 million images annotated with image-level labels and object bounding boxes.
- CORe50: A dataset and benchmark for continuous object recognition; it offers a possible entry point for online learning.
- Caltech-256: 256 object categories containing a total of 30,607 images. SIFT10M provides SIFT features extracted from the Caltech-256 dataset.
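MS-COCO's detection annotations are distributed as JSON with top-level `images`, `annotations` and `categories` lists, and a similar layout is commonly reused by other detection datasets. The sketch below shows how statistics like the "object instances" and "object categories" figures above can be tallied from such a structure. The sample data is a tiny hand-written stand-in rather than real COCO content, and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical, hand-written sample that follows the COCO annotation layout.
# Each bbox is [x, y, width, height] in pixels.
coco = {
    "images": [
        {"id": 1, "file_name": "a.jpg"},
        {"id": 2, "file_name": "b.jpg"},
    ],
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 18, "name": "dog"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [10, 20, 50, 80]},
        {"id": 11, "image_id": 1, "category_id": 18, "bbox": [60, 40, 30, 30]},
        {"id": 12, "image_id": 2, "category_id": 1, "bbox": [5, 5, 40, 90]},
    ],
}


def instances_per_category(dataset):
    """Count annotated object instances for each category name."""
    names = {c["id"]: c["name"] for c in dataset["categories"]}
    return Counter(names[a["category_id"]] for a in dataset["annotations"])


def mean_instances_per_image(dataset):
    """Average number of annotated instances per listed image."""
    return len(dataset["annotations"]) / len(dataset["images"])


counts = instances_per_category(coco)
mean = mean_instances_per_image(coco)
print(dict(counts), mean)
```

With a real annotation file, the `coco` dictionary would instead come from `json.load` on the downloaded file; the two helpers would not need to change.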
Facial Recognition
- MegaFace : The largest publicly available facial recognition dataset.
- MSR Image Recognition Challenge: Large-scale (10M) set of celebrity face images for the top 100K celebrities, which can be used to train and evaluate both face identification and verification algorithms.
Facial Landmark
- 300-W: Provides annotations for 3,837 face images with 68 landmarks.
- 300-VW: 300 Videos in the Wild (300-VW) Facial Landmark Tracking.
Scene Classification
- Places365: A scene dataset whose goal is to support building systems for high-level visual understanding tasks.
- LSUN: Scene classification and auxiliary tasks (room layout estimation, saliency prediction, etc.).
Face Verification by Age
- LAG (Large Age Gap): A dataset containing variations of age in the wild, with images ranging from child/young to adult/old. The dataset contains 3,828 images of 1,010 celebrities. For each identity, at least one child/young image and one adult/old image are present.
- IMDB-WIKI: According to its authors, the largest publicly available dataset of face images with gender and age labels for training. The accompanying work proposes a method for estimating a person's age from a single image.
Stereo and 3D Reconstruction
- Middlebury Stereo Vision Page : provides several multi-frame stereo data sets for comparing the performance of stereo matching algorithms.
- Middlebury multi-view stereo (MVS) : a calibrated multi-view image dataset with registered ground truth 3D models for the comparison of MVS approaches.
- DTU MVS: Provides 124 different scenes recorded in a controlled laboratory environment.
Motion Tracking
- Real World Activity Recognition Dataset
- Heterogeneity Activity Recognition Dataset: A dataset from smartphones and smartwatches devised to benchmark human activity recognition algorithms (classification, automatic data segmentation, sensor fusion, feature extraction, etc.) in real-world contexts.
- THUMOS Challenge 2015 : Large video dataset for action classification.
- MOTChallenge: The Multiple Object Tracking Benchmark : Large Scale Multiple Object Tracking dataset with images and videos.
Key-Point
- MS-COCO Keypoint Detection Dataset
- CrowdHuman : A Benchmark for Detecting Human in a Crowd.
Speech Recognition
- LibriSpeech ASR corpus: Large-scale (1000 hours) corpus of read English speech.
- VoxForge: Speech dataset with accented voices, useful for testing robustness.
- CHiME: A speech recognition challenge dataset containing ambient noise. The dataset contains real, simulated and clean voice recordings. Specifically, it includes nearly 9,000 recordings of 4 speakers in 4 noisy environments. The simulated data combines speech with noise recorded in multiple environments.
- TED-LIUM: TED Talks speech dataset, with 1,495 TED recordings and their transcripts.
- Spoken Language Dataset: Speech samples in English, German and Spanish, evenly balanced across languages, genders and speakers.
Environment Sound
- UrbanSound: Labeled recordings of sounds such as air conditioners, car horns and children playing. This dataset provides an entry point for environmental sound recognition.