DigiFace-1M Dataset

The DigiFace-1M dataset is a collection of over one million diverse synthetic face images for face recognition.

It was introduced in our paper DigiFace-1M: 1 Million Digital Face Images for Face Recognition and can be used to train deep learning models for facial recognition.

The dataset contains:

720K images with 10K identities (72 images per identity). For each identity, 4 different sets of accessories are sampled and 18 images are rendered for each set.
500K images with 100K identities (5 images per identity). For each identity, only one set of accessories is sampled.
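
As a quick sanity check on the figures above, the arithmetic can be written out as below. This is only an illustrative sketch: the variable names are made up for this example, and the numbers are the rounded counts quoted in the description.

```python
# Rounded counts taken from the dataset description above.
identities_large = 10_000      # identities in the 72-images-per-identity subset
accessory_sets = 4             # accessory sets sampled per identity
renders_per_set = 18           # images rendered per accessory set

identities_small = 100_000     # identities in the 5-images-per-identity subset
images_per_small_identity = 5  # a single accessory set per identity

images_per_large_identity = accessory_sets * renders_per_set
large_subset = identities_large * images_per_large_identity
small_subset = identities_small * images_per_small_identity
total_images = large_subset + small_subset

print(images_per_large_identity)  # 72
print(large_subset)               # 720000
print(small_subset)               # 500000
print(total_images)               # 1220000
```

The two subsets together hold about 1.22 million images, consistent with the "over one million" figure in the introduction.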

The DigiFace-1M dataset can be used for non-commercial research, and is licensed under the license found in LICENSE.

Downloading the Dataset

For convenience, the dataset is split into 8 parts, which can be downloaded here:

72 images per identity

P1
P2
P3
P4
P5

5 images per identity

P1
P2
P3

Dataset Layout

The DigiFace-1M dataset contains cropped color images in the following layout.

subj_id_n
├── 0.png                 # First rendered image of subject subj_id_n
├── 1.png                 # Second rendered image of subject subj_id_n
...
└── k.png                 # (k+1)-th rendered image of subject subj_id_n
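
A layout like this can be indexed with a few lines of Python. The following is a minimal sketch rather than an official loader: the function name `index_subjects` and the root path in the usage note are hypothetical, and it assumes the downloaded parts have been extracted so that each subject folder sits directly under one root directory.

```python
from pathlib import Path


def index_subjects(root):
    """Map each subject folder name to its PNG files, ordered by image index."""
    index = {}
    for subject_dir in sorted(Path(root).iterdir()):
        if not subject_dir.is_dir():
            continue
        # Files are named 0.png, 1.png, ..., k.png, so sort numerically;
        # a plain string sort would put 10.png before 2.png.
        images = sorted(
            (p for p in subject_dir.glob("*.png") if p.stem.isdigit()),
            key=lambda p: int(p.stem),
        )
        if images:
            index[subject_dir.name] = images
    return index
```

For example, `index_subjects("path/to/extracted_part")` (a placeholder path) returns a dictionary whose keys can serve as identity labels. For the 72-images-per-identity parts, each value would be expected to hold 72 paths.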

Disclaimer

Some of our rendered faces may be close in appearance to the faces of real people. Any such similarity is naturally unintentional, as it would be in a dataset of real images, where people may appear similar to others unknown to them.

Citation

If you use the DigiFace-1M dataset in your work, please cite the following paper:

@inproceedings{bae2023digiface1m,
  title={DigiFace-1M: 1 Million Digital Face Images for Face Recognition},
  author={Bae, Gwangbin and de La Gorce, Martin and Baltru{\v{s}}aitis, Tadas and Hewitt, Charlie and Chen, Dong and Valentin, Julien and Cipolla, Roberto and Shen, Jingjing},
  booktitle={2023 IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2023},
  organization={IEEE}
}

digiface1m's Issues

Share on Hugging Face

Hello! I was wondering if you'd like to publish the dataset on Hugging Face to share it with the research community? This way, researchers will be able to load the dataset with one line of code to train models :)

There are already lots of datasets on https://huggingface.co/datasets for computer vision and you could add it with the other datasets here: https://huggingface.co/microsoft

2% of MS1MV2

Dear authors, thank you for your amazing work! I have a quick question: in the paper, how did you choose the 2% of MS1MV2 for the SX+Real best training? Thank you in advance!

Landmarks

Will landmarks or semantic segmentation for these faces be released?

Ethical Contradiction MS1M

If the purpose of this research is to avoid using non-consensual imagery, is it not a contradiction to continue using such data in research for fine-tuning?

Could you release pre-trained model weights and training code?

Hi,
Thanks for releasing this amazing dataset. Could you share the pre-trained model weights and training code?
Also, could you give more details about your data augmentation? For example, the kernel size of the Gaussian blur.

BR,
Ziv

Is the pose of the images available?

Hi there,

Thanks for releasing this awesome dataset. I'm wondering if the head pose of each image is available as well? It would be very helpful for different kinds of research.

Thanks.
