fairface's People

Contributors

bernardo1998, dchen236

fairface's Issues

Model benchmarks

Hey!
Incredibly good dataset. Do you have any benchmarks/metrics/evaluations of the model on the FF dataset for age, ethnicity, gender?

Hosting images on Google drive prevents programmatic access

Hello!

Thank you for this important dataset.

I'm trying to programmatically download the image .zip folders from your Google Drive links, but because they are >50 MB, Google Drive doesn't allow programmatic download access to the links (it wants to prompt the user with a virus-scan warning). I've searched for and tried many workarounds for this Google Drive issue, but none of them worked for me.

To avoid this, I suggest you use a different file hosting provider, or create a GitHub 'release' on this repo, which will let you upload large binary files that can then be programmatically downloaded.

As an interim fix, I've mirrored the Padding=0.25 files with a release at this public repo.
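
For anyone blocked the same way: GitHub release assets can be fetched with a plain HTTP request, unlike large Google Drive files. A minimal sketch, where the URL and file name are hypothetical placeholders:

import urllib.request

# Hypothetical release-asset URL; GitHub serves these directly, with no
# virus-scan interstitial like Google Drive's.
url = 'https://github.com/<user>/<repo>/releases/download/<tag>/fairface-img-margin025.zip'
urllib.request.urlretrieve(url, 'fairface-img-margin025.zip')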

Thank you,

service_test meaning

In the labeled data for training and validation, I noticed a service_test column. I might have missed it, but I couldn't find information on this in the paper or the repository. What is this variable referencing? Is it the data used for testing the "classification accuracy of commercial services"?

Thanks!
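
For anyone checking the same thing, a quick sketch (assuming pandas, and that the training label file is named fairface_label_train.csv) to see how the service_test == True rows are distributed across race and gender:

import pandas as pd

labels = pd.read_csv('fairface_label_train.csv')  # assumed file name
subset = labels[labels['service_test'] == True]
# If this subset is balanced, each race/gender cell should have a similar count.
print(subset.groupby(['race', 'gender']).size())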

Error when loading model to machine

Hi there,

I got an error when running the following line in the predict_age_gender_race function:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

I am using a machine with a 1.1 GHz dual-core Intel Core m3 processor. Please kindly advise how to resolve this.
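
A minimal sketch of the fix the error message itself suggests, assuming the model file lives in fair_face_models/ as elsewhere in this repo:

import torch
import torch.nn as nn
from torchvision import models

# On a CPU-only machine, map the CUDA-saved weights onto the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = models.resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 18)
model.load_state_dict(
    torch.load('fair_face_models/res34_fair_align_multi_7_20190809.pt',
               map_location=device))
model = model.to(device)
model.eval()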

Bboxes don't match images

Because the bboxes are appended to a list, after calling the predict_age_gender_race function the indices of the bboxes don't match the original images.
To fix this, I added the line below as the third line of predict_age_gender_race, right after img_names = [os.path.join(imgs_path, x) for x in os.listdir(imgs_path)]:

img_names.sort(key=lambda x: os.path.getmtime(x))

This way, images are considered in the order in which their bboxes were added.
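
For clarity, a sketch of where that line goes (the function signature and imgs_path default are assumptions; only the first lines are shown):

import os

def predict_age_gender_race(save_prediction_at, imgs_path='detected_faces/'):
    img_names = [os.path.join(imgs_path, x) for x in os.listdir(imgs_path)]
    # Sort by modification time so the image order matches the order in
    # which the bboxes were appended.
    img_names.sort(key=lambda x: os.path.getmtime(x))
    ...  # rest of the original function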

Many incorrectly labeled images

The paper reads "We further refined the annotations by training a model from the initial ground truth annotations and applying back to the dataset. We then manually re-verified the annotations for images whose annotations differ from model predictions."

But I tried to do this check myself, and I found many examples of annotations that do not match the model predictions. I do not have an estimate for how many images have this problem. But here are a few samples from the validation dataset that are annotated as "White", along with the model's predictions for these images. I would guess 2-3 of these 64 people would self-identify as "White":

Filename Predicted Race
val/1914.jpg Indian
val/5302.jpg Black
val/8590.jpg Southeast Asian
val/8963.jpg Black
val/9763.jpg Southeast Asian
val/9377.jpg Southeast Asian
val/7653.jpg Southeast Asian
val/2173.jpg Black
val/6261.jpg Indian
val/2698.jpg Black
val/322.jpg Southeast Asian
val/7489.jpg East Asian
val/2865.jpg Black
val/7394.jpg Southeast Asian
val/6331.jpg Black
val/8906.jpg East Asian
val/3797.jpg East Asian
val/5689.jpg Latino_Hispanic
val/7191.jpg East Asian
val/1312.jpg Indian
val/1399.jpg Indian
val/5204.jpg Latino_Hispanic
val/1758.jpg East Asian
val/7019.jpg Black
val/5771.jpg Southeast Asian
val/3903.jpg Latino_Hispanic
val/3204.jpg Middle Eastern
val/10556.jpg Latino_Hispanic
val/8838.jpg Southeast Asian
val/9757.jpg Latino_Hispanic
val/6590.jpg Latino_Hispanic
val/144.jpg Southeast Asian
val/10507.jpg Latino_Hispanic
val/1554.jpg Latino_Hispanic
val/7518.jpg Indian
val/5563.jpg Black
val/209.jpg Indian
val/10349.jpg Latino_Hispanic
val/8969.jpg Black
val/8475.jpg Black
val/5485.jpg Latino_Hispanic
val/4649.jpg Latino_Hispanic
val/68.jpg Southeast Asian
val/1286.jpg East Asian
val/2777.jpg Latino_Hispanic
val/397.jpg Latino_Hispanic
val/6448.jpg East Asian
val/1173.jpg Indian
val/10222.jpg Southeast Asian
val/4156.jpg Southeast Asian
val/7783.jpg Latino_Hispanic
val/10794.jpg East Asian
val/1309.jpg Latino_Hispanic
val/5787.jpg East Asian
val/9198.jpg Latino_Hispanic
val/4890.jpg Southeast Asian
val/6822.jpg Latino_Hispanic
val/8659.jpg Latino_Hispanic
val/953.jpg East Asian
val/10843.jpg Latino_Hispanic
val/8177.jpg East Asian
val/8870.jpg Latino_Hispanic
val/9399.jpg Latino_Hispanic
val/3652.jpg Latino_Hispanic

Process is automatically killed while processing a large dataset

First, thanks for sharing this fantastic tool.

I am trying to process a large dataset (more than 50K images), but it seems there is some condition that kills the process when it consumes too many resources.

I could only mask 4K of the 50K images in my dataset; do you know what is happening?

It seems I will have to use my server.

Thanks in advance.
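
If the OS out-of-memory killer is terminating the run, one workaround is to split the input into smaller batches and process them one at a time. A sketch, assuming the csv-driven interface of predict.py and pandas (the chunk file names are hypothetical):

import pandas as pd

# Split a 50K-image csv into 1000-image chunks and feed each chunk csv to
# predict.py in turn, so intermediate results don't pile up in memory.
df = pd.read_csv('test_imgs.csv')
chunk_size = 1000
for i in range(0, len(df), chunk_size):
    df.iloc[i:i + chunk_size].to_csv(f'chunk_{i}.csv', index=False)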

Accuracy difference between train and val datasets and paper

Out of curiosity, we took the 0.25 datasets (train and val) and ran them through the ResNet34 model and weights trained by the authors.
This 0.25 dataset is the one that the face-align part of predict.py creates (so we assume it is equivalent to the one used and referred to in the paper for the 7 race classes).

Interestingly, the accuracy (the match between the labels in the original csv and the classes predicted by the published model) differs between the train and val (test) datasets, and both are lower than presented in the paper:

Train:

  • full set 85% (73386 / 86744)
  • service_test == True 84% (33794 / 40252)

Val:

  • full set 78% (8511 / 10954)
  • service_test == True 77% (3955 / 5162)

BTW, as fellow commenters previously discovered, the filter service_test == True defines a subset whose labels are balanced in terms of race and gender. Therefore, we calculated metrics both for the full set and for this subset.

We would have expected higher and consistent percentages.

  • The paper presents comparison tables where the accuracy of the model is 94%, which is not supported by these findings.
  • The drop in validation accuracy might suggest that the ResNet34 model (a simple one with the fc head replaced) was trained on this dataset but evidently does not perform at the published level.
  • BTW, if one looks deeper into the images, some are very challenging (low resolution, profile views, back of the head, low light, etc.). What this means for the quality and balance of the dataset (level, consistency, distribution, etc.) needs further consideration. For example, if lower-quality images (a term still to be defined) are present in a higher proportion within one class than within the others, that can unbalance the dataset regardless of what the label distribution suggests, because in that case some labels hold less, or confusing, information concentrated in a specific class. We did not perform such an analysis; we just flag the potential here.

Please feel free to correct any inaccuracy or misinterpretation above or provide an explanation.

Pretrained model on different datasets

Hi! I'm interested in your paper and would like to reproduce some cross-dataset analysis. The paper compares the performance of models trained on UTKFace, LFWA+, and CelebA. Could you please release the pre-trained models on these datasets, in addition to the FairFace dataset? Thank you so much.

4 races definition and training

Thank you for your work. May I know how you defined the 4 races and how you trained the model?

In the paper, it seems like you merged White and Middle Eastern into White, merged East Asian and Southeast Asian into Asian, and then trained the model on these 4 classes. However, the last fc layer of your pretrained model still predicts 18 classes:

model_fair_4.fc = nn.Linear(model_fair_4.fc.in_features, 18)

and you only use the first 4 values for race prediction:

race_outputs = outputs[:4]

So how is this pretrained fc layer defined, and how did you merge the races?

Moreover, how do you define the race groups in Tables 3 & 4 of the arXiv paper, and how did you conduct those experiments? Thank you!
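
For concreteness, here is how the 4-class prediction appears to work based only on the two lines quoted above. This is a sketch, not the authors' confirmed pipeline; the class ordering (White, Black, Asian, Indian) and the checkpoint name are assumptions:

import torch
import torch.nn as nn
from torchvision import models

model_fair_4 = models.resnet34(pretrained=True)
model_fair_4.fc = nn.Linear(model_fair_4.fc.in_features, 18)  # head kept at 18
model_fair_4.load_state_dict(
    torch.load('fair_face_models/res34_fair_align_multi_4_20190809.pt',
               map_location='cpu'))
model_fair_4.eval()

image_batch = torch.randn(1, 3, 224, 224)  # stand-in for preprocessed face crops
with torch.no_grad():
    outputs = model_fair_4(image_batch)   # shape [N, 18]
    race_outputs = outputs[:, :4]         # per-sample form: outputs[:4]
    race_pred = torch.softmax(race_outputs, dim=1).argmax(dim=1)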

Help in annotations for age

Hi,
I have a question regarding the annotations. I've noticed that for certain images, the reported age appears in the format of dates rather than an age range.
In such cases, should these images be excluded from the training process? I would greatly appreciate any guidance or suggestions on how to address this situation.
Thank you
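
One way to detect the affected rows (a sketch; the nine age bins are my assumption about the intended label set, and the date-like values are presumably a spreadsheet auto-conversion of ranges such as 3-9):

import pandas as pd

# Assumed FairFace age bins; date-formatted values fall outside this set.
AGE_BINS = {'0-2', '3-9', '10-19', '20-29', '30-39',
            '40-49', '50-59', '60-69', 'more than 70'}

labels = pd.read_csv('fairface_label_train.csv')  # assumed file name
bad = ~labels['age'].isin(AGE_BINS)
print(f'{bad.sum()} rows with date-like or unexpected age values')
labels = labels[~bad]  # drop them, or map the dates back to their bins first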

The pretrained model is not recognized

Hi there. Thanks for the great work! I followed the installation guide and tried to run python3 predict_bbox.py --csv test_imgs.csv, but it tells me FileNotFoundError: [Errno 2] No such file or directory: 'fair_face_models/fairface_alldata_20191111.pt'.

I did download the models and put them in the fair_face_models folder as required. Is there another model I should download instead?

Thanks!

Training configuration

Hello, I'm interested in the training configuration, or in seeing the training code; a sketch of one plausible setup follows the list below.

  • How many epochs did you train?
  • What batch size did you use?
  • Was the Adam learning rate fixed at 0.0001 or did you use a learning rate schedule?
  • Did you use any augmentation during training?
  • Did you weight the different losses equally (race, gender, age)?
  • Did you weight the loss in different classes to account for class imbalance?
  • Did you train the final dense layers with dropout, or with the default torchvision resnet34 architecture?
  • Did you unfreeze/make trainable all the resnet34 layers from the beginning, or only train the dense layer first? Did you unfreeze the resnet in blocks or all at once?
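
Since the training code isn't published, here is a minimal sketch of the kind of multi-task setup these questions probe, assuming the 18 outputs split as 7 race + 2 gender + 9 age logits (as the released checkpoints suggest) and equal loss weights (an assumption the paper does not confirm):

import torch
import torch.nn as nn
from torchvision import models

# A plausible setup, not the authors' code.
model = models.resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 18)  # 7 race + 2 gender + 9 age

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr asked about above

def multitask_loss(outputs, race, gender, age):
    # Equal weighting of the three heads is an assumption.
    return (criterion(outputs[:, :7], race)
            + criterion(outputs[:, 7:9], gender)
            + criterion(outputs[:, 9:18], age))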

Information related to training

Hi, I hope you are doing well.

I have been trying to work on something like this, and I was wondering if you could provide more information about your training performance. I have been facing difficulties getting decent accuracy with the classifier.

Any information related to the number of epochs, batch size, learning rate and training loss would be appreciated.

Low validation accuracy 71% for race estimation

When I use the pretrained model to predict race on the validation set, I get the following accuracy:

Accuracy Category
75.54% White
86.05% Black
59.33% Latino_Hispanic
78.00% East Asian
62.26% Southeast Asian
73.02% Indian
61.79% Middle Eastern
70.43% Non-white
71.40% All

This is very different from the accuracy reported in the paper. On the held-out datasets you report 81% average in Table 6.

This 10% difference makes me think I'm doing something wrong, or that the held-out datasets are not comparable to the validation dataset.

Here is my code:

#!/usr/bin/env python
# coding: utf-8

from torchvision import transforms, models
import torch.nn as nn
import torch
from PIL import Image
import numpy as np

with open('data/fairface_label_val.csv') as f:
    data = f.read().splitlines() # split rows
data = [row.split(',') for row in data]
data = data[1:] # drop the header
fn_data, age_data, gender_data, race_data, _ = zip(*data) # unpack into columns

# convert from race names to race indices
races_names = ['White','Black','Latino_Hispanic','East Asian','Southeast Asian','Indian','Middle Eastern']
race_indices = [races_names.index(race_name) for race_name in race_data]
race_indices = np.asarray(race_indices)

model = models.resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 18)  # 7 race + 2 gender + 9 age outputs
model.load_state_dict(torch.load('data/fair_face_models/res34_fair_align_multi_7_20190809.pt'))
model = model.to('cuda')
model.eval()  # switch batchnorm to inference mode

trans = transforms.Compose([
        transforms.Resize((224, 224)),  # predict.py resizes crops to 224x224
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

def chunks(x, n):
    for i in range(0, len(x), n):
        yield x[i:i+n]

batch_size = 256
all_pred = []

for fn_batch in chunks(fn_data, batch_size):
    print('.', end='')
    
    images = [Image.open('data/padding-0.25/' + fn) for fn in fn_batch]
    images = [trans(image) for image in images]
    images = torch.stack(images).to('cuda')
    
    with torch.no_grad():
        outputs = model(images)
        pred = outputs[:, :7].argmax(-1).cpu().numpy()  # first 7 of 18 outputs are the race logits
        
    all_pred.extend(pred)
    
all_pred = np.asarray(all_pred)

matching = all_pred == race_indices

for i, race_name in enumerate(races_names):
    accuracy = matching[race_indices==i].mean()
    print(f'{100*accuracy:05.2f}%\t{race_name}')
    
accuracy = matching[race_indices > 0].mean()  # index 0 is White, so > 0 selects non-white
print(f'{100*accuracy:05.2f}%\tNon-white')

accuracy = matching.mean()
print(f'{100*accuracy:05.2f}%\tAll')

Information regarding training/valid performance

Hi! I hope you are doing well.

I have been searching for a face attribute classifier, and yours is the best I have found. I just wonder if it is possible for you to share some information regarding the dataset you used for training/validation and the corresponding performance (accuracy)?

Any related information would be much appreciated!
