fairface's People

Contributors

bernardo1998, dchen236

fairface's Issues

Model benchmarks

Hey!
Incredibly good dataset. Do you have any benchmarks/metrics/evaluations of the model on the FF dataset for age, ethnicity, gender?

Hosting images on Google drive prevents programmatic access

Hello!

Thank you for this important dataset.

I'm trying to programmatically download the image .zip folders from your Google Drive links, but because they are >50 MB, Google Drive doesn't allow programmatic download access to the links (it wants to prompt the user with a virus-scan warning). I've searched for and tried many workarounds for this Google Drive issue, but none of them worked for me.

To avoid this, I suggest you use a different file hosting provider, or create a GitHub 'release' on this repo, which will let you upload large binary files that can then be programmatically downloaded.

As an interim fix, I've mirrored the Padding=0.25 files with a release at this public repo.
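
For anyone blocked the same way: GitHub release assets can be fetched with a plain HTTP request, unlike large Google Drive files. A minimal sketch, where the URL and file name are hypothetical placeholders:

import urllib.request

# Hypothetical release-asset URL; GitHub serves these directly, with no
# virus-scan interstitial like Google Drive's.
url = 'https://github.com/<user>/<repo>/releases/download/<tag>/fairface-img-margin025.zip'
urllib.request.urlretrieve(url, 'fairface-img-margin025.zip')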

Thank you,

service_test meaning

In the labeled data for training and validation, I noticed a service_test column. I might have missed it, but I couldn't find information on this in the paper or the repository. What is this variable referencing? Is it the data used for testing the "classification accuracy of commercial services"?

Thanks!
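
For anyone checking the same thing, a quick sketch (assuming pandas, and that the training label file is named fairface_label_train.csv) to see how the service_test == True rows are distributed across race and gender:

import pandas as pd

labels = pd.read_csv('fairface_label_train.csv')  # assumed file name
subset = labels[labels['service_test'] == True]
# If this subset is balanced, each race/gender cell should have a similar count.
print(subset.groupby(['race', 'gender']).size())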

Error when loading model to machine

Hi there,

I got an error when running the following line in the predict_age_gender_race function:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

I am using a machine with a 1.1 GHz dual-core Intel Core m3 processor. Please kindly advise how to resolve this.
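
A minimal sketch of the fix the error message itself suggests, assuming the model file lives in fair_face_models/ as elsewhere in this repo:

import torch
import torch.nn as nn
from torchvision import models

# On a CPU-only machine, map the CUDA-saved weights onto the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = models.resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 18)
model.load_state_dict(
    torch.load('fair_face_models/res34_fair_align_multi_7_20190809.pt',
               map_location=device))
model = model.to(device)
model.eval()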

Bboxes don't match images

Because the bboxes are appended to a list, after calling the predict_age_gender_race function the indices of the bboxes don't match the original images.
To fix this, I added the line below as the third line of predict_age_gender_race, right after img_names = [os.path.join(imgs_path, x) for x in os.listdir(imgs_path)]:

img_names.sort(key=lambda x: os.path.getmtime(x))

This way, images are considered in the order in which their bboxes were added.
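
For clarity, a sketch of where that line goes (the function signature and imgs_path default are assumptions; only the first lines are shown):

import os

def predict_age_gender_race(save_prediction_at, imgs_path='detected_faces/'):
    img_names = [os.path.join(imgs_path, x) for x in os.listdir(imgs_path)]
    # Sort by modification time so the image order matches the order in
    # which the bboxes were appended.
    img_names.sort(key=lambda x: os.path.getmtime(x))
    ...  # rest of the original function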

Many incorrectly labeled images

The paper reads "We further refined the annotations by training a model from the initial ground truth annotations and applying back to the dataset. We then manually re-verified the annotations for images whose annotations differ from model predictions."

But I tried to do this check myself, and I found many examples of annotations that do not match the model predictions. I do not have an estimate for how many images have this problem. But here are a few samples from the validation dataset that are annotated as "White", along with the model's predictions for these images. I would guess 2-3 of these 64 people would self-identify as "White":

Filename Predicted Race
val/1914.jpg Indian
val/5302.jpg Black
val/8590.jpg Southeast Asian
val/8963.jpg Black
val/9763.jpg Southeast Asian
val/9377.jpg Southeast Asian
val/7653.jpg Southeast Asian
val/2173.jpg Black
val/6261.jpg Indian
val/2698.jpg Black
val/322.jpg Southeast Asian
val/7489.jpg East Asian
val/2865.jpg Black
val/7394.jpg Southeast Asian
val/6331.jpg Black
val/8906.jpg East Asian
val/3797.jpg East Asian
val/5689.jpg Latino_Hispanic
val/7191.jpg East Asian
val/1312.jpg Indian
val/1399.jpg Indian
val/5204.jpg Latino_Hispanic
val/1758.jpg East Asian
val/7019.jpg Black
val/5771.jpg Southeast Asian
val/3903.jpg Latino_Hispanic
val/3204.jpg Middle Eastern
val/10556.jpg Latino_Hispanic
val/8838.jpg Southeast Asian
val/9757.jpg Latino_Hispanic
val/6590.jpg Latino_Hispanic
val/144.jpg Southeast Asian
val/10507.jpg Latino_Hispanic
val/1554.jpg Latino_Hispanic
val/7518.jpg Indian
val/5563.jpg Black
val/209.jpg Indian
val/10349.jpg Latino_Hispanic
val/8969.jpg Black
val/8475.jpg Black
val/5485.jpg Latino_Hispanic
val/4649.jpg Latino_Hispanic
val/68.jpg Southeast Asian
val/1286.jpg East Asian
val/2777.jpg Latino_Hispanic
val/397.jpg Latino_Hispanic
val/6448.jpg East Asian
val/1173.jpg Indian
val/10222.jpg Southeast Asian
val/4156.jpg Southeast Asian
val/7783.jpg Latino_Hispanic
val/10794.jpg East Asian
val/1309.jpg Latino_Hispanic
val/5787.jpg East Asian
val/9198.jpg Latino_Hispanic
val/4890.jpg Southeast Asian
val/6822.jpg Latino_Hispanic
val/8659.jpg Latino_Hispanic
val/953.jpg East Asian
val/10843.jpg Latino_Hispanic
val/8177.jpg East Asian
val/8870.jpg Latino_Hispanic
val/9399.jpg Latino_Hispanic
val/3652.jpg Latino_Hispanic

Process is automatically killed while processing a large dataset

First, thanks for sharing this fantastic tool.

I am trying to process a large dataset (more than 50K images), but it seems there is some condition that kills the process when it consumes too many resources.

I could only mask 4K of the 50K images in my dataset; do you know what is happening?

It seems I will have to use my server.

Thanks in advance.
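
If the OS out-of-memory killer is terminating the run, one workaround is to split the input into smaller batches and process them one at a time. A sketch, assuming the csv-driven interface of predict.py and pandas (the chunk file names are hypothetical):

import pandas as pd

# Split a 50K-image csv into 1000-image chunks and feed each chunk csv to
# predict.py in turn, so intermediate results don't pile up in memory.
df = pd.read_csv('test_imgs.csv')
chunk_size = 1000
for i in range(0, len(df), chunk_size):
    df.iloc[i:i + chunk_size].to_csv(f'chunk_{i}.csv', index=False)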

Accuracy difference between train and val datasets and paper

Out of curiosity, we took the 0.25 datasets (train and val) and ran them through the ResNet34 model and weights trained by the authors.
This 0.25 dataset is the one that the face-align part of predict.py creates (so we assume it is equivalent to the one used and referred to in the paper for the 7 race classes).

Interestingly, the accuracy (the match between the labels in the original csv and the classes predicted by the published model) differs between the train and val (test) datasets, and both are lower than presented in the paper:

Train:

  • full set 85% (73386 / 86744)
  • service_test == True 84% (33794 / 40252)

Val:

  • full set 78% (8511 / 10954)
  • service_test == True 77% (3955 / 5162)

BTW, as fellow commenters previously discovered, the filter service_test == True defines a subset whose labels are balanced in terms of race and gender. Therefore, we calculated metrics both for the full set and for this subset.

We would have expected higher and consistent percentages.

  • The paper presents comparison tables where the accuracy of the model is 94%, which is not supported by these findings.
  • The drop in validation accuracy might suggest that the ResNet34 model (a simple one with the fc head replaced) was trained on this dataset but evidently does not perform at the published level.
  • BTW, if one looks deeper into the images, some are very challenging (low resolution, profile views, back of the head, low light, etc.). What this means for the quality and balance of the dataset (level, consistency, distribution, etc.) needs further consideration. For example, if lower-quality images (a term still to be defined) are present in a higher proportion within one class than within the others, that can unbalance the dataset regardless of what the label distribution suggests, because in that case some labels hold less, or confusing, information concentrated in a specific class. We did not perform such an analysis; we just flag the potential here.

Please feel free to correct any inaccuracy or misinterpretation above or provide an explanation.

Pretrained model on different datasets

Hi! I'm interested in your paper and would like to reproduce some cross-dataset analysis. The paper compares the performance of models trained on UTKFace, LFWA+, and CelebA. Could you please release the pre-trained models on these datasets, in addition to the FairFace dataset? Thank you so much.

4 races definition and training

Thank you for your work. May I know how you defined the 4 races and how you trained the model?

In the paper, it seems like you merged White and Middle Eastern into White, merged East Asian and Southeast Asian into Asian, and then trained the model on these 4 classes. However, the last fc layer of your pretrained model still predicts 18 classes:

model_fair_4.fc = nn.Linear(model_fair_4.fc.in_features, 18)

and you only use the first 4 values for race prediction:

race_outputs = outputs[:4]

So how is this pretrained fc layer defined, and how did you merge the races?

Moreover, how do you define the race groups in Tables 3 & 4 of the arXiv paper, and how did you conduct those experiments? Thank you!
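
For concreteness, here is how the 4-class prediction appears to work based only on the two lines quoted above. This is a sketch, not the authors' confirmed pipeline; the class ordering (White, Black, Asian, Indian) and the checkpoint name are assumptions:

import torch
import torch.nn as nn
from torchvision import models

model_fair_4 = models.resnet34(pretrained=True)
model_fair_4.fc = nn.Linear(model_fair_4.fc.in_features, 18)  # head kept at 18
model_fair_4.load_state_dict(
    torch.load('fair_face_models/res34_fair_align_multi_4_20190809.pt',
               map_location='cpu'))
model_fair_4.eval()

image_batch = torch.randn(1, 3, 224, 224)  # stand-in for preprocessed face crops
with torch.no_grad():
    outputs = model_fair_4(image_batch)   # shape [N, 18]
    race_outputs = outputs[:, :4]         # per-sample form: outputs[:4]
    race_pred = torch.softmax(race_outputs, dim=1).argmax(dim=1)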

Help in annotations for age

Hi,
I have a question regarding the annotations. I've noticed that for certain images, the reported age appears in the format of dates rather than an age range.
In such cases, should these images be excluded from the training process? I would greatly appreciate any guidance or suggestions on how to address this situation.
Thank you
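
One way to detect the affected rows (a sketch; the nine age bins are my assumption about the intended label set, and the date-like values are presumably a spreadsheet auto-conversion of ranges such as 3-9):

import pandas as pd

# Assumed FairFace age bins; date-formatted values fall outside this set.
AGE_BINS = {'0-2', '3-9', '10-19', '20-29', '30-39',
            '40-49', '50-59', '60-69', 'more than 70'}

labels = pd.read_csv('fairface_label_train.csv')  # assumed file name
bad = ~labels['age'].isin(AGE_BINS)
print(f'{bad.sum()} rows with date-like or unexpected age values')
labels = labels[~bad]  # drop them, or map the dates back to their bins first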

The pretrained model is not recognized

Hi there. Thanks for the great work! I followed the installation guide and tried to run python3 predict_bbox.py --csv test_imgs.csv, but it tells me FileNotFoundError: [Errno 2] No such file or directory: 'fair_face_models/fairface_alldata_20191111.pt'.

I did download the models and put them in the fair_face_models folder as required. Is there another model I should download instead?

Thanks!

Training configuration

Hello, I'm interested in the training configuration, or in seeing the training code; a sketch of one plausible setup follows the list below.

  • How many epochs did you train?
  • What batch size did you use?
  • Was the Adam learning rate fixed at 0.0001 or did you use a learning rate schedule?
  • Did you use any augmentation during training?
  • Did you weight the different losses equally (race, gender, age)?
  • Did you weight the loss in different classes to account for class imbalance?
  • Did you train the final dense layers with dropout, or with the default torchvision resnet34 architecture?
  • Did you unfreeze/make trainable all the resnet34 layers from the beginning, or only train the dense layer first? Did you unfreeze the resnet in blocks or all at once?
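
Since the training code isn't published, here is a minimal sketch of the kind of multi-task setup these questions probe, assuming the 18 outputs split as 7 race + 2 gender + 9 age logits (as the released checkpoints suggest) and equal loss weights (an assumption the paper does not confirm):

import torch
import torch.nn as nn
from torchvision import models

# A plausible setup, not the authors' code.
model = models.resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 18)  # 7 race + 2 gender + 9 age

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr asked about above

def multitask_loss(outputs, race, gender, age):
    # Equal weighting of the three heads is an assumption.
    return (criterion(outputs[:, :7], race)
            + criterion(outputs[:, 7:9], gender)
            + criterion(outputs[:, 9:18], age))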

Information related to training

Hi, I hope you are doing well.

I have been trying to work on something like this, and I was wondering if you could provide more information about your training performance. I have been facing difficulties getting decent accuracy with the classifier.

Any information related to the number of epochs, batch size, learning rate and training loss would be appreciated.

Low validation accuracy 71% for race estimation

When I use the pretrained model to predict race on the validation set, I get the following accuracy:

Accuracy Category
75.54% White
86.05% Black
59.33% Latino_Hispanic
78.00% East Asian
62.26% Southeast Asian
73.02% Indian
61.79% Middle Eastern
70.43% Non-white
71.40% All

This is very different from the accuracy reported in the paper. On the held-out datasets you report 81% average in Table 6.

This 10% difference makes me think I'm doing something wrong, or that the held-out datasets are not comparable to the validation dataset.

Here is my code:

#!/usr/bin/env python
# coding: utf-8

from torchvision import transforms, models
import torch.nn as nn
import torch
from PIL import Image
import numpy as np

with open('data/fairface_label_val.csv') as f:
    data = f.read().splitlines() # split rows
data = [row.split(',') for row in data]
data = data[1:] # drop the header
fn_data, age_data, gender_data, race_data, _ = zip(*data) # unpack into columns

# convert from race names to race indices
races_names = ['White','Black','Latino_Hispanic','East Asian','Southeast Asian','Indian','Middle Eastern']
race_indices = [races_names.index(race_name) for race_name in race_data]
race_indices = np.asarray(race_indices)

model = models.resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 18)  # 7 race + 2 gender + 9 age outputs
model.load_state_dict(torch.load('data/fair_face_models/res34_fair_align_multi_7_20190809.pt'))
model = model.to('cuda')
model.eval()  # switch batchnorm to inference mode

trans = transforms.Compose([
        transforms.Resize((224, 224)),  # predict.py resizes crops to 224x224
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

def chunks(x, n):
    for i in range(0, len(x), n):
        yield x[i:i+n]

batch_size = 256
all_pred = []

for fn_batch in chunks(fn_data, batch_size):
    print('.', end='')
    
    images = [Image.open('data/padding-0.25/' + fn) for fn in fn_batch]
    images = [trans(image) for image in images]
    images = torch.stack(images).to('cuda')
    
    with torch.no_grad():
        outputs = model(images)
        pred = outputs[:, :7].argmax(-1).cpu().numpy()  # first 7 of 18 outputs are the race logits
        
    all_pred.extend(pred)
    
all_pred = np.asarray(all_pred)

matching = all_pred == race_indices

for i, race_name in enumerate(races_names):
    accuracy = matching[race_indices==i].mean()
    print(f'{100*accuracy:05.2f}%\t{race_name}')
    
accuracy = matching[race_indices > 0].mean()  # index 0 is White, so > 0 selects non-white
print(f'{100*accuracy:05.2f}%\tNon-white')

accuracy = matching.mean()
print(f'{100*accuracy:05.2f}%\tAll')

Information regarding training/valid performance

Hi! I hope you are doing well.

I have been searching for a face attribute classifier, and yours is the best I have found. I just wonder if it is possible for you to share some information regarding the dataset you used for training/validation and the corresponding performance (accuracy)?

Any related information would be much appreciated!
