cavalleria / cavaface
face recognition training project (PyTorch)
License: MIT License
This is great work. Will the code for knowledge distillation and model compression be provided later? Will this project keep up with the latest face recognition techniques? If so, that would be great. Thanks to the author.
You can try the VarGFaceNet backbone!
When running
r = requests.post("http://127.0.0.1:%d/eval"%(eval_info["args"].port), data=eval_info).json()
I get Response [404], so .json() raises
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
How can I solve this?
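A minimal sketch of a workaround, assuming the eval_info payload from the repo: check the HTTP status before parsing, since a 404 body is an HTML error page rather than JSON (it usually means the evaluation service is not running on that port, or the /eval route is not registered).

import requests

url = "http://127.0.0.1:%d/eval" % eval_info["args"].port  # eval_info as in the repo
r = requests.post(url, data=eval_info)
if r.ok:
    result = r.json()  # safe: the server answered with success
else:
    print("eval service returned HTTP %d: %s" % (r.status_code, r.text[:200]))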
When I use the data pre-processing option 'CutMix', an error occurs: "name 'mixed_x' is not defined".
The method 'cutmix_data' returns "mixed_x, y_a, y_b, lam", but 'mixed_x' is not defined at the call site.
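A minimal sketch of the likely fix, assuming the cutmix_data return values quoted above (inputs, labels, model and criterion are placeholder names): bind all four return values before using mixed_x.

mixed_x, y_a, y_b, lam = cutmix_data(inputs, labels)
outputs = model(mixed_x)
# CutMix mixes two labels, so the loss is the lam-weighted sum of both terms
loss = lam * criterion(outputs, y_a) + (1.0 - lam) * criterion(outputs, y_b)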
I downloaded the checkpoint from MODEL_ZOO, but it fails to load into the model.
I load the checkpoint as follows, but it raises: ModuleNotFoundError: No module named '__torch__'
BACKBONE.load_state_dict(torch.load('IR_SE_100_Combined_Epoch_24.pth'))
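A minimal sketch of a likely fix: the '__torch__' module error is the signature of a TorchScript archive being fed to torch.load, so torch.jit.load is probably the right loader for this file (an assumption based on the error message, not on the repo's docs).

import torch

# TorchScript archives embed their own code under the '__torch__' namespace,
# which torch.load cannot resolve; torch.jit.load can.
model = torch.jit.load('IR_SE_100_Combined_Epoch_24.pth', map_location='cpu')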
Hi,
It seems there is a difference between this CircleLoss implementation and the original one. In your implementation, the loss is computed as: log(sum(exp(logits_p)) * sum(exp(logits_n)))
https://github.com/cavalleria/cavaface.pytorch/blob/90b8a1c2d689552b5d7e5c703c02c583dfce1a6c/head/metrics.py#L585
But the original formula is: log(1 + sum(exp(logits_p)) * sum(exp(logits_n)))
https://github.com/qianjinhao/circle-loss/blob/master/circle_loss.py#L38
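The two only differ by the "1 +" inside the log, which is exactly a softplus. A minimal sketch of the corrected line, assuming logit_p and logit_n are the per-sample positive/negative logits as in head/metrics.py:

import torch
import torch.nn.functional as F

# log(1 + sum(exp(n)) * sum(exp(p))) == softplus(logsumexp(n) + logsumexp(p))
loss = F.softplus(torch.logsumexp(logit_n, dim=1) + torch.logsumexp(logit_p, dim=1))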
This would improve the input speed.
Training data:
Glint360K datasets (https://github.com/deepinsight/insightface/blob/master/recognition/partial_fc/README.md)
1. First, train a baseline model with R50-Softmax.
2. Second, train ArcFace-Softmax by fine-tuning the backbone from the Step 1 model. The accuracy is lower than the original softmax.
Hi,
I tested the released GhostNet_x1.3 on LFW, CFP-FP and AgeDB-30. It's weird that the model gets much better results on CFP-FP and AgeDB-30 than on LFW. Have you tried these test sets? What gap do you see?
Thanks.
LFW: Acc 0.850 @ Threshold 0.461
CFP-FP: Acc 0.943 @ Threshold 0.179
AgeDB-30: Acc 0.973 @ Threshold 0.231
I have 4 GPUs. When I set GPU = [0, 1, 2, 3] in config.py it reports errors, but with GPU = [0, 1] it works well. That's strange.
I prepared the data directory according to eval_megaface.py. After the evaluation process finished, I got an error:
Traceback (most recent call last):
File "/home/ici/.conda/envs/cavaface/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "evaluate_service.py", line 109, in run
evaluator.parse_results_into_file()
File "/home/ici/cavaface/cavaface.pytorch/evaluation/eval_megaface.py", line 406, in parse_results_into_file
result_dict = _load_json_result_file(result_json_file)
File "/home/ici/cavaface/cavaface.pytorch/evaluation/utils/io.py", line 86, in _load_json_result_file
with open(json_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'model/megaface_50/cmc_facescrub_megaface_retina_1000000_1.json'
The "cmc_facescrub_megaface_retina_1000000_1.json" didn't generate after evaluation. And there is no result file.
When training with SV-X-Softmax, the following error occurs:
File "/home/imagus/dev/cavaface.pytorch/head/metrics.py", line 452, in forward
if self.xtype == 'MV-AM':
File "/home/imagus/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 594, in __getattr__
type(self).__name__, name))
AttributeError: 'SVXSoftmax' object has no attribute 'xtype'
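A minimal sketch of the likely cause, an assumption from the traceback: self.xtype is read in forward() but never assigned in __init__, so storing it in the constructor should fix it (the real constructor in head/metrics.py takes more parameters than shown here).

import torch.nn as nn

class SVXSoftmax(nn.Module):
    def __init__(self, in_features, out_features, xtype='MV-AM'):
        super().__init__()
        self.xtype = xtype  # forward() branches on this attribute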
It appears in the list of options, so Python tries to load it and complains that the class doesn't exist. I can't see a file for it, so I think you forgot to add it.
I downloaded IJB_release.tar from https://github.com/deepinsight/insightface/tree/master/Evaluation, but I didn't find the "/ijbc_112x112" folder.
When changing ArcFace to ArcNegFace or CurricularFace (I didn't try others), I get this:
File "/home/imagus/.local/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 932, in forward
ignore_index=self.ignore_index, reduction=self.reduction)
File "/home/imagus/.local/lib/python3.6/site-packages/torch/nn/functional.py", line 2317, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/home/imagus/.local/lib/python3.6/site-packages/torch/nn/functional.py", line 1535, in log_softmax
ret = input.log_softmax(dim)
AttributeError: 'tuple' object has no attribute 'log_softmax'
PyTorch: 1.5.0 (could this be the issue?)
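The traceback shows cross_entropy receiving a tuple, which suggests these heads return more than just the logits while the training loop passes the output straight to the loss. A minimal sketch of a workaround, where head, features, labels and criterion are placeholder names for the loop's variables:

outputs = head(features, labels)
if isinstance(outputs, tuple):
    outputs = outputs[0]  # keep only the logits for the loss
loss = criterion(outputs, labels)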
Thanks for sharing such great work! I have some questions about the results and training details:
Looking forward to your reply. Thank you.
Hi, can you tell me which Flask version you use? When I run evaluate_service.py it reports:
Traceback (most recent call last):
File "evaluate_service.py", line 12, in
import flask
File "/usr/local/lib64/python3.6/site-packages/flask/init.py", line 21, in
from .app import Flask
File "/usr/local/lib64/python3.6/site-packages/flask/app.py", line 69, in
from .wrappers import Request
File "/usr/local/lib64/python3.6/site-packages/flask/wrappers.py", line 14, in
from werkzeug.wrappers.json import JSONMixin as _JSONMixin
ModuleNotFoundError: No module named 'werkzeug.wrappers.json'; 'werkzeug.wrappers' is not a package
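A likely cause, judging only from the traceback: Flask 1.x imports JSONMixin from werkzeug.wrappers.json, a module that Werkzeug removed in version 2.1, so pinning werkzeug below 2.1 (or upgrading Flask to a release matching the installed Werkzeug) should resolve the import error.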
Has this project stopped being updated?
Traceback (most recent call last):
File "./main.py", line 8, in
from infer import get_infer
File "/home/ici/cavaface/cavaface.pytorch/evaluation/infer/init.py", line 1, in
from citrus_pytorch_infer import CitrusPytorchInfer
ModuleNotFoundError: No module named 'citrus_pytorch_infer'
Thanks for your great work!
Did you train/test all of these on RGB images? I need to train/test on grayscale images. Do you think I will suffer performance degradation if I train from scratch with all your default settings? Any suggestions? Thank you!
Hi, I followed your instructions to train a model on the MS1M training set, and I ended up achieving 99+@lfw, 95+@cfp, 95+@agedb. This pretrained model also achieved 97+ on my own test set.
Then I used this pretrained model of mine for transfer learning on my own training set, which is quite different from MS1M/LFW/CFP/AgeDB. When I finished the transfer learning, the final model achieved 99+ on my own test set, but only 80+@lfw, 70+@cfp, 70+@agedb.
I wonder if I made any mistakes. Is there any way to improve the performance on my own test set while keeping reasonably good performance on the other test sets?
Thank you for your time.
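A minimal sketch of one common remedy for this kind of forgetting, offered as an assumption rather than this repo's recommendation: fine-tune on a mixture of the original data and the new-domain data, with a much lower learning rate, so the general-purpose features survive.

import torch
from torch.utils.data import ConcatDataset, DataLoader

# ms1m_dataset and my_dataset are placeholders for the two Dataset objects
mixed_train = ConcatDataset([ms1m_dataset, my_dataset])
loader = DataLoader(mixed_train, batch_size=256, shuffle=True, num_workers=8)
# roughly 10x lower LR than the from-scratch schedule
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)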
In the README, you wrote "Backone" instead of "Backbone".
AttributeError: module 'torch.distributed' has no attribute 'init_process_group'
The MegaFace test results (rank-N and ROC) for Circle loss are very bad. I trained it with EfficientNet-B0, but when I test it on MegaFace the results are very poor. Have you ever trained it? If yes, can you provide the pretrained model? Thanks.
DeepGlint has more than twice as many identities. Looking forward to models trained on DeepGlint.
Thanks for your work! I downloaded the ms1m-retinaface training data from the link you shared. I saw that your datasets.py expects separate image files to open, but there are no such files in ms1m-retinaface-t1. I'm confused: how can I use the .rec, .lst and .idx files with your training code?
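A minimal sketch of how the .rec/.idx pair can be read, assuming the insightface-style record layout (record 0 is a metadata header); this uses mxnet.recordio directly and is not this repo's loader:

import numbers
import mxnet as mx

imgrec = mx.recordio.MXIndexedRecordIO('train.idx', 'train.rec', 'r')
header, _ = mx.recordio.unpack(imgrec.read_idx(0))
images_end = int(header.label[0])          # records [1, images_end) are images
for i in range(1, images_end):
    h, img = mx.recordio.unpack(imgrec.read_idx(i))
    label = h.label if isinstance(h.label, numbers.Number) else h.label[0]
    # img holds an encoded JPEG; mx.image.imdecode(img) yields the pixel array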
I tried to use the RandAugment and RandErasing augmentations, but whenever I start training I get the error: "EOFError: Ran out of input"
Full output follows:
============================================================
/usr/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 11 leaked semaphores to clean up at shutdown
len(cache))
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "/usr/lib/python3.6/multiprocessing/spawn.py", line 113, in _main
preparation_data = reduction.pickle.load(from_parent)
EOFError: Ran out of input
/usr/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 11 leaked semaphores to clean up at shutdown
len(cache))
Traceback (most recent call last):
File "train.py", line 333, in <module>
main()
File "train.py", line 58, in main
mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, cfg))
File "/home/imagus/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/imagus/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/home/imagus/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 108, in join
(error_index, name)
Exception: process 2 terminated with signal SIGSEGV
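One possible cause, an assumption from the spawn traceback rather than a confirmed diagnosis: with the 'spawn' start method, everything handed to mp.spawn, including the augmentation objects inside cfg, must be picklable, so a lambda or locally defined function inside the RandAugment/RandErasing setup would make the child processes die exactly like this.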
Hi, do you use a pretrained model when training EfficientNet_b1? When I train it without a pretrained model, the top-1 and top-5 accuracy is always 0.
@cavalleria Hello, I have noticed that 'Circle-Loss' appears in the introduction but not in the code. Is that because of poor performance? Thanks!
min gpu free mem: 8000000000.0 B
min gpu free mem: 8000000000.0 B
min gpu free mem: 102000000 B
min gpu free mem: 162000000 B
Finish loading model /home/vision_rd/face_Recognition/models/GhostNet_Arcface/model/Epoch_24_Time_2020-07-21-13-27_checkpoint.pth, infer with shape: (198, 3, 112, 112)
Loading model time cost: 43.907608 seconds.
Extract on megaface...
Noisy faces of scrub: 605
Noisy faces of gallery: 707
Begin to extract embedding of scrub faces...
Finish Load path of faces: 0/3530
begin thread
Segmentation fault: 11
Stack trace:
[bt] (0) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x41a8280) [0x7f6276fd8280]
[bt] (1) /lib64/libc.so.6(+0x363b0) [0x7f635a6483b0]
[bt] (2) /lib64/libc.so.6(cfree+0x1c) [0x7f635a697ecc]
[bt] (3) /usr/lib64/python3.6/site-packages/cv2/cv2.cpython-36m-x86_64-linux-gnu.so(+0x4eda39) [0x7f62b277da39]
[bt] (4) /usr/lib64/python3.6/site-packages/cv2/cv2.cpython-36m-x86_64-linux-gnu.so(+0x168cd5) [0x7f62b23f8cd5]
[bt] (5) /lib64/libpython3.6m.so.1.0(_PyCFunction_FastCallDict+0x147) [0x7f635b3ea167]
[bt] (6) /lib64/libpython3.6m.so.1.0(+0x1507df) [0x7f635b4557df]
[bt] (7) /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x3a7) [0x7f635b44a0f7]
[bt] (8) /lib64/libpython3.6m.so.1.0(+0x14f987) [0x7f635b454987]
Eval model: /home/vision_rd/yangwenbo/face_Recognition/models/GhostNet_Arcface/model/Epoch_24_Time_2020-07-21-13-27_checkpoint.pth,24, done!
I wonder why it reports these errors when all I changed was the network model (which has already been trained).
When I run Circle loss, it reports the errors below. Did you run it successfully?
Traceback (most recent call last):
File "train.py", line 388, in
main()
File "train.py", line 60, in main
mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, cfg))
File "/usr/local/lib64/python3.6/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/usr/local/lib64/python3.6/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib64/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/vision_rd/face_Recognition/cavaface.pytorch_bake/train.py", line 291, in main_worker
outputs = head(features, labels)
File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib64/python3.6/site-packages/torch/nn/parallel/distributed.py", line 447, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/home/vision_rd/face_Recognition/cavaface.pytorch_bake/head/metrics.py", line 590, in forward
output = torch.logsumexp(logit_n, dim=1) + torch.logsumexp(logit_p, dim=1)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
I downloaded the training dataset as instructed ("For training data, please download the ms1m-retinaface in https://github.com/deepinsight/insightface/tree/master/iccv19-challenge"),
but I did not find ms1m-retinaface-t1-clean.txt. These are the files I got:
-rw-r--r-- 1 shengyang root 73M Apr 9 2019 agedb_30.bin
-rw-r--r-- 1 shengyang root 73M Apr 1 2019 cfp_fp.bin
-rw-r--r-- 1 shengyang root 63M Apr 1 2019 lfw.bin
-rw-r--r-- 1 shengyang root 14 Apr 1 2019 property
-rw-r--r-- 1 shengyang root 98M Apr 1 2019 train.idx
-rw-r--r-- 1 shengyang root 412M Apr 1 2019 train.lst
-rw-r--r-- 1 shengyang root 28G Apr 1 2019 train.rec
How do I build the training dataset from these? Thank you!
Thank you for this work. I want to use the pre-trained AttentionNet-IRSE-56/92 models from MODEL_ZOO.md for fine-tuning. Where can I get the pre-trained models?
Hi, big thanks for your great work, awesome!
Have you tried ReXNet-based models? If yes, could you share the results? Thanks!
Hi, this repo is nice work, thanks for sharing.
I want to know whether the augmentation methods here,
such as RandomErasing/Mixup/RandAugment/Cutout/CutMix, are actually effective.
Can you please share the specs of the system you use for training, and how long it takes to train a fresh MobileFaceNet or IR-SE-100 model on the ms1m-retinaface dataset?
Hi, you mentioned earlier in issue #40 that you slightly cleaned the MS1M-RetinaFace dataset.
I'm wondering if you could share your clean list? It would be a great help, thank you!
I haven't seen this issue before in similar PyTorch training scenarios.
I can normally use a batch size of 256, but when resuming I must drop to 224.
It seems some memory from loading the resumed checkpoint is never freed.
Edit: I resolved the issue by adding: del checkpoint
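A minimal sketch of that fix plus a common companion, assuming the checkpoint was previously loaded straight onto the GPU: load it to CPU first, copy the weights in, then drop the dict (the filename and dict layout below are placeholders).

import torch

checkpoint = torch.load('checkpoint.pth', map_location='cpu')  # stays off the GPU
model.load_state_dict(checkpoint)  # model and the checkpoint layout are placeholders
del checkpoint               # free the extra copy of the weights
torch.cuda.empty_cache()     # return cached blocks to the driver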
With torch 1.2, running 'train.sh' raises an error at loss.backward():
one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2, 5]], which is output 0 of ClampBackward, is at version 2; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True)
I used torch.autograd.set_detect_anomaly(True) to locate the problem, and found that ''target_logit = cos_theta_1[torch.arange(0, embbedings.size(0)), label].view(-1, 1)'' raises the error.
Is this a torch version problem? Thanks.
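A minimal sketch of one workaround, an assumption rather than a confirmed fix: break the in-place chain by cloning the clamp output before anything writes into it (names, including the repo's spelling 'embbedings', come from the line above; the clamp site itself is in head/metrics.py).

# clone so later in-place writes don't touch ClampBackward's saved output
cos_theta_1 = cos_theta_1.clamp(-1.0, 1.0).clone()
target_logit = cos_theta_1[torch.arange(0, embbedings.size(0)), label].view(-1, 1)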
When running inference on AttentionNet, I get two tensors back.
The first is the one I am interested in (1 x 512); the second (7 x 7 x ?) I am not.
It should improve performance if this second tensor is not returned in inference mode, i.e. changing:
return out, conv_out
to:
if self.training:
    return out, conv_out
else:
    return out
(self.training is the built-in nn.Module flag, so no extra argument is needed)
By the way, I'm very impressed with the performance of this AttentionNet-56!
CFP_FP Acc: 0.9822857142857142, AgeDB Acc: 0.9819999999999999, VGG2_FP Acc: 0.9554
CFP_FP Acc: 0.9831428571428571, AgeDB Acc: 0.9814999999999999, VGG2_FP Acc: 0.9538
When I read the function run() (line 94) in class readThread(threading.Thread) (line 85) of citrus_base_infer.py, I wondered why there is no normalization before sending the image to the network. As far as I know, there is such an operation in training:
def run(self):
    global signal_stop
    unfinished = True
    while not signal_stop and unfinished:
        try:
            image_path, outpath = self.in_q.get(timeout=1)
            out_img = []
            img = self.read_func(image_path, self.shape[0] == 1)
            if len(img.shape) == 3:
                img = img[:, :, ::-1]  # to RGB
                img = np.transpose(img, (2, 0, 1))
            else:
                img = img[np.newaxis, :]
            attempts = [0, 1] if self.is_flip else [0]
Hoping for your reply. Thanks.
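For reference, a minimal sketch of the normalization the question refers to, assuming the common insightface-style convention of mapping uint8 pixels to roughly [-1, 1] (an assumption, not confirmed from this repo's training code):

import numpy as np

img = (img.astype(np.float32) - 127.5) / 128.0  # uint8 [0, 255] -> approx [-1, 1]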