mobilefacenet_tf's People

Contributors

kant, sirius-ai

mobilefacenet_tf's Issues

Wrong number of classes

Please amend the source code or the README file. The default number of classes (85164) is lower than the actual number of classes in the default dataset (MS1M-refine-v2, 85742), which leads to a failure to train.

I basically re-implemented the model to reach this conclusion. After randomly getting NaN losses from softmax, tracing the issue to the loss function, and spending a few days investigating numerical stability for a large number of classes, I found out that I was using the wrong shape for my logits.

This is an obscure issue, especially since the two numbers are very close. Please fix :).
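A minimal numpy sketch of the sanity check this describes (the helper name is hypothetical, not part of the repo): the classifier's output width must cover every label id, or softmax cross-entropy silently indexes out of range and produces NaNs.

```python
import numpy as np

def check_class_number(labels, class_number):
    """Hypothetical helper: verify that class_number covers every label id
    in the dataset before building the softmax classifier."""
    max_label = int(np.max(labels))
    if class_number <= max_label:
        raise ValueError(
            "class_number=%d but the dataset contains label %d; "
            "use class_number >= %d (MS1M-refine-v2 needs 85742)"
            % (class_number, max_label, max_label + 1))
    return True

# Labels from MS1M-refine-v2 span ids 0..85741, so the repo default of
# 85164 would be rejected while 85742 passes.
labels = np.array([0, 42, 85741])
check_class_number(labels, 85742)
```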

Error of pretrained model

The error is AttributeError: 'NoneType' object has no attribute 'model_checkpoint_path' when I run train_nets.py using the pre-trained model MobileFaceNet_9925_9680.pb.
At line 193, ckpt = tf.train.get_checkpoint_state(pretrained_model) outputs ckpt = None.

The command in terminal is:
python train_nets.py --eval_db_path //faces_emore --tfrecords_file_path //tfrecords --pretrained_model //MobileFaceNet_9925_9680.pb

How can I solve it?
Thanks
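For context, tf.train.get_checkpoint_state expects a directory containing a checkpoint state file; a frozen .pb GraphDef is not a checkpoint, so it returns None. A minimal pure-Python sketch of the distinction (the helper name is hypothetical):

```python
import os

def pretrained_kind(path):
    """Hypothetical helper: classify what --pretrained_model points at.
    tf.train.get_checkpoint_state(path) returns None unless `path` is a
    directory containing a 'checkpoint' state file, so passing a frozen
    .pb graph to the checkpoint-restore path can never work."""
    if os.path.isfile(path) and path.endswith('.pb'):
        return 'frozen_graph'    # load via GraphDef / import_graph_def instead
    if os.path.isdir(path) and os.path.exists(os.path.join(path, 'checkpoint')):
        return 'checkpoint_dir'  # restorable with tf.train.Saver
    return 'unknown'
```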

about loss

Hey, you have done a good job! But in my training process, the loss increases to about 18 when one epoch ends and another begins. Can you tell me why?

accuracy is 0

Hi,
The problem is that accuracy is always 0 when I run train_net.py; meanwhile, the loss is decreasing and accuracy on LFW is increasing.
I think I applied the version you revised for num_class. Look at the log below:

class_number:85742(MS1M-V2)

epoch 0, total_step 1250, total loss is 42.33 , inference loss is 42.01, reg_loss is 0.32, training accuracy is 0.000000, time 323.625 samples/sec
epoch 0, total_step 1300, total loss is 42.36 , inference loss is 42.03, reg_loss is 0.32, training accuracy is 0.000000, time 319.744 samples/sec
epoch 0, total_step 1350, total loss is 42.08 , inference loss is 41.76, reg_loss is 0.32, training accuracy is 0.000000, time 322.450 samples/sec
epoch 0, total_step 1400, total loss is 42.24 , inference loss is 41.92, reg_loss is 0.32, training accuracy is 0.000000, time 319.986 samples/sec
epoch 0, total_step 1450, total loss is 42.51 , inference loss is 42.19, reg_loss is 0.32, training accuracy is 0.000000, time 322.289 samples/sec
epoch 0, total_step 1500, total loss is 42.49 , inference loss is 42.17, reg_loss is 0.32, training accuracy is 0.000000, time 318.947 samples/sec
epoch 0, total_step 1550, total loss is 42.36 , inference loss is 42.04, reg_loss is 0.32, training accuracy is 0.000000, time 314.517 samples/sec
epoch 0, total_step 1600, total loss is 42.45 , inference loss is 42.14, reg_loss is 0.31, training accuracy is 0.000000, time 324.133 samples/sec
epoch 0, total_step 1650, total loss is 42.35 , inference loss is 42.03, reg_loss is 0.31, training accuracy is 0.000000, time 322.223 samples/sec
epoch 0, total_step 1700, total loss is 42.44 , inference loss is 42.12, reg_loss is 0.31, training accuracy is 0.000000, time 323.349 samples/sec

thresholds max: 0.71 <=> min: 0.49
total time 13.695s to evaluate 12000 images of lfw
Accuracy: 0.716+-0.016
Validation rate: 0.03367+-0.01538 @ FAR=0.00100
fpr and tpr: 0.725 0.875
Area Under Curve (AUC): 0.799
Equal Error Rate (EER): 0.275
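One plausible reading (an observation, not the author's answer): with ~85k classes, top-1 training accuracy can stay at 0.000000 for a long time even while the loss falls, because argmax over 85742 logits rarely hits the right class early on. A numpy sketch of the metric the log prints (names hypothetical):

```python
import numpy as np

def top1_accuracy(logits, labels):
    # Hypothetical re-implementation of the printed "training accuracy":
    # fraction of samples whose argmax logit equals the ground-truth id.
    return float(np.mean(np.argmax(logits, axis=1) == labels))

# With 85742 near-uniform logits, a batch of 90 samples will almost
# surely score 0.0 early in training, matching the log above.
rng = np.random.default_rng(0)
logits = rng.normal(size=(90, 85742))
labels = rng.integers(0, 85742, size=90)
acc = top1_accuracy(logits, labels)
```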

epoch and training accuracy are always 0

I have a strange problem when I run python train_nets.py --class_number=85742: the epoch and training accuracy are always 0, and only the reg_loss is decreasing, but accuracy on LFW is increasing. Can someone tell me how to solve it? Thank you very much! The log in the terminal is as follows:

epoch 0, total_step 22050, total loss is 48.44 , inference loss is 48.34, reg_loss is 0.09, training accuracy is 0.000000, time 243.119 samples/sec
epoch 0, total_step 22100, total loss is 47.52 , inference loss is 47.42, reg_loss is 0.09, training accuracy is 0.000000, time 226.503 samples/sec
epoch 0, total_step 22150, total loss is 47.11 , inference loss is 47.01, reg_loss is 0.09, training accuracy is 0.000000, time 219.780 samples/sec
epoch 0, total_step 22200, total loss is 47.53 , inference loss is 47.43, reg_loss is 0.09, training accuracy is 0.000000, time 247.627 samples/sec
epoch 0, total_step 22250, total loss is 51.30 , inference loss is 51.21, reg_loss is 0.09, training accuracy is 0.000000, time 244.979 samples/sec
epoch 0, total_step 22300, total loss is 48.17 , inference loss is 48.08, reg_loss is 0.09, training accuracy is 0.000000, time 241.888 samples/sec
epoch 0, total_step 22350, total loss is 45.93 , inference loss is 45.84, reg_loss is 0.09, training accuracy is 0.000000, time 231.988 samples/sec
epoch 0, total_step 22400, total loss is 45.68 , inference loss is 45.59, reg_loss is 0.09, training accuracy is 0.000000, time 225.902 samples/sec
epoch 0, total_step 22450, total loss is 44.99 , inference loss is 44.90, reg_loss is 0.09, training accuracy is 0.000000, time 250.815 samples/sec
epoch 0, total_step 22500, total loss is 45.48 , inference loss is 45.39, reg_loss is 0.09, training accuracy is 0.000000, time 251.646 samples/sec
epoch 0, total_step 22550, total loss is 49.16 , inference loss is 49.08, reg_loss is 0.09, training accuracy is 0.000000, time 227.408 samples/sec
epoch 0, total_step 22600, total loss is 46.27 , inference loss is 46.19, reg_loss is 0.09, training accuracy is 0.000000, time 227.355 samples/sec
epoch 0, total_step 22650, total loss is 47.00 , inference loss is 46.91, reg_loss is 0.09, training accuracy is 0.000000, time 242.682 samples/sec
epoch 0, total_step 22700, total loss is 49.39 , inference loss is 49.30, reg_loss is 0.09, training accuracy is 0.000000, time 253.599 samples/sec
epoch 0, total_step 22750, total loss is 48.45 , inference loss is 48.37, reg_loss is 0.09, training accuracy is 0.000000, time 229.287 samples/sec
epoch 0, total_step 22800, total loss is 47.40 , inference loss is 47.31, reg_loss is 0.09, training accuracy is 0.000000, time 241.944 samples/sec
epoch 0, total_step 22850, total loss is 44.57 , inference loss is 44.48, reg_loss is 0.09, training accuracy is 0.000000, time 244.938 samples/sec
epoch 0, total_step 22900, total loss is 47.02 , inference loss is 46.93, reg_loss is 0.09, training accuracy is 0.000000, time 244.988 samples/sec
epoch 0, total_step 22950, total loss is 46.18 , inference loss is 46.10, reg_loss is 0.09, training accuracy is 0.000000, time 227.201 samples/sec
epoch 0, total_step 23000, total loss is 48.87 , inference loss is 48.78, reg_loss is 0.09, training accuracy is 0.000000, time 211.159 samples/sec
epoch 0, total_step 23050, total loss is 44.74 , inference loss is 44.65, reg_loss is 0.09, training accuracy is 0.000000, time 235.120 samples/sec
epoch 0, total_step 23100, total loss is 47.30 , inference loss is 47.22, reg_loss is 0.09, training accuracy is 0.000000, time 239.130 samples/sec
epoch 0, total_step 23150, total loss is 46.05 , inference loss is 45.96, reg_loss is 0.08, training accuracy is 0.000000, time 238.544 samples/sec
epoch 0, total_step 23200, total loss is 46.19 , inference loss is 46.10, reg_loss is 0.08, training accuracy is 0.000000, time 214.053 samples/sec
epoch 0, total_step 23250, total loss is 44.94 , inference loss is 44.86, reg_loss is 0.08, training accuracy is 0.000000, time 256.440 samples/sec
epoch 0, total_step 23300, total loss is 49.61 , inference loss is 49.53, reg_loss is 0.08, training accuracy is 0.000000, time 233.729 samples/sec
epoch 0, total_step 23350, total loss is 49.41 , inference loss is 49.32, reg_loss is 0.08, training accuracy is 0.000000, time 248.436 samples/sec
epoch 0, total_step 23400, total loss is 45.00 , inference loss is 44.92, reg_loss is 0.08, training accuracy is 0.000000, time 253.218 samples/sec
epoch 0, total_step 23450, total loss is 47.28 , inference loss is 47.19, reg_loss is 0.08, training accuracy is 0.000000, time 227.988 samples/sec
epoch 0, total_step 23500, total loss is 45.08 , inference loss is 45.00, reg_loss is 0.08, training accuracy is 0.000000, time 238.994 samples/sec
epoch 0, total_step 23550, total loss is 47.95 , inference loss is 47.87, reg_loss is 0.08, training accuracy is 0.000000, time 220.209 samples/sec
epoch 0, total_step 23600, total loss is 46.57 , inference loss is 46.49, reg_loss is 0.08, training accuracy is 0.000000, time 251.595 samples/sec
epoch 0, total_step 23650, total loss is 47.15 , inference loss is 47.07, reg_loss is 0.08, training accuracy is 0.000000, time 246.071 samples/sec
epoch 0, total_step 23700, total loss is 45.79 , inference loss is 45.71, reg_loss is 0.08, training accuracy is 0.000000, time 252.965 samples/sec
epoch 0, total_step 23750, total loss is 46.13 , inference loss is 46.05, reg_loss is 0.08, training accuracy is 0.000000, time 223.322 samples/sec
epoch 0, total_step 23800, total loss is 46.05 , inference loss is 45.97, reg_loss is 0.08, training accuracy is 0.000000, time 256.571 samples/sec
epoch 0, total_step 23850, total loss is 44.58 , inference loss is 44.50, reg_loss is 0.08, training accuracy is 0.000000, time 227.609 samples/sec
epoch 0, total_step 23900, total loss is 45.56 , inference loss is 45.47, reg_loss is 0.08, training accuracy is 0.000000, time 241.311 samples/sec
epoch 0, total_step 23950, total loss is 46.53 , inference loss is 46.45, reg_loss is 0.08, training accuracy is 0.000000, time 230.220 samples/sec
epoch 0, total_step 24000, total loss is 46.23 , inference loss is 46.15, reg_loss is 0.08, training accuracy is 0.000000, time 238.740 samples/sec

Iteration 24000 testing...
thresholds max: 0.05 <=> min: 0.01
total time 13.504s to evaluate 12000 images of lfw
Accuracy: 0.711+-0.014
Validation rate: 0.02667+-0.01135 @ FAR=0.00100
fpr and tpr: 0.975 0.989
Area Under Curve (AUC): 0.785
Equal Error Rate (EER): 0.291

epoch 0, total_step 24050, total loss is 47.09 , inference loss is 47.01, reg_loss is 0.08, training accuracy is 0.000000, time 229.386 samples/sec
epoch 0, total_step 24100, total loss is 48.51 , inference loss is 48.43, reg_loss is 0.08, training accuracy is 0.000000, time 244.295 samples/sec
epoch 0, total_step 24150, total loss is 45.82 , inference loss is 45.75, reg_loss is 0.08, training accuracy is 0.000000, time 244.469 samples/sec
epoch 0, total_step 24200, total loss is 46.87 , inference loss is 46.79, reg_loss is 0.08, training accuracy is 0.000000, time 219.603 samples/sec
epoch 0, total_step 24250, total loss is 47.36 , inference loss is 47.29, reg_loss is 0.08, training accuracy is 0.000000, time 257.455 samples/sec
epoch 0, total_step 24300, total loss is 46.20 , inference loss is 46.12, reg_loss is 0.08, training accuracy is 0.000000, time 250.922 samples/sec
epoch 0, total_step 24350, total loss is 45.01 , inference loss is 44.93, reg_loss is 0.08, training accuracy is 0.000000, time 252.173 samples/sec
epoch 0, total_step 24400, total loss is 45.65 , inference loss is 45.57, reg_loss is 0.08, training accuracy is 0.000000, time 247.256 samples/sec
epoch 0, total_step 24450, total loss is 45.98 , inference loss is 45.90, reg_loss is 0.08, training accuracy is 0.000000, time 233.881 samples/sec
epoch 0, total_step 24500, total loss is 45.13 , inference loss is 45.05, reg_loss is 0.08, training accuracy is 0.000000, time 240.512 samples/sec
epoch 0, total_step 24550, total loss is 45.88 , inference loss is 45.80, reg_loss is 0.08, training accuracy is 0.000000, time 233.891 samples/sec
epoch 0, total_step 24600, total loss is 47.03 , inference loss is 46.96, reg_loss is 0.08, training accuracy is 0.000000, time 225.433 samples/sec
epoch 0, total_step 24650, total loss is 45.02 , inference loss is 44.94, reg_loss is 0.08, training accuracy is 0.000000, time 247.749 samples/sec
epoch 0, total_step 24700, total loss is 46.08 , inference loss is 46.00, reg_loss is 0.08, training accuracy is 0.000000, time 226.536 samples/sec
epoch 0, total_step 24750, total loss is 47.53 , inference loss is 47.45, reg_loss is 0.08, training accuracy is 0.000000, time 233.217 samples/sec
epoch 0, total_step 24800, total loss is 46.37 , inference loss is 46.29, reg_loss is 0.08, training accuracy is 0.000000, time 227.516 samples/sec
epoch 0, total_step 24850, total loss is 47.47 , inference loss is 47.40, reg_loss is 0.08, training accuracy is 0.000000, time 224.324 samples/sec
epoch 0, total_step 24900, total loss is 43.75 , inference loss is 43.67, reg_loss is 0.08, training accuracy is 0.000000, time 235.426 samples/sec
epoch 0, total_step 24950, total loss is 46.13 , inference loss is 46.05, reg_loss is 0.08, training accuracy is 0.000000, time 230.768 samples/sec
epoch 0, total_step 25000, total loss is 45.00 , inference loss is 44.92, reg_loss is 0.08, training accuracy is 0.000000, time 227.717 samples/sec
epoch 0, total_step 25050, total loss is 47.16 , inference loss is 47.08, reg_loss is 0.08, training accuracy is 0.000000, time 233.962 samples/sec
epoch 0, total_step 25100, total loss is 44.87 , inference loss is 44.79, reg_loss is 0.08, training accuracy is 0.000000, time 234.183 samples/sec
epoch 0, total_step 25150, total loss is 46.57 , inference loss is 46.50, reg_loss is 0.08, training accuracy is 0.000000, time 234.577 samples/sec
epoch 0, total_step 25200, total loss is 45.61 , inference loss is 45.53, reg_loss is 0.07, training accuracy is 0.000000, time 237.438 samples/sec
epoch 0, total_step 25250, total loss is 45.36 , inference loss is 45.29, reg_loss is 0.07, training accuracy is 0.000000, time 226.352 samples/sec
epoch 0, total_step 25300, total loss is 48.13 , inference loss is 48.05, reg_loss is 0.07, training accuracy is 0.000000, time 248.282 samples/sec
epoch 0, total_step 25350, total loss is 44.71 , inference loss is 44.63, reg_loss is 0.07, training accuracy is 0.000000, time 232.164 samples/sec
epoch 0, total_step 25400, total loss is 47.49 , inference loss is 47.42, reg_loss is 0.07, training accuracy is 0.000000, time 242.975 samples/sec
epoch 0, total_step 25450, total loss is 43.46 , inference loss is 43.39, reg_loss is 0.07, training accuracy is 0.000000, time 254.416 samples/sec
epoch 0, total_step 25500, total loss is 47.37 , inference loss is 47.30, reg_loss is 0.07, training accuracy is 0.000000, time 231.277 samples/sec
epoch 0, total_step 25550, total loss is 49.20 , inference loss is 49.13, reg_loss is 0.07, training accuracy is 0.000000, time 230.822 samples/sec
epoch 0, total_step 25600, total loss is 45.28 , inference loss is 45.21, reg_loss is 0.07, training accuracy is 0.000000, time 234.142 samples/sec
epoch 0, total_step 25650, total loss is 43.09 , inference loss is 43.02, reg_loss is 0.07, training accuracy is 0.000000, time 217.183 samples/sec
epoch 0, total_step 25700, total loss is 45.45 , inference loss is 45.38, reg_loss is 0.07, training accuracy is 0.000000, time 233.817 samples/sec
epoch 0, total_step 25750, total loss is 43.53 , inference loss is 43.46, reg_loss is 0.07, training accuracy is 0.000000, time 218.992 samples/sec
epoch 0, total_step 25800, total loss is 44.83 , inference loss is 44.76, reg_loss is 0.07, training accuracy is 0.000000, time 255.731 samples/sec
epoch 0, total_step 25850, total loss is 44.84 , inference loss is 44.77, reg_loss is 0.07, training accuracy is 0.000000, time 224.062 samples/sec
epoch 0, total_step 25900, total loss is 45.74 , inference loss is 45.67, reg_loss is 0.07, training accuracy is 0.000000, time 250.024 samples/sec
epoch 0, total_step 25950, total loss is 46.41 , inference loss is 46.34, reg_loss is 0.07, training accuracy is 0.000000, time 229.824 samples/sec
epoch 0, total_step 26000, total loss is 46.05 , inference loss is 45.97, reg_loss is 0.07, training accuracy is 0.000000, time 231.275 samples/sec

Iteration 26000 testing...
thresholds max: 0.06 <=> min: 0.01
total time 14.038s to evaluate 12000 images of lfw
Accuracy: 0.719+-0.022
Validation rate: 0.02467+-0.01447 @ FAR=0.00100
fpr and tpr: 0.970 0.987
Area Under Curve (AUC): 0.800
Equal Error Rate (EER): 0.278

Embeddings size 512

First of all, thanks for the great work and for sharing it!

I have a problem, when I run:
python train_nets.py --embedding_size=512 --tfrecords_file_path="$DATASET/tfrecords"

Iteration 2000 testing...
Traceback (most recent call last):
File "train_nets.py", line 249, in
emb_array[start_index:end_index, :] = sess.run(embeddings, feed_dict=feed_dict)
ValueError: could not broadcast input array from shape (100,128) into shape (100,512)

Any idea how to fix it?
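The evaluation buffer is still sized for the default 128-d embedding while the graph now outputs 512-d vectors. A numpy sketch of the mismatch and a fix that sizes the buffer from the actual output (names hypothetical, not the script's exact code):

```python
import numpy as np

batch = np.zeros((100, 512))        # what sess.run(embeddings, ...) now returns
emb_array = np.zeros((12000, 128))  # buffer still built with the default size

try:
    emb_array[0:100, :] = batch     # reproduces the broadcast ValueError
except ValueError:
    pass

# Fix: derive the buffer width from the embedding tensor itself instead of
# a hard-coded default (e.g. embeddings.get_shape()[1] in the script).
emb_array = np.zeros((12000, batch.shape[1]))
emb_array[0:100, :] = batch         # shapes now agree
```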

Issue when running test_nets.py

When running test_nets.py with the pretrained model MobileFaceNet_9925_9680.pb, I met the issue below:

MobileFaceNet_TF$ python test_nets.py --model=./arch/pretrained_model/MobileFaceNet_9925_9680.pb
begin db lfw convert.
loading bin 1000
loading bin 2000
loading bin 3000
loading bin 4000
loading bin 5000
loading bin 6000
loading bin 7000
loading bin 8000
loading bin 9000
loading bin 10000
loading bin 11000
loading bin 12000
(12000, 112, 112, 3)
Model filename: ./arch/pretrained_model/MobileFaceNet_9925_9680.pb
Runnning forward pass on lfw images

Traceback (most recent call last):
File "test_nets.py", line 143, in
main(parse_arguments(sys.argv[1:]))
File "test_nets.py", line 104, in main
emb_array = np.zeros((data_sets.shape[0], embedding_size))
TypeError: index returned non-int (type NoneType)

MobileFaceNet_TF$ python --version
Python 3.4.3
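The frozen .pb appears to report no static dimension for the embeddings tensor, so embedding_size arrives as None and np.zeros rejects it. A minimal sketch of a defensive fallback (the helper name is hypothetical; 128 is taken as the default embedding size seen in the earlier issue's (100,128) shape):

```python
import numpy as np

def resolve_embedding_size(static_dim, default=128):
    """Hypothetical helper: a frozen graph may report None for the
    embedding dimension; fall back to a known default rather than
    passing None into np.zeros."""
    return int(static_dim) if static_dim is not None else default

emb_size = resolve_embedding_size(None)   # the frozen-.pb case above
emb_array = np.zeros((12000, emb_size))   # no TypeError now
```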

BTW, test_nets.py can work well with default output/ckpt checkpoint files.
MobileFaceNet_TF$ python test_nets.py
begin db lfw convert.
loading bin 1000
loading bin 2000
loading bin 3000
loading bin 4000
loading bin 5000
loading bin 6000
loading bin 7000
loading bin 8000
loading bin 9000
loading bin 10000
loading bin 11000
loading bin 12000
(12000, 112, 112, 3)
Model directory: ./output/ckpt
Metagraph file: MobileFaceNet_pretrain.ckpt.meta
Checkpoint file: MobileFaceNet_pretrain.ckpt
Runnning forward pass on lfw images

thresholds max: 1.47 <=> min: 1.41
total time 974.123s to evaluate 12000 images of lfw
Accuracy: 0.993+-0.005
Validation rate: 0.98800+-0.00933 @ FAR=0.00100
fpr and tpr: 0.502 0.838
Area Under Curve (AUC): 0.999
Equal Error Rate (EER): 0.007

Please help look into this issue, thank you.

inference loss

To the author: roughly what value did the inference loss converge to in your training? Mine always ends up at around 8-10.

How to convert pretrained model to tflite

In the path arch/pretrained_model, I use the shell command below to convert the model to tflite:

tflite_convert  ^
--output_file  MobileFaceNet_9925_9680.tflite  ^
--graph_def_file  MobileFaceNet_9925_9680.pb    ^
--input_arrays  "input"  ^
--input_shapes  "1,112,112,3"  ^
--output_arrays  embeddings  ^
--output_format  TFLITE

Unfortunately, I get this error:

λ tflite_convert  ^
More? --output_file  MobileFaceNet_9925_9680.tflite  ^
More? --graph_def_file  MobileFaceNet_9925_9680.pb    ^
More? --input_arrays  "input"  ^
More? --input_shapes  "1,112,112,3"  ^
More? --output_arrays  embeddings  ^
More? --output_format  TFLITE
2019-05-28 15:32:07.380246: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-05-28 15:32:08.157890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce MX150 major: 6 minor: 1 memoryClockRate(GHz): 1.341
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 1.62GiB
2019-05-28 15:32:08.185527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-05-28 15:32:08.727623: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-28 15:32:08.742876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-05-28 15:32:08.753078: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-05-28 15:32:08.764321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1365 MB memory) -> physical GPU (device: 0, name: GeForce MX150, pci bus id: 0000:01:00.0, compute capability: 6.1)
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\Scripts\tflite_convert-script.py", line 10, in <module>
    sys.exit(main())
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\lite\python\tflite_convert.py", line 442, in main
    app.run(main=run_main, argv=sys.argv[:1])
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\lite\python\tflite_convert.py", line 438, in run_main
    _convert_model(tflite_flags)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\lite\python\tflite_convert.py", line 191, in _convert_model
    output_data = converter.convert()
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\lite\python\lite.py", line 455, in convert
    **converter_kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\lite\python\convert.py", line 442, in toco_convert_impl
    input_data.SerializeToString())
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\lite\python\convert.py", line 205, in toco_convert_protos
    "TOCO failed. See console for info.\n%s\n%s\n" % (stdout, stderr))
tensorflow.lite.python.convert.ConverterError: TOCO failed. See console for info.
2019-05-28 15:32:13.074828: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-05-28 15:32:13.078767: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant
2019-05-28 15:32:13.079077: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant
2019-05-28 15:32:13.079438: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant
2019-05-28 15:32:13.079756: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant
2019-05-28 15:32:13.087213: E tensorflow/lite/toco/import_tensorflow.cc:2079] tensorflow::ImportGraphDef failed with status: Not found: Op type not registered 'Placeholder' in binary running on DESKTOP-TG1FKM4. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2019-05-28 15:32:13.434412: I tensorflow/lite/toco/graph_transformations/graph_transformations.cc:39] Before Removing unused ops: 2747 operators, 4533 arrays (0 quantized)
2019-05-28 15:32:13.816499: I tensorflow/lite/toco/graph_transformations/graph_transformations.cc:39] After Removing unused ops pass 1: 1810 operators, 3076 arrays (0 quantized)
2019-05-28 15:32:14.122395: I tensorflow/lite/toco/graph_transformations/graph_transformations.cc:39] Before general graph transformations: 1810 operators, 3076 arrays (0 quantized)
2019-05-28 15:32:14.124627: F tensorflow/lite/toco/graph_transformations/resolve_tensorflow_switch.cc:98] Check failed: other_op->type == OperatorType::kMerge Found BatchNormalization as non-selected output from Switch, but only Merge supported.

questions about the inference loss

I checked the inference loss on TensorBoard and found that it increases sharply after every epoch. Can you tell me the reason? I think it only occurs when the data is not fully shuffled.

Feature embeddings - Euclidean space

Hi,
I'm doing 1:1 face verification. I tried to use this model but I am getting different cosine distances for face pairs, and I'm really confused.
Are the embeddings calculated by MobileFaceNet in a Euclidean space where distances directly correspond to a measure of face similarity?
If not, how do I calculate the similarity between two faces? I'm not planning to use any clustering algorithm.
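A common convention for face embeddings (assumed here, not verified against this repo) is to L2-normalize them first; on unit vectors, squared Euclidean distance and cosine similarity are monotonically related by ‖a − b‖² = 2 − 2·cosθ, so either one works as a similarity measure. A numpy sketch:

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def face_similarity(e1, e2):
    # Cosine similarity of L2-normalized embeddings: 1.0 means identical
    # direction, values near 0 mean unrelated. (Convention assumed.)
    a, b = l2_normalize(e1), l2_normalize(e2)
    return float(np.dot(a, b))

a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, 4.0, 4.0])   # same direction, different magnitude
sim = face_similarity(a, b)     # ≈ 1.0 after normalization
```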

Facial Detection part

Hey, nice work on this MobileFaceNet implementation. My question is: if I want to combine this with a detection method, what do you suggest for face detection? I was thinking maybe MTCNN. Any suggestions?

1:1 face matching

What is the threshold if I want to do 1:1 face matching on a mobile phone with the .pb model?

I had tried setting the threshold to 1.234, but it does not seem to work well.
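A threshold around 1.234 only makes sense on L2-normalized embeddings, where Euclidean distance lives in [0, 2] and relates to cosine similarity by d² = 2 − 2·cos. A numpy sketch of a 1:1 decision under that assumption (the threshold value and helper name are illustrative, not repo-verified):

```python
import numpy as np

def is_same_person(e1, e2, threshold=1.234):
    # Hypothetical 1:1 decision: L2-normalize both embeddings, then
    # compare Euclidean distance to a tuned threshold (smaller = same person).
    a = e1 / np.linalg.norm(e1)
    b = e2 / np.linalg.norm(e2)
    dist = np.linalg.norm(a - b)
    return bool(dist < threshold), float(dist)

# Nearly-parallel embeddings give a small distance and a positive match.
same, d = is_same_person(np.array([1.0, 0.0]), np.array([0.99, 0.14]))
```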

verification problem

When validating accuracy on LFW, why is embeddings = embeddings_list[0] + embeddings_list[1]? When I validate by reading the images directly, the accuracy is not high. Thanks @sirius-ai

question about the test dataset

The result on agedb_30 is not very good, especially the val result, which confuses me. Does val mean the validation accuracy? It is really small.

about arcface_loss

Nice work! But you use cos_loss instead of arcface_loss. When I use arcface_loss, the loss becomes NaN and the accuracy drops. Have you ever been in the same situation?

just keep running

I kept running for 11 epochs and accuracy on the validation set rose to 99.5%, but training accuracy is still always 0. Do you know why? Thank you very much!

Originally posted by @jolinlinlin in #39 (comment)

How to change image size to 160x160

I want to train MobileFaceNet with image size 160x160. I have modified the MobileFaceNet structure, but the data size is 112x112.
How can I change the size of the dataset in Dataset Zoo?
Does the train script support a resize function?
Thanks~

My total_loss keeps bouncing between 15 and 40, while reg_loss keeps decreasing but stops once it reaches 0.04

Hello author, does total_loss simply have no reference value? I have run 4 epochs, and watching TensorBoard, total_loss keeps climbing slowly from 15 to 40, then slowly dropping back to 15, then rising to 40 again, over and over. Is this normal? Also, could you post your training log? I would like to look at your total_loss.

This differs from my earlier PyTorch runs: same dataset, the same starting learning rate of 0.1, and the same Adam optimizer. The PyTorch arcface loss dropped steadily from around 30 to below 1.

Note: my training dataset is CASIA, with 10575 classes and 393698 images, all 112x112 and already face-aligned.

Adapt for 224x224 input

How can I adapt your MobileFaceNet model to process the standard MobileNet image size?
You have defined a variable mobilenet_v2.default_image_size = 112, but I think it is not used anywhere. Thanks.

pre_train model

Hey, can you upload your best-performing pre-trained model? My PC has some problems while running the training file.

Out of memory when training with a P100, 12 GB

Sorry, but I met the OOM problem when training with batch_size 128 on a 12 GB P100.
Is that normal?
I wonder why this happened, since my card has 12 GB of memory and I just wanted to use a larger batch size.
Thanks.

How to change the loss to arcface_loss if I want to use the softmax-trained model as the pretrained model?

I changed the loss to arcface_loss and want to use the model trained with the default loss as the pretrained model, but an error occurs.

Traceback (most recent call last):
File "/opt/app/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call
return fn(*args)
File "/opt/app/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
target_list, status, run_metadata)
File "/opt/app/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Key arcface_loss/embedding_weights not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
[[Node: save/RestoreV2/_309 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_306_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train_nets5.py", line 199, in
saver.restore(sess, ckpt.model_checkpoint_path)
File "/opt/app/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1755, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/opt/app/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 905, in run
run_metadata_ptr)
File "/opt/app/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1137, in _run
feed_dict_tensor, options, run_metadata)
File "/opt/app/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1355, in _do_run
options, run_metadata)
File "/opt/app/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1374, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key arcface_loss/embedding_weights not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
[[Node: save/RestoreV2/_309 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_306_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

Caused by op 'save/RestoreV2', defined at:
File "train_nets5.py", line 188, in
saver = tf.train.Saver(tf.trainable_variables(), max_to_keep=args.saver_maxkeep)
File "/opt/app/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1293, in init
self.build()
File "/opt/app/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1302, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/opt/app/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1339, in _build
build_save=build_save, build_restore=build_restore)
File "/opt/app/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 796, in _build_internal
restore_sequentially, reshape)
File "/opt/app/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 449, in _AddRestoreOps
restore_sequentially)
File "/opt/app/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 847, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/opt/app/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1030, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/opt/app/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/opt/app/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
op_def=op_def)
File "/opt/app/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1650, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

NotFoundError (see above for traceback): Key arcface_loss/embedding_weights not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
[[Node: save/RestoreV2/_309 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_306_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

embedding_weights is a variable defined inside arcface_loss; apparently the previously trained model does not contain this variable. What can I do to solve this?

training time info

Hi @sirius-ai ,

Nice work, and great final performance!
Would you please share the training cost here, e.g. GPU details and training hours?
I only have a 1080 Ti and would like to estimate the training time before starting — you know, not everyone has Google's compute power.

Thanks very much!

How to compare distance between two features?

Hi,
How do I compare the distance/similarity between two faces' features?
I tried the methods below, but they failed.
diff = np.subtract(f2, f1)
dist = np.sum(np.square(diff), 1)
dist = np.linalg.norm(diff, axis=1)
dist = sklearn.metrics.pairwise_distances(f1,f2, metric='cosine')

Thanks for the help!!
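One common approach (a sketch, not confirmed as this repo's exact method) is to L2-normalize both embeddings first; on unit vectors, squared Euclidean distance and cosine similarity are interchangeable via ||a − b||² = 2 − 2·cos(a, b), and a single threshold on either one decides same/different:

```python
import numpy as np

def compare_embeddings(f1, f2):
    """Compare two 1-D embedding vectors; returns (euclidean_dist, cosine_sim).

    L2-normalizing first makes the two metrics equivalent:
    ||a - b||^2 = 2 - 2*cos(a, b) for unit vectors.
    """
    a = f1 / np.linalg.norm(f1)
    b = f2 / np.linalg.norm(f2)
    dist = float(np.linalg.norm(a - b))   # in [0, 2]; smaller = more similar
    cos_sim = float(np.dot(a, b))         # in [-1, 1]; larger = more similar
    return dist, cos_sim

# parallel vectors -> distance ~0, similarity ~1
d, s = compare_embeddings(np.array([1.0, 2.0, 3.0]),
                          np.array([2.0, 4.0, 6.0]))
```

If the snippets in the question fail with shape errors, the likely cause is passing 1-D arrays where `axis=1` or `pairwise_distances` expects 2-D `(1, dim)` inputs; reshape with `f1.reshape(1, -1)` first.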

Does not generalise to other face alignments

The MS1M dataset provided by InsightFace has been aligned using MTCNN. The MobileFaceNet pre-trained model fails to work on faces aligned with dlib. Does anyone else see this issue?

MobileFaceNet Struct

Thanks for your good repo. There may be some mistakes in the network definition:
1. https://github.com/sirius-ai/MobileFaceNet_TF/blob/master/nets/MobileFaceNet.py#L52-L63
should be

_CONV_DEFS = [
    Conv(kernel=[3, 3], stride=2, depth=64, ratio=1),
    DepthwiseConv(kernel=[3, 3], stride=1, depth=64, ratio=1),
    InvResBlock(kernel=[3, 3], stride=2, depth=64, ratio=2, repeate=1),   # first stride set to 2
    InvResBlock(kernel=[3, 3], stride=1, depth=64, ratio=2, repeate=4),   # set stride to 1
    InvResBlock(kernel=[3, 3], stride=2, depth=128, ratio=4, repeate=1),
    InvResBlock(kernel=[3, 3], stride=1, depth=128, ratio=2, repeate=6),
    InvResBlock(kernel=[3, 3], stride=2, depth=128, ratio=4, repeate=1),
    InvResBlock(kernel=[3, 3], stride=1, depth=128, ratio=2, repeate=2),
    Conv(kernel=[1, 1], stride=1, depth=512, ratio=1),
]

2. The first depthwise conv should perhaps not use conv_em; see https://github.com/sirius-ai/MobileFaceNet_TF/blob/master/nets/MobileFaceNet.py#L138
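As a sanity check on the stride schedule above (a rough sketch; the repeated stride-1 blocks don't change the resolution, so only the downsampling layers matter), a 112×112 input should reach a 7×7 feature map before the final global depthwise conv, which matches the MobileFaceNet design:

```python
def spatial_size(input_size=112, strides=(2, 1, 2, 1, 2, 1, 2, 1, 1)):
    """Track the feature-map side length through the stride schedule
    of the corrected _CONV_DEFS (assuming 'same' padding, so each
    stride-2 layer halves the size, rounding up)."""
    size = input_size
    for s in strides:
        size = -(-size // s)   # ceil division, as with 'same' padding
    return size

# 112 -> 56 -> 28 -> 14 -> 7 across the four stride-2 stages
final = spatial_size()
```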

Using ArcFace loss with facenet model

I am trying to implement the ArcFace loss for FaceNet training.
I replaced cross_entropy_mean with the ArcFace mean loss I got from the arcface function.
The loss is around 40, the accuracy is 0.00 and not changing, and the loss is increasing.
Any suggestions on what I might be doing wrong?
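For reference, the core of ArcFace is to add an angular margin m to the target class only, after L2-normalizing both the embeddings and the class weights — a common pitfall is skipping the normalization, which makes the loss blow up. A minimal NumPy sketch of the logit computation (illustration only, not the repo's TensorFlow implementation; the scale s=64 and margin m=0.5 are the paper's defaults):

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """ArcFace-margin logits sketch (NumPy, for illustration only).

    embeddings: (batch, dim); weights: (dim, num_classes); labels: (batch,).
    Both embeddings and class weights are L2-normalized so their dot
    product is cos(theta); the margin m is added to theta for the
    target class only, then everything is rescaled by s.
    """
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos_t = emb @ w                                   # cos(theta), in [-1, 1]
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))
    target = np.zeros_like(cos_t)
    target[np.arange(len(labels)), labels] = 1.0
    return s * np.cos(theta + m * target)

# target logit is pushed DOWN (cos(theta+m) < cos(theta)), so early
# accuracy near 0 and a high initial loss are expected behaviour
logits = arcface_logits(np.array([[1.0, 0.0]]), np.eye(2), np.array([0]))
```

Because the margin deliberately makes the target logit harder, training from scratch with ArcFace often diverges; a common recipe is to warm up with plain softmax (or a small m) first, then switch to the full margin.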

loss NAN

After training 170k steps with softmax (lr=0.1, batch 90) and then switching to ArcFace (batch 90), the loss becomes NaN after tens of thousands of steps. How can this be solved? Has anyone completed a full training run, including the parameter settings for the softmax-to-ArcFace transition?

No meta file found in the model directory (./output/ckpt)

There is a MobileFaceNet_9925_9680.pb in pretrained_model, but the error "No meta file found in the model directory (./output/ckpt)" appears when I run test_nets.py.
test_nets.py says the model path 'Could be either a directory containing the meta_file and ckpt_file or a model protobuf (.pb) file', but I couldn't load the .pb file with test_nets.py.
How should I modify the file?
Thanks

how to test and evaluate it ?

How do I get the tpr, fpr, accuracy, val, val_std, and far results?
In test_nets.py I found the code below; the issame_list parameter is a list of booleans, which I don't understand:

tpr, fpr, accuracy, val, val_std, far = evaluate(emb_array, issame_list, nrof_folds=args.eval_nrof_folds)
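issame_list holds one boolean per evaluation image pair: True if both images show the same identity, False otherwise, so predicted same/different labels (distance below a threshold) can be compared against it. A minimal sketch of such pair-verification metrics (an assumed simplification, not the repo's exact evaluate(), which additionally uses k-fold cross-validation):

```python
import numpy as np

def evaluate_pairs(distances, issame, thresholds=None):
    """Sketch of pair-verification metrics.

    distances: one L2 distance per image pair; issame: one bool per pair,
    True if the pair shows the same identity. Sweeps thresholds and
    returns (best_acc, tpr, fpr) at the accuracy-maximizing threshold.
    """
    distances = np.asarray(distances)
    issame = np.asarray(issame, dtype=bool)
    if thresholds is None:
        thresholds = np.arange(0.0, 4.0, 0.01)
    best_acc, best = 0.0, (0.0, 0.0)
    for t in thresholds:
        pred_same = distances < t
        tp = np.sum(pred_same & issame)
        fp = np.sum(pred_same & ~issame)
        tn = np.sum(~pred_same & ~issame)
        fn = np.sum(~pred_same & issame)
        tpr = tp / max(tp + fn, 1)   # true positive rate at threshold t
        fpr = fp / max(fp + tn, 1)   # false positive rate at threshold t
        acc = (tp + tn) / len(issame)
        if acc > best_acc:
            best_acc, best = acc, (tpr, fpr)
    return best_acc, best[0], best[1]
```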

MemoryError

Hi, I have a problem and don't know why it happens. Can you help? Thanks.

Use the retry module or similar alternatives.
begin db lfw convert.
loading bin 1000
loading bin 2000
loading bin 3000
loading bin 4000
loading bin 5000
loading bin 6000
loading bin 7000
loading bin 8000
loading bin 9000
loading bin 10000
loading bin 11000
loading bin 12000
(12000, 112, 112, 3)
begin db cfp_ff convert.
Traceback (most recent call last):
File "train_nets.py", line 107, in
data_set = load_data(db, args.image_size, args)
File "/home/ubuntu/wyq/MobileFaceNet_TF/utils/data_process.py", line 111, in load_data
datasets = np.empty((len(issame_list)*2, image_size[0], image_size[1], 3))
MemoryError
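The crash comes from load_data allocating the whole evaluation set at once with np.empty's default float64 dtype. A rough estimate of the allocation (the pair counts are assumptions for illustration; LFW's 6,000 pairs match the `(12000, 112, 112, 3)` print above):

```python
import numpy as np

def dataset_bytes(n_pairs, image_size=(112, 112), channels=3, dtype=np.float64):
    """Bytes needed for np.empty((n_pairs*2, H, W, C), dtype) in load_data."""
    h, w = image_size
    return n_pairs * 2 * h * w * channels * np.dtype(dtype).itemsize

gb_f64 = dataset_bytes(6000) / 1024**3                    # ~3.4 GB in float64
gb_f32 = dataset_bytes(6000, dtype=np.float32) / 1024**3  # halved in float32
```

Possible workarounds (untested suggestions): change the np.empty dtype in data_process.py to np.float32 (or load images as uint8 and convert per batch), load only one evaluation db at a time via --eval_db_path, or add swap/RAM — loading lfw plus cfp_ff plus agedb back-to-back multiplies this footprint.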

doubt about application effects

I trained the model for a long time and reached the author's reported performance, but when I use it to compare live photos against ID/certificate photos it performs poorly, while FaceNet gives much better results. Do you have any advice, or should I try another model?

Training classification accuracy?

What is your classification accuracy (approximate value) toward the end of training, and what is the general range to expect for classification scores?
I am using a variant of MobileNet, and my classification accuracy is about 0.54 with around 90K classes (unique identities).
