
arcface-caffe's Issues

UMDFaces Dataset

Hi, thanks for your project. Could you share the UMDFaces dataset with me?

About the implementation of cosin_add_m_layer

Thank you for sharing.
I have a question about the implementation of cosin_add_m_layer: what does the following code in Forward_cpu() mean? The ArcFace paper only seems to describe the else branch below. Could you explain the intent of checking cos_t[i * dim + gt] <= threshold and how that case is handled? Thank you!

if(cos_t[i * dim + gt] <= threshold)
{
    top_data[i * dim + gt] = cos_t[i * dim + gt] - sin(M_PI - m_) * m_;
    tpflag[i * dim + gt] = 1.0f;
}
else
    top_data[i * dim + gt] = cos_t[i * dim + gt] * cos_m - sin_theta * sin_m;
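
For context, a hedged reading of this guard (my interpretation of a common ArcFace implementation detail, not the author's reply): cos(theta + m) is only a monotonically decreasing function of theta while theta + m <= pi, so when cos(theta) <= threshold = cos(pi - m) the layer falls back to a linear penalty that keeps the target logit decreasing in theta:

$$
\mathrm{top} =
\begin{cases}
\cos(\theta + m), & \cos\theta > \cos(\pi - m)\\
\cos\theta - m\,\sin(\pi - m), & \text{otherwise,}
\end{cases}
$$

where $m\,\sin(\pi - m) = m\sin m$ matches the sin(M_PI - m_) * m_ term in the code.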

Caffe training is still very slow. Did the author ever find the cause of this?

@xialuxi, thank you for your reply. I read the adaface paper's implementation this morning.
Did the author ever find out why training with Caffe is so slow? The forward pass seems fast enough to me.
My settings are batch size = 56 on two 1080 GPUs with iter_size: 6.
I0513 12:49:24.228211 23509 solver.cpp:243] Iteration 0, loss = 24.1503
I0513 12:49:24.228235 23509 solver.cpp:259] Train net output #0: accuracy-t = 0.839286
I0513 12:49:24.228257 23509 solver.cpp:259] Train net output #1: softmax_loss = 22.5511 (* 1 = 22.5511 loss)
I0513 12:49:24.228299 23509 sgd_solver.cpp:138] Iteration 0, lr = 0.01
I0513 12:54:50.852994 23509 solver.cpp:243] Iteration 100, loss = 20.3324
I0513 12:54:50.853057 23509 solver.cpp:259] Train net output #0: accuracy-t = 0.928571
I0513 12:54:50.853081 23509 solver.cpp:259] Train net output #1: softmax_loss = 17.7896 (* 1 = 17.7896 loss)
I0513 12:54:50.923504 23509 sgd_solver.cpp:138] Iteration 100, lr = 0.01
I0513 13:00:39.458894 23509 solver.cpp:243] Iteration 200, loss = 18.8438
I0513 13:00:39.459019 23509 solver.cpp:259] Train net output #0: accuracy-t = 0.964286
I0513 13:00:39.459044 23509 solver.cpp:259] Train net output #1: softmax_loss = 16.4004 (* 1 = 16.4004 loss)
I0513 13:00:39.500185 23509 sgd_solver.cpp:138] Iteration 200, lr = 0.01
I0513 13:06:26.364652 23509 solver.cpp:243] Iteration 300, loss = 18.461
I0513 13:06:26.364759 23509 solver.cpp:259] Train net output #0: accuracy-t = 0.946429
I0513 13:06:26.364783 23509 solver.cpp:259] Train net output #1: softmax_loss = 16.8125 (* 1 = 16.8125 loss)
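
As a rough back-of-the-envelope check of the reported speed (my own arithmetic, with the assumption that the data layer's batch_size is per GPU and that each logged iteration covers iter_size forward/backward passes): the log above shows roughly 326 s per 100 iterations, so

$$
\frac{326\ \mathrm{s}}{100\ \mathrm{iterations}} \approx 3.3\ \mathrm{s/iteration},
\qquad
\frac{56 \times 2 \times 6\ \mathrm{images/iteration}}{3.3\ \mathrm{s/iteration}} \approx 200\ \mathrm{images/s}.
$$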

CosinAddmBackward

Hi, I compared the cpp and cu code and found a bug in how the diff is computed in the CosinAddmBackward function: when computing bottom_diff, the result needs to be multiplied by bottom_diff[index * dim + gt] (the incoming gradient). It should be:

bottom_diff[index * dim + gt] = bottom_diff[index * dim + gt] * (cos(bais) + sin(bais) * cos_theta / sin_theta);

instead of the current:

bottom_diff[index * dim + gt] = cos(bais) + sin(bais) * cos_theta / sin_theta;
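
For illustration, a minimal CPU sketch (my own code, not the repo's) of the chain rule that this fix applies, assuming bottom_diff has already been filled with the incoming top_diff and that bais plays the role of the angular margin m:

#include <algorithm>
#include <cmath>

// For each sample, only the ground-truth column gt went through cos(theta + m),
// so only that entry of the gradient needs the extra local-derivative factor
// d cos(theta + m) / d cos(theta) = cos(m) + sin(m) * cos(theta) / sin(theta).
void cosin_addm_backward_sketch(const int* labels, const float* cos_t,
                                float m, int num, int dim, float* bottom_diff) {
  for (int i = 0; i < num; ++i) {
    const int gt = labels[i];
    const float cos_theta = cos_t[i * dim + gt];
    const float sin_theta = std::sqrt(std::max(1.0f - cos_theta * cos_theta, 1e-12f));
    // Multiply the incoming gradient (already stored in bottom_diff) in place.
    bottom_diff[i * dim + gt] *= std::cos(m) + std::sin(m) * cos_theta / sin_theta;
  }
}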

Did anyone try AdaCos?

I will try it in a couple of days and compare it with ArcFace on MegaFace and other tests, and I will post the results. But I'm a bit confused about the m parameter.

Compile on ubuntu

Hi, I compiled your repository as you described. These are the steps I followed:

1) I downloaded the https://github.com/xialuxi/AMSoftmax project.
2) In the caffe-windows directory I changed the Makefile.config as described in the Caffe installation instructions:
2.1) cd caffe-windows
2.2) for req in $(cat python/requirements.txt); do pip install --trusted-host pypi.python.org $req; done
2.3) cp Makefile.config.example Makefile.config
2.4) gedit Makefile.config
USE_CUDNN := 1
OPENCV_VERSION := 3
PYTHON_INCLUDE := /usr/include/python2.7 \
/usr/local/lib/python2.7/dist-packages/numpy/core/include
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial
3) Copied cosin_add_m_layer.hpp to the directory ./caffe/include/caffe/layers/
4) Copied cosin_add_m_layer.cpp and cosin_add_m_layer.cu to the directory ./caffe/src/caffe/layers/
5) Modified the ./caffe/src/caffe/proto/caffe.proto file according to the provided proto file.
6) I also copied combined_margin_layer.cpp, combined_margin_layer.cu and combined_margin_layer.hpp from https://github.com/gehaocool/CombinedMargin-caffe into the same locations as in steps 3 and 4.
7) make -j8
8) make py
9) make test -j8

But after these steps, when I run
make runtest -j8
it fails on some of the layer tests, so I could not run your repository. Am I missing something in the compilation steps? Please correct me.

Thank you for your time.

What is the difference between the old and the new cosin_add_m + scale implementations?

The old version used two layers: one adds the angular margin m, and the other applies a scale of 64 or 128.
The new merged ArcFace version has only a single layer with no scale value to set. What is the difference between this and the implementation above?
layer {
name: "cosin_add_m"
type: "CosinAddm"
bottom: "temp_fc6"
bottom: "label"
top: "fc6_margin"
cosin_add_m_param {
m: 0.5
}
}
layer {
name: "fc6_margin_scale"
type: "Scale"
bottom: "fc6_margin"
top: "fc6_margin_scale"
param {
lr_mult: 0
decay_mult: 0
}
scale_param {
filler{
type: "constant"
value: 64
}
}
}

The modified add-m layer:
layer {
name: "adacos_add_m_scale"
type: "AdaCosAddmScale"
bottom: "fc6"
bottom: "label"
top: "fc6_margin_scale"
adacos_add_m_scale_param {
m: 0.5
num_classes: 10575
}
}

@xialuxi
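
For reference, a hedged summary of why the merged AdaCosAddmScale layer has no s to set (based on my reading of the AdaCos paper, not the author's confirmation): AdaCos either fixes the scale from the number of classes C or adapts it from mini-batch statistics,

$$
\tilde{s} = \sqrt{2}\,\ln(C - 1),
\qquad
s^{(t)} = \frac{\ln B_{\mathrm{avg}}^{(t)}}{\cos\!\big(\min(\pi/4,\ \theta_{\mathrm{med}}^{(t)})\big)},
$$

where $B_{\mathrm{avg}}^{(t)}$ is the batch average of $\sum_{j \neq y_i} e^{s^{(t-1)}\cos\theta_{i,j}}$ and $\theta_{\mathrm{med}}^{(t)}$ is the median angle between each sample and its ground-truth class. So the old CosinAddm + Scale pair hard-codes s = 64, while the merged layer computes s itself at every iteration.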

Training data

Hello author, was your LMDB data built from the cropped and aligned images together with labels? What exactly is the label? Is the data organized as one folder per person, each folder containing multiple images, so that the number of classes equals the number of person folders? How should the data be prepared?

Softmax computation

hi xialuxi:
Hello!
The softmax computation in the ArcFace loss is different from the ordinary softmax:
[image of the ArcFace loss formula]
The denominator separates the target class y_i from the other classes j. Where in the code is this part computed, and where is its derivative implemented?
Many thanks!
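
For reference, a hedged reconstruction of the formula the missing image presumably showed, i.e. the standard ArcFace loss with the margin applied only to the target class y_i:

$$
L = -\frac{1}{N}\sum_{i=1}^{N}
\log\frac{e^{s\cos(\theta_{y_i}+m)}}
{e^{s\cos(\theta_{y_i}+m)} + \sum_{j\neq y_i} e^{s\cos\theta_j}}.
$$

In the prototxts shown elsewhere on this page, that split is realized by the CosinAddm layer adding the margin to the target logit and a Scale layer multiplying by s, after which Caffe's stock SoftmaxWithLoss layer computes the softmax itself and its derivative.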

Does not work for two-class classification model training

Hello everyone,
First of all, thanks to the author for the excellent work.
I used this ArcFace loss to fine-tune a two-class classification model, but I don't know why it does not work: the loss does not decrease and the accuracy jumps around. The parameters are m = 0.5, s = 64. I have tried some other parameters, but the result is always the same.
Has anyone encountered a similar problem? Thanks.

Question about AdaCos

In your implementation, theta_med is computed as the mean of the angles of all samples over all classes in the batch, but the paper says it is "the median of all corresponding classes' angles". I understand that as something like the angle of each sample to its label class. Is that right?
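
For reference, a hedged restatement of the paper's definition as I read it (which seems consistent with the asker's interpretation): over a mini-batch $\mathcal{B}^{(t)}$, the statistic is the median of each sample's angle to its own ground-truth class,

$$
\theta_{\mathrm{med}}^{(t)} = \operatorname{median}\big\{\,\theta^{(t)}_{i,\,y_i} : i \in \mathcal{B}^{(t)}\,\big\},
$$

i.e. a median over samples of the target-class angles rather than a mean over all samples and all classes.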

Optimizer

Hi,
what kind of optimizer (Adam, SGD) did you use for the tests, e.g. on CASIA-WebFace?

I'm asking because the gradients in the final layer with AdaCos look lower than with a fixed s = 20 (98 classes). Also, AdaCos gets lower scores. I'm wondering whether the gradient provided for learning the model is simply too low.
I'm using Adam and the results are the following (the dataset is CARS196):

  1. Adam fixed: 0.79
  2. Adam AdaCos: 0.745
  3. Adam AdaCos x2 bigger lr: 0.755

In general AdaCos works worse for some reason; I'm not sure why. Maybe it is also because the average angle between dissimilar classes is smaller than in the face case,
or we need a more adaptive LR method for this problem.

accuracy_hat_arc stays below 0.6, loss stuck around 1.5

Hello, I am training a four-class task with ArcFace, and the final training output is:
Iteration 285400 (1.51479 iter/s, 132.031s/200 iters), loss = 1.58242
Train net output #0: accuracy_hat = 1
Train net output #1: accuracy_hat_arc = 0.4375
Train net output #2: loss_hat = 1.58242 (* 1 = 1.58242 loss)

accuracy_hat_arc never gets above 0.6,
and loss_hat stays around 1.5 and does not converge.

Has anyone run into this problem, or does anyone have ideas for solving it? It's urgent, thank you!

landmark detection is inaccurate

Hello, I tried the landmark detection of both models, and neither gives accurate results. I then added face box detection first, but it did not improve. Is there something I have overlooked?

Loss does not decrease?

My network's batch size is 128 and the learning rate decays from 0.001 to 0.00001 over 20000 iterations, but the training loss keeps oscillating between 3 and 4 and never drops. Changing the initial learning rate makes no difference. How did you set your training parameters?

About the gradient computation of the loss

Hello author, I would like to ask a question about the gradient computation of the loss:
1. For ArcFace, the gradient in the code is cos_m + sin_m * cos_t[i * dim + gt] / sin_theta, which is sin(theta + m) / sin(theta), but my own calculation gives -sin(theta + m). Am I missing something?
2. For the combined margin, the gradient in your code is m1 * pow(1 - pow(bottom_data[i * dim + gt], 2), -0.5) * sin(m1_x_m2[i * dim + gt]), which is m1 * sin(m1 * theta + m2) * sin(theta), but my own calculation gives -m1 * sin(m1 * theta + m2). How did you arrive at your result? Many thanks!
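
A hedged reconciliation (my own derivation, not the author's reply): the backward pass differentiates with respect to the layer input cos(theta) rather than with respect to theta itself, which removes the minus sign and introduces the 1/sin(theta) factor:

$$
\frac{\partial \cos(\theta+m)}{\partial \cos\theta}
= \frac{\partial \cos(\theta+m)/\partial\theta}{\partial \cos\theta/\partial\theta}
= \frac{-\sin(\theta+m)}{-\sin\theta}
= \cos m + \sin m\,\frac{\cos\theta}{\sin\theta},
$$

$$
\frac{\partial \cos(m_1\theta+m_2)}{\partial \cos\theta}
= \frac{-m_1\sin(m_1\theta+m_2)}{-\sin\theta}
= m_1\,(1-\cos^2\theta)^{-1/2}\,\sin(m_1\theta+m_2),
$$

which matches both code expressions; the $-\sin(\theta+m)$ and $-m_1\sin(m_1\theta+m_2)$ results are the derivatives with respect to $\theta$, one chain-rule step earlier.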

Error compiling Combined Margin Loss???

Hello, I just downloaded your latest Combined Margin Loss files, but compilation fails with the errors below. What could the problem be?

Error C2065: 'arccos_x': undeclared identifier (Project: libcaffe, File: E:\AMSoftmax-master\Caffe-AM-Softmax\caffe-windows\src\caffe\layers\combined_margin_layer.cpp, Line: 32)
Error C2228: left of '.mutable_cpu_data' must have class/struct/union (Project: libcaffe, File: E:\AMSoftmax-master\Caffe-AM-Softmax\caffe-windows\src\caffe\layers\combined_margin_layer.cpp, Line: 32)

Thank you!

Question about the landmark detection network

The landmark detection network uses depthwise separable convolutions, so the weight size should be filter_size x filter_size x output_channel,
but the actual weight size is input_channel x filter_size x filter_size x output_channel, the same as an ordinary convolution.
I replaced these separable convolutions with regular convolutions; the test program runs, but the results are wrong. Please check.

Focal loss

I noticed that you use Focal loss as a second loss. What's the purpose?
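
For background (a hedged note, not the author's stated motivation), the standard focal loss down-weights examples the model already classifies confidently so that training focuses on hard samples:

$$
\mathrm{FL}(p_t) = -\alpha_t\,(1-p_t)^{\gamma}\,\log(p_t),
$$

where $p_t$ is the predicted probability of the true class and $\gamma > 0$ shrinks the contribution of well-classified examples.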

Could you share a wing loss example?

Could you share a wing loss layer in a train prototxt, like this EuclideanLoss example?

layer {
  name: "loss"    
  type: "EuclideanLoss"
  bottom: "fc2"
  bottom: "label"
  top: "loss"
  loss_weight: 100
}
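
There is no wing loss prototxt on this page, but as a hedged illustration of what such a layer would compute per landmark coordinate (the formula follows the Wing Loss paper; the function name and the default values below are assumptions, not this repo's actual interface):

#include <cmath>

// Wing loss on a single residual x = prediction - target:
// a log regime for small residuals and a shifted L1 regime for large ones.
// C is chosen so that the two pieces join continuously at |x| = w.
float wing_loss(float pred, float target, float w = 10.0f, float epsilon = 2.0f) {
  const float x = std::fabs(pred - target);
  const float C = w - w * std::log(1.0f + w / epsilon);
  return (x < w) ? w * std::log(1.0f + x / epsilon)
                 : x - C;
}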

Why is the loss 87.33 and the accuracy 1?

Why is it that after I switch the loss to the arc loss, the loss becomes 87.33 and the accuracy becomes 1?
layer {
name: "cosin_add_m"
type: "CosinAddm"
bottom: "concat_fc"
bottom: "label"
top: "fc6_margin"
cosin_add_m_param {
m: 0.5
}
}

layer {
name: "fc6_margin_scale"
type: "Scale"
bottom: "fc6_margin"
top: "fc6_margin_scale"
param {
lr_mult: 0
decay_mult: 0
}
scale_param {
filler{
type: "constant"
value: 64
}
}
}

layer {
name: "concat_loss"
type: "SoftmaxWithLoss"
bottom: "fc6_margin_scale"
bottom: "label"
top: "concat_loss"
}

Whereas if I use directly:
layer {
name: "concat_loss"
type: "SoftmaxWithLoss"
bottom: "concat_fc"
bottom: "label"
top: "concat_loss"
}
then it converges. I cannot figure out why.
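
A hedged note on the magic number (my reading of Caffe's behavior, not the author's reply): loss = 87.3365 means the softmax probability of the target class has underflowed to zero, because SoftmaxWithLoss clamps the probability at FLT_MIN before taking the log,

$$
-\ln(\mathrm{FLT\_MIN}) = -\ln(1.17549435\times 10^{-38}) \approx 87.3365.
$$

A plausible cause is that with the margin plus the s = 64 scale an early misclassified sample gets a target probability of essentially zero, whereas without the margin/scale stack the logits stay moderate and training can converge.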

How to get the similarity between two faces?

Hello, is there a demo that uses the caffemodel to compute the distance and similarity between two faces, like the deploy/test.py of the original insightface? Thanks, waiting for a reply.
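
There is no such demo on this page, but here is a minimal hedged sketch of the usual approach: run each aligned face through the network up to the embedding layer (e.g. fc2 in the prototxt shown elsewhere on this page; the exact layer is an assumption), then compare the two embedding vectors by cosine similarity:

#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity of two embedding vectors; values close to 1 mean "same person",
// and a threshold tuned on a validation set decides whether two faces match.
float cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
  float dot = 0.0f, na = 0.0f, nb = 0.0f;
  for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
    dot += a[i] * b[i];
    na  += a[i] * a[i];
    nb  += b[i] * b[i];
  }
  return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-12f);
}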

landmark detection results are wrong

Running the testlandmark.py file under the landmark_and_pose directory works, but the output landmark positions are wrong.

NormL2 layer not found

arcface-caffe/AdaCos/adacos_add_m_scale.prototxt contains a NormL2 layer, but the project does not seem to include that layer. Where can I find it? Thanks.

High accuracy during training, but lower accuracy at test time?

Hello, thank you for your work.
I recently applied this loss function to another classification problem. During training the test-set accuracy reaches 99%, but when I test afterwards on the same test set the accuracy is only 94%.
For testing I do not use the test tool under caffe's tools; I read the images myself, and the test prototxt only goes up to fc2. Could that be the problem? Does something need to be changed?

Adding the ArcFace loss function

Hello author,
When adding your loss function, I have so far only added the proto and parameters related to cosin_add_m_layer. During training I get a lot of output like "costheta > 1 ************ 1.58". What could be causing this?
Also, is training the Caffe version similar to MXNet, i.e. first train with softmax only for about 120k steps, then add the ArcFace loss layer and fine-tune?

MNIST example

Hello, have you tested the Caffe version of ArcFace on an MNIST example? My training reaches loss = 87.3365 after just two iterations and fails. I added the loss layer exactly according to your Caffe project, but it just will not train. Could you provide an MNIST example? Thanks!

Whether the convolution layers use a bias term (bias_term)

Hello author, in the ArcFace face recognition network on the official insightface site, the convolution layers do not use a bias term, while all of yours do. Is there any difference? Thanks.

The training loss never changes, hoping for some advice

The model deploy is as follows:
name: "ArcFace"
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
resize_param {
prob: 1
resize_mode: WARP
height: 128
width: 128
interp_mode: LINEAR
interp_mode: AREA
interp_mode: CUBIC
interp_mode: LANCZOS4
}
mirror: True
crop_h: 128
crop_w: 128
#distort_param {
# brightness_prob: 0.5
# brightness_delta: 32
# contrast_prob: 0.5
# contrast_lower: 0.5
# contrast_upper: 1.5
# hue_prob: 0.5
# hue_delta: 18
# saturation_prob: 0.5
# saturation_lower: 0.5
# saturation_upper: 1.5
# random_order_prob: 0.
#}
}
data_param {
source: "/media/zz/7c333a37-0503-4f81-8103-0ef7e776f6fb/Face_Data/casia_extract_aligned_train_9204cls_lmdb"
batch_size: 512
backend: LMDB
}
}
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
resize_param {
prob: 1
resize_mode: WARP
height: 128
width: 128
interp_mode: LINEAR
}
crop_h: 128
crop_w: 128
}
data_param {
source: "/media/zz/7c333a37-0503-4f81-8103-0ef7e776f6fb/Face_Data/casia_extract_aligned_test_9204cls_lmdb"
batch_size: 2
backend: LMDB
}
}
############## CNN Architecture ###############
layer {
name: "data/bias"
type: "Bias"
bottom: "data"
top: "data/bias"
param {
lr_mult: 0
decay_mult: 0
}
bias_param {
filler {
type: "constant"
value: -128
}
}
}
################################################
layer {
name: "conv1"
type: "Convolution"
bottom: "data/bias"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
kernel_size: 7
pad: 3
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv1_bn"
type: "BatchNorm"
bottom: "conv1"
top: "conv1"
}
layer {
name: "conv1_scale"
type: "Scale"
bottom: "conv1"
top: "conv1"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv1_relu"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "pool1_1"
type: "Pooling"
bottom: "pool1"
top: "pool1_1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2_1"
type: "Convolution"
bottom: "pool1_1"
top: "conv2_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv2_1_bn"
type: "BatchNorm"
bottom: "conv2_1"
top: "conv2_1"
}
layer {
name: "conv2_1_scale"
type: "Scale"
bottom: "conv2_1"
top: "conv2_1"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv2_1_relu"
type: "ReLU"
bottom: "conv2_1"
top: "conv2_1"
}
layer {
name: "conv2_2"
type: "Convolution"
bottom: "conv2_1"
top: "conv2_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv2_2_bn"
type: "BatchNorm"
bottom: "conv2_2"
top: "conv2_2"
}
layer {
name: "conv2_2_scale"
type: "Scale"
bottom: "conv2_2"
top: "conv2_2"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv2_2_relu"
type: "ReLU"
bottom: "conv2_2"
top: "conv2_2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2_2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
##############################################
layer {
name: "conv3_1"
type: "Convolution"
bottom: "pool2"
top: "conv3_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 1
pad: 0
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv3_1_bn"
type: "BatchNorm"
bottom: "conv3_1"
top: "conv3_1"
}
layer {
name: "conv3_1_scale"
type: "Scale"
bottom: "conv3_1"
top: "conv3_1"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv3_1_relu"
type: "ReLU"
bottom: "conv3_1"
top: "conv3_1"
}
layer {
name: "conv3_2"
type: "Convolution"
bottom: "conv3_1"
top: "conv3_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 3
pad: 1
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv3_2_bn"
type: "BatchNorm"
bottom: "conv3_2"
top: "conv3_2"
}
layer {
name: "conv3_2_scale"
type: "Scale"
bottom: "conv3_2"
top: "conv3_2"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv3_2_relu"
type: "ReLU"
bottom: "conv3_2"
top: "conv3_2"
}
layer {
name: "conv4_1"
type: "Convolution"
bottom: "conv3_2"
top: "conv4_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 1
pad: 0
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv4_1_bn"
type: "BatchNorm"
bottom: "conv4_1"
top: "conv4_1"
}
layer {
name: "conv4_1_scale"
type: "Scale"
bottom: "conv4_1"
top: "conv4_1"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv4_1_relu"
type: "ReLU"
bottom: "conv4_1"
top: "conv4_1"
}
layer {
name: "conv4_2"
type: "Convolution"
bottom: "conv4_1"
top: "conv4_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
kernel_size: 3
pad: 1
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv4_2_bn"
type: "BatchNorm"
bottom: "conv4_2"
top: "conv4_2"
}
layer {
name: "conv4_2_scale"
type: "Scale"
bottom: "conv4_2"
top: "conv4_2"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv4_2_relu"
type: "ReLU"
bottom: "conv4_2"
top: "conv4_2"
}
################################################
layer {
name: "conv5_1"
type: "Convolution"
bottom: "conv4_2"
top: "conv5_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 1
pad: 0
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "conv5_1_bn"
type: "BatchNorm"
bottom: "conv5_1"
top: "conv5_1"
}
layer {
name: "conv5_1_scale"
type: "Scale"
bottom: "conv5_1"
top: "conv5_1"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "conv5_1_relu"
type: "ReLU"
bottom: "conv5_1"
top: "conv5_1"
}
layer {
name: "pool3"
type: "Pooling"
bottom: "conv5_1"
top: "pool3"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
#########################################
#########################################
layer {
name: "fc1"
type: "InnerProduct"
bottom: "pool3"
top: "fc1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 1024
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "fc1_bn"
type: "BatchNorm"
bottom: "fc1"
top: "fc1"
}
layer {
name: "fc1_scale"
type: "Scale"
bottom: "fc1"
top: "fc1"
scale_param {
filler {
value: 1
}
bias_term: true
bias_filler {
value: 0
}
}
}
layer {
name: "fc1_relu"
type: "ReLU"
bottom: "fc1"
top: "fc1"
}
layer {
name: "fc2"
type: "InnerProduct"
bottom: "fc1"
top: "fc2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 128
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "fc2_norm"
type: "NormalizeJin"
bottom: "fc2"
top: "fc2_norm"
norm_jin_param {
across_spatial: true
scale_filler {
type: "constant"
value: 1.0
}
channel_shared: true
}
}
############### Arc-Softmax Loss ##############

layer {
name: "fc6_changed"
type: "InnerProduct"
bottom: "fc2_norm"
top: "fc6"
inner_product_param {
num_output: 9204
normalize: true
weight_filler {
type: "xavier"
}
bias_term: false
}
}
####################################################
layer {
name: "cosin_add_m"
type: "CosinAddm"
bottom: "fc6"
bottom: "label"
top: "fc6_margin"
cosin_add_m_param {
m: 0.1
}
include {
phase: TRAIN
}
}

layer {
name: "fc6_margin_scale"
type: "Scale"
bottom: "fc6_margin"
top: "fc6_margin_scale"
param {
lr_mult: 0
decay_mult: 0
}
scale_param {
filler{
type: "constant"
value: 64
}
}
include {
phase: TRAIN
}
}

######################################################
layer {
name: "softmax_loss"
type: "SoftmaxWithLoss"
bottom: "fc6_margin_scale"
bottom: "label"
#bottom: "label"
#bottom: "data"
top: "softmax_loss"
loss_weight: 1
include {
phase: TRAIN
}
}

layer {
name: "Accuracy"
type: "Accuracy"
bottom: "fc6"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}

The loss output is as follows:
I0627 17:38:58.567371 6757 solver.cpp:224] Iteration 450 (2.13816 iter/s, 4.67691s/10 iters), loss = 87.3365
I0627 17:38:58.567402 6757 solver.cpp:243] Train net output #0: softmax_loss = 87.3365 (* 1 = 87.3365 loss)
I0627 17:38:58.567409 6757 sgd_solver.cpp:137] Iteration 450, lr = 0.00314
I0627 17:39:03.256306 6757 solver.cpp:224] Iteration 460 (2.13288 iter/s, 4.6885s/10 iters), loss = 87.3365
I0627 17:39:03.256340 6757 solver.cpp:243] Train net output #0: softmax_loss = 87.3365 (* 1 = 87.3365 loss)
I0627 17:39:03.256347 6757 sgd_solver.cpp:137] Iteration 460, lr = 0.00314
I0627 17:39:07.941520 6757 solver.cpp:224] Iteration 470 (2.13457 iter/s, 4.68478s/10 iters), loss = 87.3365
I0627 17:39:07.941551 6757 solver.cpp:243] Train net output #0: softmax_loss = 87.3365 (* 1 = 87.3365 loss)
I0627 17:39:07.941558 6757 sgd_solver.cpp:137] Iteration 470, lr = 0.00314
I0627 17:39:12.623337 6757 solver.cpp:224] Iteration 480 (2.13612 iter/s, 4.68139s/10 iters), loss = 87.3365
I0627 17:39:12.623456 6757 solver.cpp:243] Train net output #0: softmax_loss = 87.3365 (* 1 = 87.3365 loss)
How should I fix this?

Questions about CosinAddmLayer and combined_margin_layer

hi xialuxi:
It looks like CosinAddmLayer is just the arc loss (i.e. combined margin with m1 = 1, m3 = 0). Why does CosinAddmLayer handle the cases cos_t[i * dim + gt] > 1.0f and cos_t[i * dim + gt] <= threshold, while combined_margin_layer does not need to?
Which one did you end up using for the final training?
Many thanks!
