Comments (3)
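The analysis above had a problem: the derivation overlooked the gradients of std::abs and of T term1 = (x > 0) ? x : 0; which are both used in the forward computation. The forward formula is therefore revised as follows:
The backward gradient obtained from the derivation is:
Here where(x>0, 1, 0) is the gradient of T term1 = (x > 0) ? x : 0; in the forward computation, and where(x>=0, 1, -1) is the gradient of std::abs.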
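To make the two gradient terms concrete, here is a minimal NumPy sketch. It assumes the forward pass is the usual numerically stable form max(x, 0) - x * label + log(1 + exp(-|x|)); that formula, and the helper names forward, backward and numeric_grad, are illustrative assumptions, not code taken from the Paddle kernel or the fix PR. The sketch compares the analytic gradient built from where(x>0, 1, 0) and where(x>=0, 1, -1) against a central finite difference, away from x = 0.

```python
import numpy as np

def forward(x, label):
    # Assumed stable forward form: max(x, 0) - x * label + log(1 + exp(-|x|))
    return np.maximum(x, 0) - x * label + np.log1p(np.exp(-np.abs(x)))

def backward(x, label):
    # Gradient of term1 = (x > 0) ? x : 0
    term1_grad = np.where(x > 0, 1.0, 0.0)
    # Gradient of abs(x)
    abs_grad = np.where(x >= 0, 1.0, -1.0)
    e = np.exp(-np.abs(x))
    # Chain rule on log(1 + exp(-|x|)): -abs_grad * e / (1 + e)
    log_term_grad = -abs_grad * e / (1.0 + e)
    return term1_grad - label + log_term_grad

def numeric_grad(x, label, delta=1e-6):
    # Central finite difference, element-wise
    return (forward(x + delta, label) - forward(x - delta, label)) / (2 * delta)

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(8, 5))
x[np.abs(x) < 1e-3] = 0.5  # keep samples away from the kink at x = 0
label = rng.uniform(0, 1, size=(8, 5))

max_gap = np.max(np.abs(backward(x, label) - numeric_grad(x, label)))
print(max_gap)
```

Under these assumptions the analytic gradient reduces to sigmoid(x) - label for x != 0, which is a convenient sanity check for any rewritten kernel.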
The corresponding fix PR:
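The kernel's backward result is checked against a gradient computed numerically in numpy (see source: op_test.py#L148-L323), whereas the gradient of the decomposed operator is obtained through automatic differentiation and is checked against the kernel's backward result. My inference is that the kernel's backward implementation is at fault. The verification is as follows:
When running TestSigmoidCrossEntropyWithLogitsOp4 for the sigmoid_cross_entropy_with_logits op, the relative error tolerance is max_relative_error=0.005, which is fairly loose. With it, the backward kernel on the current develop branch passes this unit test (it passes, but the two tensors visibly differ):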
W0515 05:27:49.445072 36810 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.4, Runtime API Version: 11.2
W0515 05:27:49.449699 36810 gpu_resources.cc:164] device: 0, cuDNN Version: 8.1.
numeric :
[array([[ 3.77183495e-04, -2.20797240e-04, -3.14791652e-04, ...,
-2.02644761e-04, 2.45672779e-04, -6.17090210e-06],
[ 4.84435251e-04, 2.84202716e-04, 1.83931716e-05, ...,
6.29521346e-04, 5.20318281e-04, 2.33842612e-05],
[ 3.12574747e-04, -4.71084098e-04, 1.10182442e-04, ...,
6.98864401e-04, 2.33956572e-04, -7.56920161e-05],
...,
[ 7.15934865e-04, -3.74937504e-04, 3.26225586e-04, ...,
3.84216391e-05, -5.20641936e-04, -4.17575856e-04],
[ 1.96946960e-05, 3.88698082e-04, -2.81023718e-04, ...,
-5.38852117e-05, 3.67850861e-04, -1.84393860e-04],
[-9.46350590e-05, 1.44749951e-05, -2.59066396e-04, ...,
5.43415898e-04, 5.17161748e-05, 5.20940836e-04]])]
analytic_grads :
[array([[ 1.84299020e-04, -2.20797405e-04, -3.14791844e-04, ...,
-2.02644764e-04, 1.03724500e-04, -3.28235993e-04],
[ 1.30288861e-04, -4.92659295e-04, -9.86970736e-05, ...,
3.65435473e-04, 4.38155145e-04, -7.09606712e-04],
[-1.62492418e-04, -4.71084140e-04, 1.10182313e-04, ...,
1.25990168e-04, 1.87285167e-04, -5.22634377e-04],
...,
[-3.11346327e-05, -3.74937645e-04, 3.26225574e-04, ...,
-3.02651920e-04, -5.20642019e-04, -4.17576055e-04],
[-3.08952871e-04, 2.82421633e-04, -2.81023766e-04, ...,
-5.38854093e-05, 6.31943427e-05, -1.84394142e-04],
[-1.67904546e-04, -1.19940036e-05, -2.59066405e-04, ...,
3.36552688e-04, 2.25882243e-05, -9.09629301e-05]])]
max_relative_error :
0.005
.
----------------------------------------------------------------------
Ran 1 test in 2.453s
OK
However, when I tighten the tolerance to max_relative_error=0.0005, I get the following result.
I0515 06:12:58.692179 17707 program_interpreter.cc:221] New Executor is Running.
I0515 06:12:58.693336 17707 interpreter_util.cc:652] Standalone Executor is Used.
numeric :
[array([[ 3.77183495e-04, -2.20797240e-04, -3.14791652e-04, ...,
-2.02644761e-04, 2.45672779e-04, -6.17090210e-06],
[ 4.84435251e-04, 2.84202716e-04, 1.83931716e-05, ...,
6.29521346e-04, 5.20318281e-04, 2.33842612e-05],
[ 3.12574747e-04, -4.71084098e-04, 1.10182442e-04, ...,
6.98864401e-04, 2.33956572e-04, -7.56920161e-05],
...,
[ 7.15934865e-04, -3.74937504e-04, 3.26225586e-04, ...,
3.84216391e-05, -5.20641936e-04, -4.17575856e-04],
[ 1.96946960e-05, 3.88698082e-04, -2.81023718e-04, ...,
-5.38852117e-05, 3.67850861e-04, -1.84393860e-04],
[-9.46350590e-05, 1.44749951e-05, -2.59066396e-04, ...,
5.43415898e-04, 5.17161748e-05, 5.20940836e-04]])]
analytic_grads :
[array([[ 1.84299020e-04, -2.20797405e-04, -3.14791844e-04, ...,
-2.02644764e-04, 1.03724500e-04, -3.28235993e-04],
[ 1.30288861e-04, -4.92659295e-04, -9.86970736e-05, ...,
3.65435473e-04, 4.38155145e-04, -7.09606712e-04],
[-1.62492418e-04, -4.71084140e-04, 1.10182313e-04, ...,
1.25990168e-04, 1.87285167e-04, -5.22634377e-04],
...,
[-3.11346327e-05, -3.74937645e-04, 3.26225574e-04, ...,
-3.02651920e-04, -5.20642019e-04, -4.17576055e-04],
[-3.08952871e-04, 2.82421633e-04, -2.81023766e-04, ...,
-5.38854093e-05, 6.31943427e-05, -1.84394142e-04],
[-1.67904546e-04, -1.19940036e-05, -2.59066405e-04, ...,
3.36552688e-04, 2.25882243e-05, -9.09629301e-05]])]
max_relative_error :
0.0005
F
======================================================================
FAIL: test_check_grad (__main__.TestSigmoidCrossEntropyWithLogitsOp4)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/paddle/test/deprecated/legacy_test/test_sigmoid_cross_entropy_with_logits_op.py", line 178, in test_check_grad
self.check_grad(['X'], 'Out', check_pir=True)
File "/paddle/build/test/legacy_test/op_test.py", line 2986, in check_grad
self.check_grad_with_place(
File "/paddle/build/test/legacy_test/op_test.py", line 3298, in check_grad_with_place
numeric_grads = self.check_grad_with_place_for_static(
File "/paddle/build/test/legacy_test/op_test.py", line 3089, in check_grad_with_place_for_static
self._assert_is_close(
File "/paddle/build/test/legacy_test/op_test.py", line 2942, in _assert_is_close
self.assertLessEqual(max_diff, max_relative_error, err_msg())
AssertionError: 0.0007811970012982192 not less than or equal to 0.0005 : Operator sigmoid_cross_entropy_with_logits error, Gradient Check On Place(cpu) variable X (shape: (64, 20), dtype: float64) max gradient diff 7.811970e-04 over limit 5.000000e-04, the first error element is 3, expected 5.481218e-04, but got 2.099690e-05.
----------------------------------------------------------------------
Ran 1 test in 0.521s
FAILED (failures=1)
From this we can infer that the loose tolerance kept the incorrect backward computation from being exposed.
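A small sketch of why the looser threshold can hide the error. It assumes the checker divides by |numeric| only when that value is above a small floor and otherwise falls back to the absolute difference; the floor of 1e-3 and the helper name max_relative_diff are assumptions for illustration, not confirmed from op_test.py. The two input values are the "expected" and "got" numbers from the failure message above.

```python
import numpy as np

def max_relative_diff(numeric, analytic, floor=1e-3):
    # Hypothetical helper: relative error, except that entries with
    # |numeric| < floor are compared by absolute difference (assumed floor).
    numeric = np.asarray(numeric, dtype=np.float64)
    analytic = np.asarray(analytic, dtype=np.float64)
    denom = np.abs(numeric)
    denom[denom < floor] = 1.0
    return np.max(np.abs(numeric - analytic) / denom)

# Values reported for the first failing element in the log above
expected, got = 5.481218e-04, 2.099690e-05
diff = max_relative_diff([expected], [got])

print(diff <= 0.005)   # True: passes with the loose tolerance
print(diff <= 0.0005)  # False: fails with the tighter tolerance
```

Under that assumption, gradients of order 1e-4 can be wrong by nearly their full magnitude and still stay below 0.005, which would match the observation that the test passed even though the tensors visibly differ.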
With the fix PR applied and max_relative_error=0.0005, the test still passes and the analytic gradients closely match the numeric ones, as shown below:
W0515 06:13:53.214535 18318 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.4, Runtime API Version: 11.2
W0515 06:13:53.220155 18318 gpu_resources.cc:164] device: 0, cuDNN Version: 8.1.
numeric :
[array([[ 3.77183495e-04, -2.20797240e-04, -3.14791652e-04, ...,
-2.02644761e-04, 2.45672779e-04, -6.17090210e-06],
[ 4.84435251e-04, 2.84202716e-04, 1.83931716e-05, ...,
6.29521346e-04, 5.20318281e-04, 2.33842612e-05],
[ 3.12574747e-04, -4.71084098e-04, 1.10182442e-04, ...,
6.98864401e-04, 2.33956572e-04, -7.56920161e-05],
...,
[ 7.15934865e-04, -3.74937504e-04, 3.26225586e-04, ...,
3.84216391e-05, -5.20641936e-04, -4.17575856e-04],
[ 1.96946960e-05, 3.88698082e-04, -2.81023718e-04, ...,
-5.38852117e-05, 3.67850861e-04, -1.84393860e-04],
[-9.46350590e-05, 1.44749951e-05, -2.59066396e-04, ...,
5.43415898e-04, 5.17161748e-05, 5.20940836e-04]])]
analytic_grads :
[array([[ 3.77183699e-04, -2.20797405e-04, -3.14791844e-04, ...,
-2.02644764e-04, 2.45672821e-04, -6.17087178e-06],
[ 4.84435362e-04, 2.84202718e-04, 1.83934196e-05, ...,
6.29521507e-04, 5.20318417e-04, 2.33842613e-05],
[ 3.12574821e-04, -4.71084140e-04, 1.10182313e-04, ...,
6.98864469e-04, 2.33956823e-04, -7.56920088e-05],
...,
[ 7.15934877e-04, -3.74937645e-04, 3.26225574e-04, ...,
3.84217363e-05, -5.20642019e-04, -4.17576055e-04],
[ 1.96948667e-05, 3.88698345e-04, -2.81023766e-04, ...,
-5.38854093e-05, 3.67850951e-04, -1.84394142e-04],
[-9.46347782e-05, 1.44751433e-05, -2.59066405e-04, ...,
5.43416127e-04, 5.17164533e-05, 5.20940836e-04]])]
max_relative_error :
0.0005
.
----------------------------------------------------------------------
Ran 1 test in 2.753s
OK
The bug has been fixed; see the PR for details.