Hi,
When I reimplemented the model on my own dataset with the parameters --binary_mask 1 and --loss bce, I got an error.
The error is:
/opt/conda/conda-bld/pytorch_1579022034529/work/aten/src/THCUNN/BCECriterion.cu:74: Acctype bce_functor_weights<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::detail::tuple_of_iterator_references<thrust::device_reference, thrust::device_reference, thrust::device_reference, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [44,0,0], thread: [251,0,0] Assertion input >= 0. && input <= 1.
failed.
RuntimeError: reduce failed to synchronize: cudaErrorAssert: device-side assert triggered
I searched the internet for a solution and found some answers about using BCE that helped other people.
One such answer is:
https://stackoverflow.com/questions/60022388/pytorch-runtimeerror-reduce-failed-to-synchronize-cudaerrorassert-device-sid
But I have confirmed that the values used to calculate the BCE loss are within range, so that does not seem to be the problem.
So now I am confused and don't know how to solve this.
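A minimal sketch of the kind of range check meant here, which also counts NaN values, since a NaN fails the `input >= 0. && input <= 1.` comparison in the same way an out-of-range value does. The function name `summarize_bce_values` and the sample values are hypothetical and only for illustration. For a PyTorch tensor, the flat list could be obtained with `tensor.detach().cpu().flatten().tolist()`, and it may be worth running the check on both the prediction and the target passed to the loss.

```python
import math


def summarize_bce_values(values):
    """Count entries that fall outside [0, 1] or are NaN."""
    counts = {"nan": 0, "below_0": 0, "above_1": 0, "ok": 0}
    for v in values:
        v = float(v)
        if math.isnan(v):
            # NaN compares false against everything, so it fails the assert too.
            counts["nan"] += 1
        elif v < 0.0:
            counts["below_0"] += 1
        elif v > 1.0:
            counts["above_1"] += 1
        else:
            counts["ok"] += 1
    return counts


# Hypothetical values standing in for a flattened prediction tensor.
example = [0.0, 0.5, 1.0, 1.0000001, -0.2, float("nan")]
report = summarize_bce_values(example)
print(report)
```

If every count except "ok" is zero for the tensors of the failing batch, the values themselves are probably not the cause; a check done on only a few batches, or before the final activation, could still miss the offending values.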
When I change the parameters to --binary_mask 0 and --loss l1, the model trains without errors.
So where in my implementation could the error be?
Thank you.