Problem with binarycrossentropy on GPU?

Hi,

I am trying to create a model in Flux that uses binarycrossentropy in its loss function, but I ran into a problem when invoking binarycrossentropy on the GPU. I reproduced the error with this simple case:

# copied from the docstring in Flux/src/layers/stateless.jl
binarycrossentropy.(σ.([-1.1491, 0.8619, 0.3127]), [0.9, 0.9, 0.1])

which works fine and outputs:

3-element Array{Float64,1}:
 1.309487097347566  
 0.43850664672364076
 0.8304003662235442 
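For reference, here is a minimal plain-Julia sketch of what that broadcast computes (my own reimplementation of the formula, with hypothetical helper names, and without the small ϵ stabilizer that Flux adds inside the logs):

```julia
# Plain-Julia sketch of the broadcasted computation (hypothetical helpers).
sigmoid(x) = 1 / (1 + exp(-x))                  # logistic sigmoid, like Flux's σ
bce(ŷ, y) = -y * log(ŷ) - (1 - y) * log(1 - ŷ)  # binary cross-entropy for one pair

bce.(sigmoid.([-1.1491, 0.8619, 0.3127]), [0.9, 0.9, 0.1])
# ≈ the three values shown above (Flux's version differs only by the ϵ term)
```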

But when I moved the inputs to the GPU and ran binarycrossentropy:

a1 = gpu([-1.1491, 0.8619, 0.3127])
a2 = gpu([1, 1, 0.])
binarycrossentropy.(σ.(a1), a2)

then I got this error:

┌ Warning: calls to Base intrinsics might be GPU incompatible
│   exception = (CUDAnative.MethodSubstitutionWarning(log(x::Float32) in Base.Math at special/log.jl:290, log(x::Float32) in CUDAnative at /home/taot/.julia/packages/CUDAnative/Lr0yj/src/device/cuda/math.jl:71), Base.StackTraces.StackFrame[log at log.jl:290, binarycrossentropy at stateless.jl:26, #25 at broadcast.jl:49])
└ @ CUDAnative /home/taot/.julia/packages/CUDAnative/Lr0yj/src/compiler/irgen.jl:116
┌ Warning: calls to Base intrinsics might be GPU incompatible
│   exception = (CUDAnative.MethodSubstitutionWarning(log(x::Float32) in Base.Math at special/log.jl:290, log(x::Float32) in CUDAnative at /home/taot/.julia/packages/CUDAnative/Lr0yj/src/device/cuda/math.jl:71), Base.StackTraces.StackFrame[log at log.jl:290, binarycrossentropy at stateless.jl:26, #25 at broadcast.jl:49])
└ @ CUDAnative /home/taot/.julia/packages/CUDAnative/Lr0yj/src/compiler/irgen.jl:116

InvalidIRError: compiling #25(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(binarycrossentropy),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArrays.CuArray},Nothing,typeof(CuArrays.cuσ),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}},Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}) resulted in invalid LLVM IR
Reason: unsupported call through a literal pointer (call to jl_alloc_string)
Stacktrace:
 [1] _string_n at strings/string.jl:60
 [2] StringVector at iobuffer.jl:31
 [3] #IOBuffer#318 at iobuffer.jl:114
 [4] multiple call sites at unknown:0
Reason: unsupported call through a literal pointer (call to jl_string_to_array)
Stacktrace:
 [1] unsafe_wrap at strings/string.jl:71
 [2] StringVector at iobuffer.jl:31
 [3] #IOBuffer#318 at iobuffer.jl:114
 [4] multiple call sites at unknown:0
Reason: unsupported call through a literal pointer (call to __memset_avx2_unaligned_erms)
Stacktrace:
 [1] fill! at array.jl:366
 [2] #IOBuffer#318 at iobuffer.jl:121
 [3] multiple call sites at unknown:0
Reason: unsupported dynamic function invocation (call to print)
Stacktrace:
 [1] print_to_string at strings/io.jl:124
 [2] string at strings/io.jl:168
 [3] throw_complex_domainerror at math.jl:31
 [4] log at special/log.jl:321
 [5] #binarycrossentropy#49 at /home/taot/.julia/packages/Flux/dkJUV/src/layers/stateless.jl:26
 [6] binarycrossentropy at /home/taot/.julia/packages/Flux/dkJUV/src/layers/stateless.jl:26
 [7] _broadcast_getindex_evalf at broadcast.jl:625
 [8] _broadcast_getindex at broadcast.jl:598
 [9] getindex at broadcast.jl:558
 [10] #25 at /home/taot/.julia/packages/GPUArrays/tIMl5/src/broadcast.jl:50
Reason: unsupported dynamic function invocation (call to print)
Stacktrace:
 [1] print_to_string at strings/io.jl:129
 [2] string at strings/io.jl:168
 [3] throw_complex_domainerror at math.jl:31
 [4] log at special/log.jl:321
 [5] #binarycrossentropy#49 at /home/taot/.julia/packages/Flux/dkJUV/src/layers/stateless.jl:26
 [6] binarycrossentropy at /home/taot/.julia/packages/Flux/dkJUV/src/layers/stateless.jl:26
 [7] _broadcast_getindex_evalf at broadcast.jl:625
 [8] _broadcast_getindex at broadcast.jl:598
 [9] getindex at broadcast.jl:558
 [10] #25 at /home/taot/.julia/packages/GPUArrays/tIMl5/src/broadcast.jl:50
Reason: unsupported call through a literal pointer (call to jl_array_grow_end)
Stacktrace:
 [1] _growend! at array.jl:811
 [2] resize! at array.jl:1003
 [3] print_to_string at strings/io.jl:131
 [4] string at strings/io.jl:168
 [5] throw_complex_domainerror at math.jl:31
 [6] log at special/log.jl:321
 [7] #binarycrossentropy#49 at /home/taot/.julia/packages/Flux/dkJUV/src/layers/stateless.jl:26
 [8] binarycrossentropy at /home/taot/.julia/packages/Flux/dkJUV/src/layers/stateless.jl:26
 [9] _broadcast_getindex_evalf at broadcast.jl:625
 [10] _broadcast_getindex at broadcast.jl:598
 [11] getindex at broadcast.jl:558
 [12] #25 at /home/taot/.julia/packages/GPUArrays/tIMl5/src/broadcast.jl:50
Reason: unsupported call through a literal pointer (call to jl_array_del_end)
Stacktrace:
 [1] _deleteend! at array.jl:820
 [2] resize! at array.jl:1008
 [3] print_to_string at strings/io.jl:131
 [4] string at strings/io.jl:168
 [5] throw_complex_domainerror at math.jl:31
 [6] log at special/log.jl:321
 [7] #binarycrossentropy#49 at /home/taot/.julia/packages/Flux/dkJUV/src/layers/stateless.jl:26
 [8] binarycrossentropy at /home/taot/.julia/packages/Flux/dkJUV/src/layers/stateless.jl:26
 [9] _broadcast_getindex_evalf at broadcast.jl:625
 [10] _broadcast_getindex at broadcast.jl:598
 [11] getindex at broadcast.jl:558
 [12] #25 at /home/taot/.julia/packages/GPUArrays/tIMl5/src/broadcast.jl:50
Reason: unsupported call through a literal pointer (call to jl_array_to_string)
Stacktrace:
 [1] Type at strings/string.jl:39
 [2] print_to_string at strings/io.jl:131
 [3] string at strings/io.jl:168
 [4] throw_complex_domainerror at math.jl:31
 [5] log at special/log.jl:321
 [6] #binarycrossentropy#49 at /home/taot/.julia/packages/Flux/dkJUV/src/layers/stateless.jl:26
 [7] binarycrossentropy at /home/taot/.julia/packages/Flux/dkJUV/src/layers/stateless.jl:26
 [8] _broadcast_getindex_evalf at broadcast.jl:625
 [9] _broadcast_getindex at broadcast.jl:598
 [10] getindex at broadcast.jl:558
 [11] #25 at /home/taot/.julia/packages/GPUArrays/tIMl5/src/broadcast.jl:50
Reason: unsupported call to the Julia runtime (call to jl_type_error)
Stacktrace:
 [1] print_to_string at strings/io.jl:124
 [2] string at strings/io.jl:168
 [3] throw_complex_domainerror at math.jl:31
 [4] log at special/log.jl:321
 [5] #binarycrossentropy#49 at /home/taot/.julia/packages/Flux/dkJUV/src/layers/stateless.jl:26
 [6] binarycrossentropy at /home/taot/.julia/packages/Flux/dkJUV/src/layers/stateless.jl:26
 [7] _broadcast_getindex_evalf at broadcast.jl:625
 [8] _broadcast_getindex at broadcast.jl:598
 [9] getindex at broadcast.jl:558
 [10] #25 at /home/taot/.julia/packages/GPUArrays/tIMl5/src/broadcast.jl:50
Reason: unsupported call through a literal pointer (call to jl_alloc_string)
Stacktrace:
 [1] _string_n at strings/string.jl:60
 [2] string at strings/substring.jl:186
 [3] throw_complex_domainerror at math.jl:31
 [4] log at special/log.jl:321
 [5] #binarycrossentropy#49 at /home/taot/.julia/packages/Flux/dkJUV/src/layers/stateless.jl:26
 [6] binarycrossentropy at /home/taot/.julia/packages/Flux/dkJUV/src/layers/stateless.jl:26
 [7] _broadcast_getindex_evalf at broadcast.jl:625
 [8] _broadcast_getindex at broadcast.jl:598
 [9] getindex at broadcast.jl:558
 [10] #25 at /home/taot/.julia/packages/GPUArrays/tIMl5/src/broadcast.jl:50
Reason: unsupported call through a literal pointer (call to __memcpy_avx_unaligned_erms)
Stacktrace:
 [1] unsafe_copyto! at array.jl:226
 [2] __unsafe_string! at strings/substring.jl:173
 [3] string at strings/substring.jl:189
 [4] throw_complex_domainerror at math.jl:31
 [5] log at special/log.jl:321
 [6] #binarycrossentropy#49 at /home/taot/.julia/packages/Flux/dkJUV/src/layers/stateless.jl:26
 [7] binarycrossentropy at /home/taot/.julia/packages/Flux/dkJUV/src/layers/stateless.jl:26
 [8] _broadcast_getindex_evalf at broadcast.jl:625
 [9] _broadcast_getindex at broadcast.jl:598
 [10] getindex at broadcast.jl:558
 [11] #25 at /home/taot/.julia/packages/GPUArrays/tIMl5/src/broadcast.jl:50

Stacktrace:
 [1] check_ir(::CUDAnative.CompilerJob, ::LLVM.Module) at /home/taot/.julia/packages/CUDAnative/Lr0yj/src/compiler/validation.jl:114
 [2] macro expansion at /home/taot/.julia/packages/CUDAnative/Lr0yj/src/compiler/driver.jl:188 [inlined]
 [3] macro expansion at /home/taot/.julia/packages/TimerOutputs/ohPOH/src/TimerOutput.jl:197 [inlined]
 [4] #codegen#136(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.codegen), ::Symbol, ::CUDAnative.CompilerJob) at /home/taot/.julia/packages/CUDAnative/Lr0yj/src/compiler/driver.jl:186
 [5] #codegen at /home/taot/.julia/packages/CUDAnative/Lr0yj/src/compiler/driver.jl:0 [inlined]
 [6] #compile#135(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.compile), ::Symbol, ::CUDAnative.CompilerJob) at /home/taot/.julia/packages/CUDAnative/Lr0yj/src/compiler/driver.jl:47
 [7] #compile#134 at ./none:0 [inlined]
 [8] #compile at ./none:0 [inlined] (repeats 2 times)
 [9] macro expansion at /home/taot/.julia/packages/CUDAnative/Lr0yj/src/execution.jl:389 [inlined]
 [10] #cufunction#176(::Nothing, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(CUDAnative.cufunction), ::getfield(GPUArrays, Symbol("##25#26")), ::Type{Tuple{CuArrays.CuKernelState,CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(binarycrossentropy),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArrays.CuArray},Nothing,typeof(CuArrays.cuσ),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}},Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}}}) at /home/taot/.julia/packages/CUDAnative/Lr0yj/src/execution.jl:357
 [11] cufunction(::Function, ::Type) at /home/taot/.julia/packages/CUDAnative/Lr0yj/src/execution.jl:357
 [12] macro expansion at /home/taot/.julia/packages/CUDAnative/Lr0yj/src/execution.jl:174 [inlined]
 [13] macro expansion at ./gcutils.jl:87 [inlined]
 [14] macro expansion at /home/taot/.julia/packages/CUDAnative/Lr0yj/src/execution.jl:171 [inlined]
 [15] _gpu_call(::CuArrays.CuArrayBackend, ::Function, ::CuArrays.CuArray{Float32,1}, ::Tuple{CuArrays.CuArray{Float32,1},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(binarycrossentropy),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArrays.CuArray},Nothing,typeof(CuArrays.cuσ),Tuple{Base.Broadcast.Extruded{CuArrays.CuArray{Float32,1},Tuple{Bool},Tuple{Int64}}}},Base.Broadcast.Extruded{CuArrays.CuArray{Float32,1},Tuple{Bool},Tuple{Int64}}}}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at /home/taot/.julia/packages/CuArrays/kOUu1/src/gpuarray_interface.jl:60
 [16] gpu_call at /home/taot/.julia/packages/GPUArrays/tIMl5/src/abstract_gpu_interface.jl:151 [inlined]
 [17] gpu_call(::Function, ::CuArrays.CuArray{Float32,1}, ::Tuple{CuArrays.CuArray{Float32,1},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(binarycrossentropy),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArrays.CuArray},Nothing,typeof(CuArrays.cuσ),Tuple{Base.Broadcast.Extruded{CuArrays.CuArray{Float32,1},Tuple{Bool},Tuple{Int64}}}},Base.Broadcast.Extruded{CuArrays.CuArray{Float32,1},Tuple{Bool},Tuple{Int64}}}}}) at /home/taot/.julia/packages/GPUArrays/tIMl5/src/abstract_gpu_interface.jl:128
 [18] copyto! at /home/taot/.julia/packages/GPUArrays/tIMl5/src/broadcast.jl:48 [inlined]
 [19] copyto! at ./broadcast.jl:842 [inlined]
 [20] copy(::Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArrays.CuArray},Tuple{Base.OneTo{Int64}},typeof(binarycrossentropy),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArrays.CuArray},Nothing,typeof(CuArrays.cuσ),Tuple{CuArrays.CuArray{Float32,1}}},CuArrays.CuArray{Float32,1}}}) at ./broadcast.jl:818
 [21] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArrays.CuArray},Nothing,typeof(binarycrossentropy),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArrays.CuArray},Nothing,typeof(CuArrays.cuσ),Tuple{CuArrays.CuArray{Float32,1}}},CuArrays.CuArray{Float32,1}}}) at ./broadcast.jl:798
 [22] top-level scope at In[3]:3

I tested the crossentropy function in the same way and didn't hit the same problem.

Is this a bug in one of the related packages, or am I doing something wrong? I noticed the warnings in the message but don't understand what they mean. Could the error be related to those warnings?

Thanks!

I’m not sure where the root of this problem lies, but this is a quick fix:

a1 = gpu([-1.1491, 0.8619, 0.3127])
a2 = gpu([1, 1, 0.])

CuArrays.@cufunc Flux.binarycrossentropy(ŷ, y; ϵ=eps(ŷ)) = -y*log(ŷ + ϵ) - (1 - y)*log(1 - ŷ + ϵ)

Flux.binarycrossentropy.(σ.(a1), a2)

This `@cufunc` definition makes sure that all functions called inside it get dispatched to their CUDA implementations; in this case I believe only `log` is replaced.
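If you'd rather not redefine the Flux function itself, an equivalent workaround (a sketch with a hypothetical helper name, `gpu_bce`) is to broadcast your own function that calls `CUDAnative.log` directly. That sidesteps `Base.log`'s `DomainError` path, whose error-message formatting is what drags the CPU-only string machinery into the kernel, as the stack traces above show:

```julia
using Flux, CuArrays, CUDAnative

# Hypothetical GPU-safe helper: the same formula as Flux.binarycrossentropy,
# but using CUDAnative.log, which has no CPU-only error-reporting path.
gpu_bce(ŷ, y; ϵ=eps(ŷ)) = -y * CUDAnative.log(ŷ + ϵ) - (1 - y) * CUDAnative.log(1 - ŷ + ϵ)

a1 = gpu(Float32[-1.1491, 0.8619, 0.3127])
a2 = gpu(Float32[1, 1, 0])
gpu_bce.(σ.(a1), a2)
```

The `@cufunc` approach is nicer in practice, since existing code that calls `Flux.binarycrossentropy` keeps working unchanged.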


Hi, merck xiaan

Thank you for the reply! Yes, the trick solved my problem!

Today I found this issue on GitHub, https://github.com/FluxML/Flux.jl/issues/889, and someone has already submitted a fix that is the same as your trick. I really should have searched GitHub first.