CuArray and Optim

RainerHeintzmann · August 25, 2018, 7:14pm

I am trying to write some deconvolution code in Julia via combining LBFGS Optimization (Optim.jl), automatic differentiation (AutoGrad.jl) and Cuda support (CuArray). Julia 1.0 on Windows 10.
I finally have some runnable code, but there seem to be major problems with CuArray and Optim.jl:

If I accidentally provide a non-Cuda array as one of the arguments, I get an LLVM crash:
…
optimize! at C:\Users\pi96doc.julia\packages\CUDAnative\opsly\src\compiler.jl:606
…
If I provide only CuArray arrays, there is no crash, but via the LBFGS optimization routine everything runs extremely slowly (at least 10x slower than without Cuda). There are also some minor disagreements in the results. E.g. the number of iterations is different. However, what really puzzles me, is that from that moment on, Jullia as a whole becomes incredibly slow. This is only remedied if Julia is restarted using exit().
I was so far not able to provide AutoGrad with a correct differentiation for the fft or rft routine. I thought it would simply be
@primitive rfft(x1),dy,y (irfft(dy,size(x1)))
@primitive irfft(x1,sz),dy,y (rfft(dy))
but this did not seem to work. Anyway providing the for the whole real->real convolution operation was no problem. Are there similar ways to provide such operations for JuliaDiff? This would be nice as this seems to be more native to Optim.jl than AutoGrad.jl.

Minor things: CuArray only seems to support fft and rfft up to 3 dimensions and complains, even if the trailing dimensions are singleton.

Any ideas about how to find out where the problems wrt. CuArray in the LBFGS method in Optim could appear?

maleadt · August 27, 2018, 2:46pm

Fixed on CuArrays/CUDAnative master
Try running with allowscalar=false, copying from and to the GPU is very slow

RainerHeintzmann · August 30, 2018, 5:18pm

Hmm. This does not work. The following happens:

result2=optimize(fg,cimg, GradientDescent(),myOptions)

ERROR: scalar getindex is disabled

Stacktrace:

[1] assertscalar at C:\Users\pi96doc\.julia\packages\GPUArrays\3E1qk\src\indexing.jl:6 [inlined]

[2] getindex at C:\Users\pi96doc\.julia\packages\GPUArrays\3E1qk\src\indexing.jl:29 [inlined]

[3] iterate at .\abstractarray.jl:838 [inlined]

[4] iterate at .\abstractarray.jl:836 [inlined]

[5] generic_normInf(::CuArray{Float32,3}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\LinearAlgebra\src\generic.jl:267

[6] normInf at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\LinearAlgebra\src\generic.jl:358 [inlined]

[7] norm(::CuArray{Float32,3}, ::Float64) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\LinearAlgebra\src\generic.jl:433

[8] optimize(::OnceDifferentiable{Float32,CuArray{Float32,3},CuArray{Float32,3}}, ::CuArray{Float32,3}, ::GradientDescent{LineSearches.InitialPrevious{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Nothing,getfield(Optim, Symbol("##12#14"))}, ::Optim.Options{Float64,Nothing}, ::Optim.GradientDescentState{CuArray{Float32,3},Float32}) at C:\Users\pi96doc\.julia\packages\Optim\fabGe\src\multivariate\optimize\optimize.jl:26

[9] optimize at C:\Users\pi96doc\.julia\packages\Optim\fabGe\src\multivariate\optimize\optimize.jl:33 [inlined]

[10] #optimize#87(::Bool, ::Symbol, ::Function, ::NLSolversBase.InplaceObjective{Nothing,typeof(mylossgradient!),Nothing}, ::CuArray{Float32,3}, ::GradientDescent{LineSearches.InitialPrevious{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Nothing,getfield(Optim, Symbol("##12#14"))}, ::Optim.Options{Float64,Nothing}) at C:\Users\pi96doc\.julia\packages\Optim\fabGe\src\multivariate\optimize\interface.jl:114

[11] optimize(::NLSolversBase.InplaceObjective{Nothing,typeof(mylossgradient!),Nothing}, ::CuArray{Float32,3}, ::GradientDescent{LineSearches.InitialPrevious{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Nothing,getfield(Optim, Symbol("##12#14"))}, ::Optim.Options{Float64,Nothing}) at C:\Users\pi96doc\.julia\packages\Optim\fabGe\src\multivariate\optimize\interface.jl:113

[12] top-level scope at none:0

maleadt · August 30, 2018, 6:38pm

Looks like only norm(::CuArray, p=2) is implemented in CuArrays: CuArrays.jl/highlevel.jl at 7cb56765a634259fc1bcedf43aa0c603afd0aa58 · JuliaGPU/CuArrays.jl · GitHub
It doesn’t look like CUBLAS has other norms, so we probably need a GPU-compatible generic fallback. The docs state The second argument p is not necessarily a part of the interface for norm, i.e. a custom type may only implement norm(A) without second argument, so maybe that’s not necessary and there’s other ways to express that computation?

Also, please use triple quotes to delimit code and output in your post.

pkofod · August 30, 2018, 7:28pm

maximum(abs, x) would work on a CuArray, right?

pkofod · August 30, 2018, 7:29pm

Thanks for trying this out! I’ve been meaning to see if there were problems with Optim and GPU’s to be fixed, but I never got my CuArrays up and running (only tried for a day), so never found out if there were things we needed to change. Can you try changing the inf-norm to maximum(abs, x) instead? A PR is also welcome, but I can also implement the changes we find necessary in this thread myself.

RainerHeintzmann · September 1, 2018, 11:02am

was giving it another try preceeding the optimization with the following

    using LinearAlgebra,Optim,CuArrays
    CuArrays.allowscalar(false);
    LinearAlgebra.norm1(x::CuArray{T,N}) where {T,N} = sum(abs, x); # specializes the one-norm
    LinearAlgebra.normInf(x::CuArray{T,N}) where {T,N} = maximum(abs, x); # specializes the one-norm    
    Optim.maxdiff(x::CuArray{T,N},y::CuArray{T,N}) where {T,N} = maximum(abs.(x-y));

This allowed the line

result2=optimize(fg,cimg, GradientDescent(),myOptions);

to run fine (not superfast, but this is probably due to the relatively small arrays.
There is however an output problem:

julia> result2
Results of Optimization Algorithm
 * Algorithm: Gradient Descent
Error showing value of type Optim.MultivariateOptimizationResults{GradientDescent{LineSearches.InitialPrevious{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Nothing,getfield(Optim, Symbol("##12#14"))},Float64,CuArray{Float32,3},Float32,Float32,Array{OptimizationState{Float32,GradientDescent{LineSearches.InitialPrevious{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Nothing,getfield(Optim, Symbol("##12#14"))}},1}}:
ERROR: scalar getindex is disabled

There also were frequent out-of-memory errors, even though the arrays were quite small (512x512x1):

ERROR: CUFFTError(code 2, cuFFT failed to allocate GPU or CPU memory)
Stacktrace:
 [1] macro expansion at C:\Users\pi96doc\.julia\packages\CuArrays\F96Gk\src\fft\error.jl:56 [inlined]
 [2] macro expansion at C:\Users\pi96doc\.julia\packages\CuArrays\F96Gk\src\fft\error.jl:57 [inlined]
 [3] _mkplan(::UInt8, ::Tuple{Int64,Int64,Int64}, ::UnitRange{Int64}) at C:\Users\pi96doc\.julia\packages\CuArrays\F96Gk\src\fft\CUFFT.jl:109
 [4] plan_rfft(::CuArray{Float64,3}, ::UnitRange{Int64}) at C:\Users\pi96doc\.julia\packages\CuArrays\F96Gk\src\fft\CUFFT.jl:405

Finally I was (naively?) trying to convert them back to the main CPU:

julia> Float32.(result2.minimizer)
ERROR: GPU compilation failed, try inspecting generated code with any of the @device_code_... macros
CompilerError: could not compile #19(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{Float32,3,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64}},Type{Float32},Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,3,CUDAnative.AS.Global},Tuple{Bool,Bool,Bool},Tuple{Int64,Int64,Int64}}}}); passing and using non-bitstype argument
- argument_type = Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64}},Type{Float32},Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,3,CUDAnative.AS.Global},Tuple{Bool,Bool,Bool},Tuple{Int64,Int64,Int64}}}}
- argument = 4

Any ideas?

maleadt · September 3, 2018, 6:26am

ERROR: scalar getindex is disabled

That’s because you disabled scalar indexing (which is good, preventing slow code that processes GPU values on the CPU), but many default show methods rely on iterating over the container. We’ve implemented alternative methods for some default containers (GPUArrays.jl/abstractarray.jl at 7470e90152e51ad4cca7d80e02590eb95cf6fce5 · JuliaGPU/GPUArrays.jl · GitHub), but Optim seems to wrap arrays differently hence triggering this error. This probably needs a redesign in order to cope with wrapper types without hardcoding these methods in GPUArrays/CuArrays.

ERROR: CUFFTError(code 2, cuFFT failed to allocate GPU or CPU memory)

Interesting. We have a memory allocator sitting between Julia and CUDA to handle failing memory allocations (since GPU memory pressure isn’t known by the GC), but that doesn’t apply to libraries like cuFFT. I’ve created an issue: Memory management for libraries · Issue #130 · JuliaGPU/CuArrays.jl · GitHub.

passing and using non-bitstype argument

That Broadcast argument is not bits indeed, probably due to the Nothing in there. A result of the recent Broadcast changes in CuArrays, maybe? cc @vchuravy

pkofod · September 3, 2018, 8:29pm

could you be specific here? what are we packing differently (so I can maybe remove the problem in Optim-land)

maleadt · September 4, 2018, 6:38am

It’s just that we need to add support for each wrapper specifically, so we’ve only done that for common Base wrappers (Adjoint, Transpose, SubArray). That obviously doesn’t scale, and we need a better solution, but it doesn’t seem easy (see e.g. Broadcast does not work with wrappers (Adjoint, Transpose, SubArray) · Issue #147 · JuliaGPU/GPUArrays.jl · GitHub).

Topic		Replies	Views
Could Optim.TwiceDifferentiable support CuArray? Optimization (Mathematical) question , cuda , optim , optimization	5	864	March 12, 2022
Not sure where function is going wrong GPU	8	416	December 22, 2023
CuArray supports bits type error when performing automatic differentiation in sciml_train (possibly due to array reshaping) GPU first-steps	6	479	July 28, 2021
Error when setting up Flux and CuArrays Machine Learning gpu , flux	4	659	October 2, 2019
Lux, optimization on gpu Optimization (Mathematical) question , gpu , cuda , optimization , sciml	8	316	January 13, 2025

CuArray and Optim

Related topics