Could Optim.TwiceDifferentiable support CuArray?

I am trying to move an optimization problem from CPU to GPU computing in Optim.jl.

My demo examples are as follows.

#Version 1-1

using Optim, CUDA

f1(x) = x[1]^4 + x[2]^4 - 8 * x[1] * x[2] + 2.0
init = [1.0 1.0]

opt1 = optimize(f1, init)

#Version 1-2

CUDA.allowscalar(true)
opt1_cu = optimize(f1, init |> cu)
#Version 2-1
f2(x, y) = x^4 + y^4 - 8 * x * y + 2.0

func = TwiceDifferentiable(vars -> f2(vars[1], vars[2]),
    ones(2); autodiff = :forward)

opt2 = optimize(func, ones(2))

#Version 2-2

CUDA.allowscalar(true)

func_cu = TwiceDifferentiable(var -> f2(vars[1], vars[2]),
    CUDA.ones(2); autodiff = :forward)

opt2_cu = optimize(func_cu, CUDA.ones(2))

When I go from Version 1-1 to 1-2, I get similar results successfully.
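(For context on why 1-2 works: `optimize(f, x0)` with no method argument defaults to Nelder–Mead, which only evaluates the objective and never differentiates it, so with `CUDA.allowscalar(true)` the scalar indexing inside `f1` is merely slow, not fatal. A minimal sketch, assuming a working CUDA device:)

```julia
using Optim, CUDA

CUDA.allowscalar(true)  # f1 indexes individual elements, which is scalar indexing on a CuArray

f1(x) = x[1]^4 + x[2]^4 - 8 * x[1] * x[2] + 2.0

# NelderMead() is the derivative-free default; it only calls f1,
# so no gradients or matrix factorizations ever touch the CuArray.
opt = optimize(f1, cu([1.0, 1.0]), NelderMead())
```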

However, when I run Version 2-2, I get the following error:

ERROR: LoadError: UndefVarError: vars not defined
Stacktrace:
  [1] (::var"#7#8")(var::CuArray{ForwardDiff.Dual{ForwardDiff.Tag{var"#7#8", Float32}, Float32, 2}, 1, CUDA.Mem.DeviceBuffer})
    @ Main c:\Users\user\Desktop\CY\Demo_Unconstrained_Optim-2022-03-10.jl:41
  [2] vector_mode_dual_eval!
    @ C:\Users\user\.julia\packages\ForwardDiff\PBzup\src\apiutils.jl:37 [inlined]
  [3] vector_mode_gradient!(result::DiffResults.MutableDiffResult{1, Float32, Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, f::var"#7#8", x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#7#8", Float32}, Float32, 2, CuArray{ForwardDiff.Dual{ForwardDiff.Tag{var"#7#8", Float32}, Float32, 2}, 1, CUDA.Mem.DeviceBuffer}})
    @ ForwardDiff C:\Users\user\.julia\packages\ForwardDiff\PBzup\src\gradient.jl:113
  [4] gradient!
    @ C:\Users\user\.julia\packages\ForwardDiff\PBzup\src\gradient.jl:37 [inlined]
  [5] gradient!
    @ C:\Users\user\.julia\packages\ForwardDiff\PBzup\src\gradient.jl:35 [inlined]
  [6] (::NLSolversBase.var"#42#48"{var"#7#8", ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#7#8", Float32}, Float32, 2, CuArray{ForwardDiff.Dual{ForwardDiff.Tag{var"#7#8", Float32}, Float32, 2}, 1, CUDA.Mem.DeviceBuffer}}})(out::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, x::CuArray{Float32, 
1, CUDA.Mem.DeviceBuffer})
    @ NLSolversBase C:\Users\user\.julia\packages\NLSolversBase\cfJrN\src\objective_types\twicedifferentiable.jl:132  
  [7] value_gradient!!(obj::TwiceDifferentiable{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 
2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})    @ NLSolversBase C:\Users\user\.julia\packages\NLSolversBase\cfJrN\src\interface.jl:82
  [8] initial_state(method::Newton{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}}, options::Optim.Options{Float64, Nothing}, d::TwiceDifferentiable{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, initial_x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Optim C:\Users\user\.julia\packages\Optim\wFOeG\src\multivariate\solvers\second_order\newton.jl:45
  [9] optimize(d::TwiceDifferentiable{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, initial_x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, method::Newton{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}}, options::Optim.Options{Float64, Nothing})
    @ Optim C:\Users\user\.julia\packages\Optim\wFOeG\src\multivariate\optimize\optimize.jl:36
 [10] optimize(f::TwiceDifferentiable{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, initial_x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}; inplace::Bool, autodiff::Symbol, kwargs::Base.Pairs{Symbol, 
Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Optim C:\Users\user\.julia\packages\Optim\wFOeG\src\multivariate\optimize\interface.jl:90
 [11] optimize(f::TwiceDifferentiable{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, initial_x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})  
    @ Optim C:\Users\user\.julia\packages\Optim\wFOeG\src\multivariate\optimize\interface.jl:84
 [12] top-level scope
    @ c:\Users\user\Desktop\CY\Demo_Unconstrained_Optim-2022-03-10.jl:44

How can I solve this?

Or, alternatively, is there another way to run an optimization problem with GPU computing?

Thanks.

That just looks like a typo in v2-2 — does it work if you use vars -> f2(vars[1], vars[2])

Thanks for spotting that.

After correcting the code, I get the following error:

ERROR: LoadError: ArgumentError: cannot take the CPU address of a CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}
Stacktrace:
  [1] unsafe_convert(#unused#::Type{Ptr{Float32}}, x::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
    @ CUDA C:\Users\user\.julia\packages\CUDA\Axzxe\src\array.jl:315
  [2] unsafe_convert(#unused#::Type{Ptr{Float32}}, V::SubArray{Float32, 1, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{UnitRange{Int64}, Int64}, true})
    @ Base .\subarray.jl:427
  [3] syr!(uplo::Char, α::Float32, x::SubArray{Float32, 1, 
CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{UnitRange{Int64}, Int64}, true}, A::SubArray{Float32, 2, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false})
    @ LinearAlgebra.BLAS C:\Users\user\.julia\juliaup\julia-1.7.2+0~x64\share\julia\stdlib\v1.7\LinearAlgebra\src\blas.jl:1338
  [4] update_columns!
    @ C:\Users\user\.julia\packages\PositiveFactorizations\BN7VB\src\cholesky.jl:176 [inlined]
  [5] solve_diagonal!(B::SubArray{Float32, 2, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, d::SubArray{Int8, 1, Vector{Int8}, Tuple{UnitRange{Int64}}, true}, tol::Float32)
    @ PositiveFactorizations C:\Users\user\.julia\packages\PositiveFactorizations\BN7VB\src\cholesky.jl:91
  [6] ldlt!(::Type{PositiveFactorizations.Positive{Float32}}, A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, pivot::Type{Val{false}}; tol::Float32, blocksize::Int64)
    @ PositiveFactorizations C:\Users\user\.julia\packages\PositiveFactorizations\BN7VB\src\cholesky.jl:38
  [7] #cholesky!#5
    @ C:\Users\user\.julia\packages\PositiveFactorizations\BN7VB\src\cholesky.jl:17 [inlined]
  [8] #cholesky!#6
    @ C:\Users\user\.julia\packages\PositiveFactorizations\BN7VB\src\cholesky.jl:18 [inlined]
  [9] cholesky!(::Type{PositiveFactorizations.Positive}, A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, pivot::Type) (repeats 2 times)
    @ PositiveFactorizations C:\Users\user\.julia\packages\PositiveFactorizations\BN7VB\src\cholesky.jl:18
 [10] update_state!(d::TwiceDifferentiable{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, state::Optim.NewtonState{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, LinearAlgebra.Cholesky{Float32, 
CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, method::Newton{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}})
    @ Optim C:\Users\user\.julia\packages\Optim\wFOeG\src\multivariate\solvers\second_order\newton.jl:67
 [11] optimize(d::TwiceDifferentiable{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, initial_x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, method::Newton{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}}, options::Optim.Options{Float64, Nothing}, state::Optim.NewtonState{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, LinearAlgebra.Cholesky{Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}})
    @ Optim C:\Users\user\.julia\packages\Optim\wFOeG\src\multivariate\optimize\optimize.jl:54
 [12] optimize(d::TwiceDifferentiable{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, initial_x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, method::Newton{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}}, options::Optim.Options{Float64, Nothing})
    @ Optim C:\Users\user\.julia\packages\Optim\wFOeG\src\multivariate\optimize\optimize.jl:36
 [13] optimize(f::TwiceDifferentiable{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, initial_x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}; inplace::Bool, autodiff::Symbol, kwargs::Base.Pairs{Symbol, 
Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Optim C:\Users\user\.julia\packages\Optim\wFOeG\src\multivariate\optimize\interface.jl:90
 [14] optimize(f::TwiceDifferentiable{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, initial_x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})  
    @ Optim C:\Users\user\.julia\packages\Optim\wFOeG\src\multivariate\optimize\interface.jl:84
 [15] top-level scope
    @ c:\Users\user\Desktop\CY\Demo_Unconstrained_Optim-2022-03-10.jl:44

It seems a CuArray cannot be used with TwiceDifferentiable.

For me, this line in the traceback indicates that the BLAS being used (typically OpenBLAS or MKL) is a CPU-oriented BLAS. But I know of cuBLAS: can we swap the BLAS from CPU to GPU, or is this a research question?
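For what it's worth, Julia does not swap the BLAS library globally for GPU work; GPU operations are reached through multiple dispatch, i.e. `LinearAlgebra` calls on CuArrays dispatch to cuBLAS wrappers inside CUDA.jl. A rough illustration (untested here, assumes a CUDA device):

```julia
using CUDA, LinearAlgebra

A = CUDA.rand(Float32, 4, 4)
x = CUDA.rand(Float32, 4)
y = CUDA.zeros(Float32, 4)

# mul! on CuArray arguments dispatches to a cuBLAS kernel;
# no global BLAS swap is involved.
mul!(y, A, x)
```

The trouble in the trace above is that PositiveFactorizations calls `LinearAlgebra.BLAS.syr!` directly, and that function hands the array to the CPU BLAS via a raw pointer (`unsafe_convert(Ptr{Float32}, ...)`), which is exactly the conversion a CuArray refuses. Dispatch can't rescue a hard-coded CPU BLAS call.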

I'm not sure whether we can swap the BLAS from CPU to GPU, or whether that is still an open development issue.

But it does seem to be a problem with how Optim.TwiceDifferentiable handles GPU array types.

The problem is that it uses PositiveFactorizations as the default linear solver, and that is not GPU compatible. It's also not compatible with SparseArrays, and there we fall back to . What factorization would you use on the GPU? I suppose there's a cholesky defined for CuArrays?
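One possible workaround, if a second-order method isn't essential: a first-order solver such as LBFGS only needs values and gradients, so it never factorizes the Hessian and the PositiveFactorizations code path is avoided entirely. A sketch (untested without a GPU; `GradientDescent()` would work the same way, and the gradient did compute successfully in the trace above before the Cholesky failed):

```julia
using Optim, CUDA

CUDA.allowscalar(true)  # ForwardDiff's vector-mode gradient indexes individual elements

f2(x, y) = x^4 + y^4 - 8 * x * y + 2.0

# OnceDifferentiable only builds value and gradient caches, so no
# Hessian factorization is ever attempted on the CuArray.
od = OnceDifferentiable(vars -> f2(vars[1], vars[2]),
    CUDA.ones(2); autodiff = :forward)

opt = optimize(od, CUDA.ones(2), LBFGS())
```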
