Error: this intrinsic must be compiled to be called (Flux/Zygote)

Hey all. I am trying to implement a neural network controller that emulates this
example: https://github.com/FluxML/Trebuchet.jl. In place of the ODE solver I am using a previously trained Flux model, which outputs a single scalar. For the neural controller I am using a Flux LSTM model. The controller takes the target value as input and outputs 3 control variables, which are fed to the trained Flux model for prediction.

A non-working example that illustrates my approach follows.

cd(@__DIR__)
using Pkg; Pkg.activate(".")

using Zygote: forwarddiff
using Statistics: mean
using Random
using DifferentialEquations, DiffEqFlux
using Flux, BSON, Plots
using CUDA
using StatsBase

CUDA.allowscalar(false)

# Load previously trained model
using BSON: @load
@load "saved path" model
# Model was trained on the GPU and transferred to the CPU for saving.
model = gpu(model)

# Input to the model is a 4D array (features, pool length, 1, datalength).
input = get_data(dataset, poollength, datalength, horizon)[1];

# Reshape to the shape required by the LSTM neural controller.
# inputsize is the number of features.
LSTMinput = reshape(input, (inputsize, poollength, datalength - 31)) |> gpu;

const set_point = 0.1
LSTMinput[1,:,:] .= set_point;
const k = 32

# Use the Flux DataLoader to mini-batch k samples.
train_loader = Flux.Data.DataLoader(LSTMinput, batchsize = k, shuffle = false, partial = false);

# Predicts the output using the previously trained Flux model.
function predict(d)
    d = d |> gpu
    Flux.reset!(model)
    preds = model(d)
    return preds
end

#Define my neural controller

controller =  Chain(
    LSTM(inputsize, 65),
    LSTM(65, 33),
    Dense(33, 3)) #|>gpu

# Controller parameters for optimisation.
ps = Flux.params(controller)

# Takes a 3D array as input to the LSTM and outputs, for each timestep, vectors for the 3 nominated control variables.

function control(x)
    Flux.reset!(controller)
    k = size(x,3)
    inputs = [x[:,:,t] for t in 1:k]
    output = [controller(x) for x in inputs]
    C1 = [x[1,:] for x in output]
    C2 = [x[2,:] for x in output]
    C3 = [x[3,:] for x in output]
    return C1, C2, C3
end

# Calculates the controller outputs and feeds them to the model to predict the outcome.

outcome = function(x)
    k = size(x,3)
    C1, C2, C3 = control(x)
    a = reshape(reduce(hcat, C1), 1, size(x,2), size(C1)[1])
    b = reshape(reduce(hcat, C2), 1, size(x,2), size(C1)[1])
    c = reshape(reduce(hcat, C3), 1, size(x,2), size(C1)[1])

    # This is hacky: it selects the features not equal to C1, C2, C3.
    # If on CUDA, use this:
    # sel = iszero.(Int.(vcat(CUDA.zeros(5), 1, CUDA.zeros(5), 1, CUDA.zeros(4), 1, CUDA.zeros(26))))

    # If on the CPU, use this:
    sel = iszero.(Int.(vcat(zeros(5), 1, zeros(5), 1, zeros(4), 1, zeros(26))))

    Xv = x[sel,:,:]

    # This is not quite right, as the index of the replaced features in the x matrix is lost.
    # Used cat to avoid a mutating-array error.

    X1 = cat(Xv, a; dims=1)
    X2 = cat(X1, b; dims=1)
    X3 = cat(X2, c; dims=1)

    # Reshape the array to 4D, the input shape expected by predict.
    z = reshape(X3, (inputsize, poollength, 1, k))
    result = predict(z)

    return result
end
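As an aside, a non-mutating way to keep the replaced features at their original row indices (rows 6, 12, and 17, per the mask above) might be to vcat the slices back in order; a sketch, not verified:

X3 = vcat(x[1:5,:,:],   a,
          x[7:11,:,:],  b,
          x[13:16,:,:], c,
          x[18:end,:,:])  # rows remain in their original positions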

# The loss minimises the error between the model output and the setpoint.

function loss(x)
    # l=CUDA.sum(outcome(x)' .- gpu(set_point))^2/length(x)  |>gpu
    Flux.mse(outcome(x), set_point')
end

opt = ADAM(0.05)

The forward pass seems to work OK: feeding a single batch, I can return the model output and loss.

# A single batch of size k
single = [LSTMinput[:,:,1:k]]

loss(single[1])
0.003993923902422385

outcome(single[1])
1×32 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 0.246026  0.14971  0.156318  0.15168  0.154398  0.155286  0.153184  0.149052  0.144793  …  0.181637  0.192891  0.213579  0.151786  0.138461  0.146778  0.124833  0.122962  0.120319

I hit a snag, however, when attempting to backpropagate the gradient with respect to the neural controller parameters, ps.

using Flux: @epochs
@epochs 30 Flux.train!(loss, ps, train_loader, opt)  #, cb=cb

Calculating the gradient results in the following Stacktrace:

gs = gradient(() -> loss(single[1]), ps)

ERROR: this intrinsic must be compiled to be called
Stacktrace:
 [1] macro expansion
   @ C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:0 [inlined]
 [2] _pullback(::Zygote.Context, ::Core.IntrinsicFunction, ::String, ::Type{Int64}, ::Type{Tuple{Ptr{Int64}}}, ::Ptr{Int64})
   @ Zygote C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:9

I don’t know if this is relevant, but substituting a dummy model like:

function predict(d)
    ones(1,32) |> gpu
end

yields the same stacktrace, while omitting the gpu call seems to work:

function predict(d)
    ones(1,32)
end

julia> gs = gradient(() -> loss(single[1]), ps)
Grads(...)

Two related posts seem to suggest the need to define custom adjoints.

I am new to Julia and Flux/Zygote and am struggling a little to work out where or how I am going wrong. Would really appreciate any help or pointers. Thanks for your time.

Welcome! Taking on a deep RL problem is quite the ambitious first project for learning Julia/Flux, but with any luck this error should be easy to remedy.

Instead of calling gpu in your predict function, move the data to the GPU first and then pass it to the loss function. If you’re using a Flux DataLoader, you can wrap it in a CuIterator (see the Memory management page of the CUDA.jl docs) to do that automatically.
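
For illustration, a minimal sketch of that pattern, assuming the loss, ps, and opt defined above and with the data kept on the CPU (the hypothetical LSTMinput_cpu below stands in for a CPU-resident copy; CuIterator is exported by CUDA.jl):

using CUDA  # provides CuIterator

# Keep the data on the CPU in the loader. Note the one-element tuple:
# CuIterator maps over each batch when moving it to the GPU.
train_loader = Flux.Data.DataLoader((LSTMinput_cpu,), batchsize = k, shuffle = false, partial = false)

# predict no longer moves its argument; batches already arrive as CuArrays.
function predict(d)
    Flux.reset!(model)
    return model(d)
end

Flux.train!(loss, ps, CuIterator(train_loader), opt)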

Thank you ToucheSir, I appreciate your help! I tried your suggestion but unfortunately am still getting the error. Here is an MWE.

using Statistics: mean
using Random
using Flux
using CUDA

CUDA.allowscalar(false)

data=CUDA.rand(100,24,128);

# Function predict: take the output from control and do something with it...
function predict(x)
    y = CUDA.rand(100,24,1,128) .+ x
    y = y[1,1,1,:]
    return y
end

# Recurrent net controller
m = Chain(
    LSTM(100, 65),
    LSTM(65, 33),
    Dense(33, 1)) |> gpu

 # parameters of controller
p = Flux.params(m);

# Takes a 3D input and outputs a single control variable.
function control(x)
    Flux.reset!(m)
    inputs = [x[:,:,t] for t in 1:128]
    output = [m(x) for x in inputs]
    C1 = [x[1,:] for x in output]

  return C1
end

# Outcome function: runs the controller, updates the data, and feeds it to predict.
outcome = function(x)
    C1 = control(x)
    a = reshape(reduce(hcat, C1), 1, size(x,2), size(C1)[1])
    Xv = x[2:end,:,:]
    X1 = cat(Xv, a; dims=1)

    # Reshape here, as my predict function expects a 4D array.
    x_new = reshape(X1, (100, 24, 1, 128))
    return predict(x_new)
end

# The loss minimises the error between the model output and a scalar setpoint.
function loss(x)
    l = sum(outcome(x)' .- 0.1)^2 / length(x)
    return l
end

opt = ADAM(0.05)

# Try the forward pass
outcome(data)
loss(data)

using Flux: @epochs
@epochs 30 Flux.train!(loss, p, [data], opt) 

#Finding gradient... 
gs = gradient(() -> loss(data), p)

If I modify the above to run on the CPU, I get the following message instead when calculating the gradient, so evidently I am doing something wrong in my outcome function.

ERROR: DimensionMismatch("cannot broadcast array to have fewer dimensions")
Stacktrace:
  [1] check_broadcast_shape(#unused#::Tuple{}, Ashp::Tuple{Base.OneTo{Int64}})
    @ Base.Broadcast .\broadcast.jl:518
  [2] check_broadcast_shape(shp::Tuple{Base.OneTo{Int64}}, Ashp::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}})
    @ Base.Broadcast .\broadcast.jl:521
  [3] check_broadcast_axes
    @ .\broadcast.jl:523 [inlined]
  [4] check_broadcast_axes
    @ .\broadcast.jl:527 [inlined]
  [5] instantiate
    @ .\broadcast.jl:269 [inlined]
  [6] materialize!
    @ .\broadcast.jl:894 [inlined]
  [7] materialize!
    @ .\broadcast.jl:891 [inlined]
  [8] (::Zygote.var"#424#426"{4, Float64, Array{Float64, 4}, Tuple{Int64, Int64, Int64, Colon}})(dy::FillArrays.Fill{Float64, 2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}})
    @ Zygote C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\lib\array.jl:39
  [9] (::Zygote.var"#2303#back#420"{Zygote.var"#424#426"{4, Float64, Array{Float64, 4}, Tuple{Int64, Int64, Int64, Colon}}})(Δ::FillArrays.Fill{Float64, 2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}})
    @ Zygote C:\Users\bgladman\.julia\packages\ZygoteRules\OjfTt\src\adjoint.jl:59
 [10] Pullback
    @ .\REPL[37]:3 [inlined]
 [11] (::typeof(∂(predict)))(Δ::FillArrays.Fill{Float64, 2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}})
    @ Zygote C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:0
 [12] Pullback
    @ .\REPL[45]:7 [inlined]
 [13] (::typeof(∂(#23)))(Δ::FillArrays.Fill{Float64, 2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}})
    @ Zygote C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:0
 [14] Pullback
    @ .\REPL[47]:2 [inlined]
 [15] (::typeof(∂(loss)))(Δ::Float64)
    @ Zygote C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:0
 [16] Pullback
    @ .\REPL[51]:1 [inlined]
 [17] (::typeof(∂(#25)))(Δ::Float64)
    @ Zygote C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:0
 [18] (::Zygote.var"#94#95"{Zygote.Params, typeof(∂(#25)), Zygote.Context})(Δ::Float64)
    @ Zygote C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\compiler\interface.jl:348
 [19] gradient(f::Function, args::Zygote.Params)
    @ Zygote C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\compiler\interface.jl:76
 [20] top-level scope
    @ REPL[51]:1

Would be extremely grateful for any advice regarding the cause of the error and how to properly implement this type of problem in Julia.

I think outcome(x) is returning a vector, and [1,2,3]' .+ 4 gives a 1-row matrix, not an Adjoint vector, which Zygote doesn’t currently know how to undo. Things like ones(3) .+= ones(3,1) then give the error you see (which should be fixed on Julia 1.7, I think).

But the adjoint there, ', doesn’t change the result anyway, so you can delete it.
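
For example (output as on Julia 1.6, where the in-place broadcast cannot drop the trailing dimension):

julia> [1, 2, 3]' .+ 4   # broadcasting loses the Adjoint wrapper
1×3 Matrix{Int64}:
 5  6  7

julia> ones(3) .+= ones(3, 1)
ERROR: DimensionMismatch("cannot broadcast array to have fewer dimensions")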

Thanks for your help. Removing the adjoint solved the broadcast issue. I guess I am out of luck with getting this to run on a GPU, though.

Instead of using methods from the CUDA namespace directly, using generic methods that can dispatch on CuArrays seems to work:

# old, errors
function predict(x)
    y = CUDA.rand(100,24,1,128) .+ x
    y = y[1,1,1,:]
    return y
end

# new, works (rand! comes from the Random stdlib, which CUDA.jl extends for CuArrays)
function predict(x)
    y = rand!(similar(x, 100,24,1,128)) + x
    y = y[1,1,1,:]
    return y
end

This is generally the recommended way to write device-agnostic code anyhow, so that’s two birds with one stone.
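
As a quick sanity check (a sketch, assuming a functional GPU and using Random for the CPU path), the same function now runs on either device:

using CUDA, Random

x_cpu = rand(100, 24, 1, 128)
x_gpu = cu(x_cpu)  # cu converts to Float32 by default

predict(x_cpu)  # returns a Vector{Float64}
predict(x_gpu)  # returns a CuArray; no scalar indexing needed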