Error: this intrinsic must be compiled to be called (Flux/Zygote)

Hey all. I am trying to implement a neural network controller that emulates this
example: https://github.com/FluxML/Trebuchet.jl. In place of the ODE solver I am using a previously trained Flux model, which outputs a single scalar. For the neural controller I am using a Flux LSTM model. The controller takes the target value as input and outputs 3 control variables, which are fed to the trained Flux model for prediction.

A non-working example that illustrates my approach follows.

cd(@__DIR__)
using Pkg; Pkg.activate(".")

using Zygote: forwarddiff
using Statistics: mean
using Random
using DifferentialEquations, DiffEqFlux
using Flux, BSON, Plots
using CUDA
using StatsBase

CUDA.allowscalar(false)

# Load previously trained model
using BSON: @load
@load "saved path" model
# Model was trained on the GPU and transferred to the CPU for saving.
model = gpu(model)

# Input to the model is a 4D array (features, pool length, 1, datalength).
input = get_data(dataset, poollength, datalength, horizon)[1];

# Reshape to the shape required by the LSTM neural controller.
# inputsize is the number of features.
LSTMinput = reshape(input, (inputsize, poollength, datalength - 31)) |> gpu;

const set_point = 0.1
LSTMinput[1,:,:] .= set_point;
const k = 32

# Use the Flux DataLoader to mini-batch k samples.
train_loader = Flux.Data.DataLoader(LSTMinput, batchsize = k, shuffle = false, partial = false);

# Predicts the output using the previously trained Flux model.
function predict(d)
    d = d |> gpu
    Flux.reset!(model)
    preds = model(d)
    return preds
end

#Define my neural controller

controller =  Chain(
    LSTM(inputsize, 65),
    LSTM(65, 33),
    Dense(33, 3)) #|>gpu

# Controller parameters for optimisation.
ps = Flux.params(controller)

# Takes a 3D array as input to the LSTM and outputs, for each timestep, vectors for the 3 nominated control variables.

function control(x)
    Flux.reset!(controller)
    k = size(x,3)
    inputs = [x[:,:,t] for t in 1:k]
    output = [controller(x) for x in inputs]
    C1 = [x[1,:] for x in output]
    C2 = [x[2,:] for x in output]
    C3 = [x[3,:] for x in output]
    return C1, C2, C3
end

# Calculates the controller outputs and feeds them to the model to predict the outcome.

outcome = function(x)
    k = size(x,3)
    C1, C2, C3 = control(x)
    a = reshape(reduce(hcat, C1), 1, size(x,2), size(C1)[1])
    b = reshape(reduce(hcat, C2), 1, size(x,2), size(C1)[1])
    c = reshape(reduce(hcat, C3), 1, size(x,2), size(C1)[1])

    # This is hacky: it selects the features not equal to C1, C2, C3.
    # If on CUDA, use this:
    # sel = iszero.(Int.(vcat(CUDA.zeros(5), 1, CUDA.zeros(5), 1, CUDA.zeros(4), 1, CUDA.zeros(26))))

    # If on the CPU, use this:
    sel = iszero.(Int.(vcat(zeros(5), 1, zeros(5), 1, zeros(4), 1, zeros(26))))

    Xv = x[sel,:,:]

    # This is not quite right, as the index of the replaced features in the x matrix is lost.
    # Used cat to avoid a mutating-array error.

    X1 = cat(Xv, a; dims=1)
    X2 = cat(X1, b; dims=1)
    X3 = cat(X2, c; dims=1)

    # Reshape the array to 4D, the input shape expected by predict.
    z = reshape(X3, (inputsize, poollength, 1, k))
    result = predict(z)

    return result
end
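As an aside, a non-mutating way to keep the replaced features at their original row indices (rows 6, 12, and 17, per the mask above) might be to vcat the slices back in order; a sketch, not verified:

X3 = vcat(x[1:5,:,:],   a,
          x[7:11,:,:],  b,
          x[13:16,:,:], c,
          x[18:end,:,:])  # rows remain in their original positions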

# The loss minimises the error between the model output and the setpoint.

function loss(x)
    # l=CUDA.sum(outcome(x)' .- gpu(set_point))^2/length(x)  |>gpu
    Flux.mse(outcome(x), set_point')
end

opt = ADAM(0.05)

The forward pass seems to work OK: feeding a single batch, I can return the model output and loss.

# A single batch of size k
single = [LSTMinput[:,:,1:k]]

loss(single[1])
0.003993923902422385

outcome(single[1])
1×32 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 0.246026  0.14971  0.156318  0.15168  0.154398  0.155286  0.153184  0.149052  0.144793  …  0.181637  0.192891  0.213579  0.151786  0.138461  0.146778  0.124833  0.122962  0.120319

I hit a snag, however, when attempting to backpropagate the gradient with respect to the neural controller parameters, ps.

using Flux: @epochs
@epochs 30 Flux.train!(loss, ps, train_loader, opt)  #, cb=cb

Calculating the gradient results in the following Stacktrace:

gs = gradient(() -> loss(single[1]), ps)

ERROR: this intrinsic must be compiled to be called
Stacktrace:
 [1] macro expansion
   @ C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:0 [inlined]
 [2] _pullback(::Zygote.Context, ::Core.IntrinsicFunction, ::String, ::Type{Int64}, ::Type{Tuple{Ptr{Int64}}}, ::Ptr{Int64})
   @ Zygote C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:9

I don’t know if this is relevant, but substituting a dummy model like:

function predict(d)
    ones(1,32) |> gpu
end

yields the same stacktrace, while omitting the gpu call seems to work:

function predict(d)
    ones(1,32)
end

julia> gs = gradient(() -> loss(single[1]), ps)
Grads(...)

Two related posts seem to suggest the need to define custom adjoints.

I am new to Julia and Flux/Zygote and am struggling a little to work out where or how I am going wrong. Would really appreciate any help or pointers. Thanks for your time.

Welcome! Taking on a deep RL problem is quite the ambitious first project for learning Julia/Flux, but with any luck this error should be easy to remedy.

Instead of calling gpu in your predict function, move the data to the GPU first and then pass it to the loss function. If you’re using a Flux DataLoader, you can wrap it in a CuIterator (see the Memory management page of the CUDA.jl docs) to do that automatically.
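
For illustration, a minimal sketch of that pattern, assuming the loss, ps, and opt defined above and with the data kept on the CPU (the hypothetical LSTMinput_cpu below stands in for a CPU-resident copy; CuIterator is exported by CUDA.jl):

using CUDA  # provides CuIterator

# Keep the data on the CPU in the loader. Note the one-element tuple:
# CuIterator maps over each batch when moving it to the GPU.
train_loader = Flux.Data.DataLoader((LSTMinput_cpu,), batchsize = k, shuffle = false, partial = false)

# predict no longer moves its argument; batches already arrive as CuArrays.
function predict(d)
    Flux.reset!(model)
    return model(d)
end

Flux.train!(loss, ps, CuIterator(train_loader), opt)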

Thank you ToucheSir, I appreciate your help! I tried your suggestion but unfortunately am still getting the error. Here is an MWE.

using Statistics: mean
using Random
using Flux
using CUDA

CUDA.allowscalar(false)

data=CUDA.rand(100,24,128);

# Function predict: take the output from control and do something with it...
function predict(x)
    y = CUDA.rand(100,24,1,128) .+ x
    y = y[1,1,1,:]
    return y
end

# Recurrent net controller
m = Chain(
    LSTM(100, 65),
    LSTM(65, 33),
    Dense(33, 1)) |> gpu

 # parameters of controller
p = Flux.params(m);

# Takes a 3D input and outputs a single control variable.
function control(x)
    Flux.reset!(m)
    inputs = [x[:,:,t] for t in 1:128]
    output = [m(x) for x in inputs]
    C1 = [x[1,:] for x in output]

  return C1
end

# Outcome function: runs the controller, updates the data, and feeds it to predict.
outcome = function(x)
    C1 = control(x)
    a = reshape(reduce(hcat, C1), 1, size(x,2), size(C1)[1])
    Xv = x[2:end,:,:]
    X1 = cat(Xv, a; dims=1)

    # Reshape here, as my predict function expects a 4D array.
    x_new = reshape(X1, (100, 24, 1, 128))
    return predict(x_new)
end

# The loss minimises the error between the model output and a scalar setpoint.
function loss(x)
    l = sum(outcome(x)' .- 0.1)^2 / length(x)
    return l
end

opt = ADAM(0.05)

# Try the forward pass
outcome(data)
loss(data)

using Flux: @epochs
@epochs 30 Flux.train!(loss, p, [data], opt) 

#Finding gradient... 
gs = gradient(() -> loss(data), p)

If I modify the above to run on the CPU, I get the following message instead when calculating the gradient, so evidently I am doing something wrong in my outcome function.

ERROR: DimensionMismatch("cannot broadcast array to have fewer dimensions")
Stacktrace:
  [1] check_broadcast_shape(#unused#::Tuple{}, Ashp::Tuple{Base.OneTo{Int64}})
    @ Base.Broadcast .\broadcast.jl:518
  [2] check_broadcast_shape(shp::Tuple{Base.OneTo{Int64}}, Ashp::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}})
    @ Base.Broadcast .\broadcast.jl:521
  [3] check_broadcast_axes
    @ .\broadcast.jl:523 [inlined]
  [4] check_broadcast_axes
    @ .\broadcast.jl:527 [inlined]
  [5] instantiate
    @ .\broadcast.jl:269 [inlined]
  [6] materialize!
    @ .\broadcast.jl:894 [inlined]
  [7] materialize!
    @ .\broadcast.jl:891 [inlined]
  [8] (::Zygote.var"#424#426"{4, Float64, Array{Float64, 4}, Tuple{Int64, Int64, Int64, Colon}})(dy::FillArrays.Fill{Float64, 2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}})
    @ Zygote C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\lib\array.jl:39
  [9] (::Zygote.var"#2303#back#420"{Zygote.var"#424#426"{4, Float64, Array{Float64, 4}, Tuple{Int64, Int64, Int64, Colon}}})(Δ::FillArrays.Fill{Float64, 2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}})
    @ Zygote C:\Users\bgladman\.julia\packages\ZygoteRules\OjfTt\src\adjoint.jl:59
 [10] Pullback
    @ .\REPL[37]:3 [inlined]
 [11] (::typeof(∂(predict)))(Δ::FillArrays.Fill{Float64, 2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}})
    @ Zygote C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:0
 [12] Pullback
    @ .\REPL[45]:7 [inlined]
 [13] (::typeof(∂(#23)))(Δ::FillArrays.Fill{Float64, 2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}})
    @ Zygote C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:0
 [14] Pullback
    @ .\REPL[47]:2 [inlined]
 [15] (::typeof(∂(loss)))(Δ::Float64)
    @ Zygote C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:0
 [16] Pullback
    @ .\REPL[51]:1 [inlined]
 [17] (::typeof(∂(#25)))(Δ::Float64)
    @ Zygote C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:0
 [18] (::Zygote.var"#94#95"{Zygote.Params, typeof(∂(#25)), Zygote.Context})(Δ::Float64)
    @ Zygote C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\compiler\interface.jl:348
 [19] gradient(f::Function, args::Zygote.Params)
    @ Zygote C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\compiler\interface.jl:76
 [20] top-level scope
    @ REPL[51]:1

Would be extremely grateful for any advice regarding the cause of the error and how to properly implement this type of problem in Julia.

I think outcome(x) is returning a vector, and [1,2,3]' .+ 4 gives a 1-row matrix, not an Adjoint vector, which Zygote doesn’t currently know how to undo. Things like ones(3) .+= ones(3,1) then give the error you see (which should be fixed on Julia 1.7, I think).

But the adjoint there, ', doesn’t change the result anyway, so you can delete it.
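
For example (output as on Julia 1.6, where the in-place broadcast cannot drop the trailing dimension):

julia> [1, 2, 3]' .+ 4   # broadcasting loses the Adjoint wrapper
1×3 Matrix{Int64}:
 5  6  7

julia> ones(3) .+= ones(3, 1)
ERROR: DimensionMismatch("cannot broadcast array to have fewer dimensions")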

Thanks for your help. Removing the adjoint solved the broadcast issue. I guess I am out of luck with getting this to run on a GPU, though.

Instead of using methods from the CUDA namespace directly, using generic methods that can dispatch on CuArrays seems to work:

# old, errors
function predict(x)
    y = CUDA.rand(100,24,1,128) .+ x
    y = y[1,1,1,:]
    return y
end

# new, works (rand! comes from the Random stdlib, which CUDA.jl extends for CuArrays)
function predict(x)
    y = rand!(similar(x, 100,24,1,128)) + x
    y = y[1,1,1,:]
    return y
end

This is generally the recommended way to write device-agnostic code anyhow, so that’s two birds with one stone.
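
As a quick sanity check (a sketch, assuming a functional GPU and using Random for the CPU path), the same function now runs on either device:

using CUDA, Random

x_cpu = rand(100, 24, 1, 128)
x_gpu = cu(x_cpu)  # cu converts to Float32 by default

predict(x_cpu)  # returns a Vector{Float64}
predict(x_gpu)  # returns a CuArray; no scalar indexing needed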