Hey all. I am trying to implement a neural network controller which emulates this example: https://github.com/FluxML/Trebuchet.jl. In place of the ODE solver I am using a previously trained Flux model which outputs a single scalar. For the neural controller I am using a Flux (LSTM) model, which takes the target value as input and outputs 3 control variables that are then fed to the trained Flux model for prediction.
A non-working example which illustrates my approach follows.
cd(@__DIR__)
using Pkg; Pkg.activate(".")
using Zygote: forwarddiff
using Statistics: mean
using Random
using DifferentialEquations, DiffEqFlux
using Flux, BSON, Plots
using CUDA
using StatsBase
CUDA.allowscalar(false)
# Load previously trained model
using BSON: @load
@load "saved path" model
# Model was trained on GPU, and transferred to CPU for saving.
model = gpu(model)
# Input to model is a 4D array (features, pool length, 1, datalength)
input = get_data(dataset, poollength, datalength, horizon)[1];
# Reshape to the shape required by the LSTM neural controller.
# inputsize is the number of features
LSTMinput = reshape(input, (inputsize, poollength, datalength-31)) |> gpu;
const set_point = 0.1
LSTMinput[1,:,:] .= set_point;
const k = 32
# Use the Flux DataLoader to mini-batch k samples.
train_loader = Flux.Data.DataLoader(LSTMinput, batchsize=k, shuffle=false, partial=false);
# Function predicts the output using the trained Flux model
function predict(d)
    d = d |> gpu
    Flux.reset!(model)
    preds = model(d)
    return preds
end
# Define my neural controller
controller = Chain(
    LSTM(inputsize, 65),
    LSTM(65, 33),
    Dense(33, 3)) #|> gpu
# Controller parameters for optimisation
ps = Flux.params(controller)
# Takes a 3D array as LSTM input and returns, for each timestep,
# a vector for each of the 3 nominated control variables
function control(x)
    Flux.reset!(controller)
    k = size(x, 3)
    inputs = [x[:,:,t] for t in 1:k]
    output = [controller(x) for x in inputs]
    C1 = [x[1,:] for x in output]
    C2 = [x[2,:] for x in output]
    C3 = [x[3,:] for x in output]
    return C1, C2, C3
end
# Calculates the controller outputs and feeds them to the model to predict the output
outcome = function(x)
    k = size(x, 3)
    C1, C2, C3 = control(x)
    a = reshape(reduce(hcat, C1), 1, size(x,2), size(C1)[1])
    b = reshape(reduce(hcat, C2), 1, size(x,2), size(C1)[1])
    c = reshape(reduce(hcat, C3), 1, size(x,2), size(C1)[1])
    # This is hacky: selects the features not replaced by C1, C2, C3
    # If on CUDA, use this
    sel = iszero.(Int.(vcat(CUDA.zeros(5), 1, CUDA.zeros(5), 1, CUDA.zeros(4), 1, CUDA.zeros(26))))
    # If on CPU, use this
    sel = iszero.(Int.(vcat(zeros(5), 1, zeros(5), 1, zeros(4), 1, zeros(26))))
    Xv = x[sel,:,:]
    # This is not quite right, as the index of the replaced features in the x matrix is lost.
    # Used cat to avoid a mutating-array error
    X1 = cat(dims=1, Xv, a)
    X2 = cat(dims=1, X1, b)
    X3 = cat(dims=1, X2, c)
    # Reshape the array to 4D, the input shape expected by predict
    z = reshape(X3, (inputsize, poollength, 1, k));
    result = predict(z)
    return result
end
# Loss minimises the error between the model output and the setpoint
function loss(x)
    # l = CUDA.sum(outcome(x)' .- gpu(set_point))^2 / length(x) |> gpu
    Flux.mse(outcome(x), set_point')
end
opt = ADAM(0.05)
The forward pass seems to work OK: feeding a single data point, I can return the model output and loss.
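For example, this kind of check runs without error (using first(train_loader) here just to grab one mini-batch):
batch = first(train_loader)   # one (inputsize, poollength, k) mini-batch
outcome(batch)                # forward pass through the controller and the trained model
loss(batch)                   # scalar error against the setpoint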
Calculating the gradient results in the following Stacktrace:
gs = gradient(() -> loss(single[1]), ps)
ERROR: this intrinsic must be compiled to be called
Stacktrace:
[1] macro expansion
@ C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:0 [inlined]
[2] _pullback(::Zygote.Context, ::Core.IntrinsicFunction, ::String, ::Type{Int64}, ::Type{Tuple{Ptr{Int64}}}, ::Ptr{Int64})
@ Zygote C:\Users\bgladman\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:9
I don’t know if this is relevant, but substituting a dummy model like:
function predict(d)
    ones(1,32) |> gpu
end
yields the same stacktrace, while omitting the gpu call seems to work:
function predict(d)
    ones(1,32)
end
julia> gs = gradient(() -> loss(single[1]), ps)
Grads(...)
Two related posts seem to suggest the need to define custom adjoints.
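For what it's worth, my understanding is that a custom adjoint would look something like the minimal sketch below, where blackbox is just a placeholder for whichever call Zygote cannot differentiate (I have not worked out what that call would be in my case):
using Zygote
# placeholder for a call Zygote cannot trace
blackbox(x) = 2 .* x
# hand-written adjoint: return the primal result plus a pullback for the gradient
Zygote.@adjoint function blackbox(x)
    y = blackbox(x)
    return y, Δ -> (2 .* Δ,)
end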
I am new to Julia and Flux/Zygote and am struggling a little to work out where or how I am going wrong. Would really appreciate any help or pointers. Thanks for your time.
Welcome! Taking on a deep RL problem is quite the ambitious first project for learning Julia/Flux, but with any luck this error should be easy to remedy.
Instead of calling gpu in your predict function, move the data to the GPU first and then pass it to the loss function. If you're using a Flux DataLoader, you can wrap it in a CuIterator (see Memory management · CUDA.jl) to do that automatically.
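Roughly like this, as a sketch only (names taken from your snippet; it assumes LSTMinput is kept on the CPU so that CuIterator does the per-batch transfer, and that the controller has also been moved with gpu):
using CUDA
# predict no longer does any device transfer itself
function predict(d)
    Flux.reset!(model)
    return model(d)
end
# each mini-batch is copied to the GPU as it is consumed
train_loader = Flux.Data.DataLoader(LSTMinput, batchsize=k, shuffle=false, partial=false)
for batch in CuIterator(train_loader)
    gs = gradient(() -> loss(batch), ps)
    Flux.Optimise.update!(opt, ps, gs)
end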
Thank you ToucheSir. Appreciate your help!! I tried your suggestion, but unfortunately I am still getting the error. Here is an MWE.
using Statistics: mean
using Random
using Flux
using CUDA
CUDA.allowscalar(false)
data = CUDA.rand(100, 24, 128);
# Function predict: take the output from control and do something...
function predict(x)
    y = CUDA.rand(100, 24, 1, 128) .+ x
    y = y[1,1,1,:]
    return y
end
# Recurrent-net controller
m = Chain(
    LSTM(100, 65),
    LSTM(65, 33),
    Dense(33, 1)) |> gpu
# Parameters of the controller
p = Flux.params(m);
# Function takes a 3D input and outputs a single control variable
function control(x)
    Flux.reset!(m)
    inputs = [x[:,:,t] for t in 1:128]
    output = [m(x) for x in inputs]
    C1 = [x[1,:] for x in output]
    return C1
end
# Outcome function: runs the controller, updates the data and feeds it to predict
outcome = function(x)
    C1 = control(x)
    a = reshape(reduce(hcat, C1), 1, size(x,2), size(C1)[1])
    Xv = x[2:end,:,:]
    X1 = cat(dims=1, Xv, a)
    # Reshape here as my predict function expects a 4D array.
    x_new = reshape(X1, (100, 24, 1, 128))
    result = predict(x_new)
    return result
end
# Loss minimises the error between the model output and a scalar
function loss(x)
    l = sum(outcome(x)' .- 0.1)^2 / length(x)
    return l
end
opt = ADAM(0.05)
# Try the forward pass
outcome(data)
loss(data)
using Flux: @epochs
@epochs 30 Flux.train!(loss, p, [data], opt)
# Finding the gradient...
gs = gradient(() -> loss(data), p)
If I modify the above to run on the CPU, I get this message instead when calculating the gradient, so evidently I am doing something wrong in my outcome function.
I think outcome(x) is returning a vector, and [1,2,3]' .+ 4 gives a 1-row Matrix, not an Adjoint vector, which Zygote doesn't currently know how to un-do. Things like ones(3) .+= ones(3,1) then give the error you see (which should be fixed on Julia 1.7, I think).
But the adjoint there, ', doesn't change the result anyway, so you can just delete it.
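To see the same thing quickly in the REPL (on Julia 1.6; as said, 1.7 should behave differently):
v = [1, 2, 3]
typeof(v')               # an Adjoint wrapper around the vector
typeof(v' .+ 4)          # a plain 1×3 Matrix; broadcasting drops the wrapper
ones(3) .+= ones(3, 1)   # DimensionMismatch, since a 3×1 result can't be written into a length-3 vector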
Instead of calling methods from the CUDA namespace directly, using generic methods that dispatch on CuArrays seems to work:
# old, errors
function predict(x)
    y = CUDA.rand(100, 24, 1, 128) .+ x
    y = y[1,1,1,:]
    return y
end

# new, works
function predict(x)
    y = rand!(similar(x, 100, 24, 1, 128)) + x
    y = y[1,1,1,:]
    return y
end
This is generally the recommended way to write device-agnostic code anyhow, so that’s two birds with one stone.
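For instance, the new version accepts either array type (sizes chosen to match the MWE):
x_cpu = rand(Float32, 100, 24, 1, 128)
x_gpu = cu(x_cpu)
predict(x_cpu)   # similar/rand! produce an Array here...
predict(x_gpu)   # ...and a CuArray here, with no CUDA.rand needed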