DiffEqFlux gradient error: breaks with increasing size of problem, size should not affect autodiff

evolbio · November 16, 2021, 8:03pm

I am optimizing a simple ODE problem with DiffEqFlux to get started with Julia. This code worked on TensorFlow.

In the following code, running main(S=5, B=256) works, but main(S=7, B=256) fails. I previously used EnsembleProblem with same code and got same results. The error begins:

ERROR: Need an adjoint for constructor StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}. Gradient is of type Vector{Nothing}

but, as noted, the code runs for S=5. Also (separately) it seems a bit slow, so any hints about how to code for better performance welcome.

Thanks.

using Random: seed!
using RandomNumbers.Xorshifts
using DifferentialEquations, DiffEqFlux
using DiffEqSensitivity, OrdinaryDiffEq, ForwardDiff
using Statistics

r = Xoroshiro128Plus(0x1234567890abcdef)

L1 					= 0.02 #/(S*S) 	# L1 reg, normalized by size of weight matrix, W
L2 					= 0.05 #/(S*S)	# L2 regularization parameter
useL1				= true			# True -> L1, False -> L2

const final_t		        = 3.0			# final time
const tspan			= (0.0,final_t)	# range of times
atol				        = 1e-6			# abs tol for diffeq
rtol				        = 1e-4			# rel tol for diffeq
####################################################################

phi(x) = 1.0 ./ (exp.(-x) .+ 1.0)

function ode!(ds, s, w, t, input)
	ds .= phi(w*s) .- s .+ input
end

function newbatch(B, S; stochastic=false)
	initial = 0.1 .* ones(B,S)
	if stochastic
		inputvec = 2.0 .* rand(r,B)
	else
		inputvec = [range(0,2,length=B);]	# need semicolon to expand range
	end
	# make inputs [B,S] with first column as input to y0 at t=0, other cols as zeros
	input = hcat(inputvec, zeros(B,S-1))
	output = zeros(B,S)
	output = map(x -> convert(Float64,x >= 0.5 && x <= 1.5), inputvec)
	return input, initial, output
end

## cost function calculations
function reg(w, regularize)
	if !regularize
		return 0.0
	elseif useL1
		return L1 * norm(w,1)
	else
		return L2 * norm(w)
	end
end

function loss(p, B, S, regularize, stoch_batch)
	input, initial, output = newbatch(B, S, stochastic=stoch_batch)
	result = []
	for i in 1:B
		prob = ODEProblem((ds, s, p, t) -> ode!(ds, s, w(p,S), t, input[i,:]),
						initial[i,:], tspan, p)
		soln = solve(prob, Tsit5(), saveon = false, reltol=rtol,abstol=atol)
		(i == 1) ? result = soln[end]' : result = vcat(result, soln[end]')
	end
	loss_val = mean(abs2.(a(p) .* result[:,end] .- output)) .+ reg(w(p,S), regularize)
	return loss_val
end

function callback(p, l)
	#display(l)
	return false
end

w(p, S) = reshape(p[1:end-1],S,S)
a(p) = p[end]
	
# parameters p:
# a = p[end] = 1.0
# w = reshape(p[1:end-1],S,S)
function main(;B::Int=4, S::Int=3, regularize::Bool=false, stoch_batch::Bool=false,
			rnd_entropy::Bool=false)
	rnd_entropy ? seed!(r) : seed!(r, 0x1234567890abcdef)
	p = ones(S*S+1)
	p[1:end-1] = 0.1 .* randn(r, S*S)
	opt_result = DiffEqFlux.sciml_train(p -> loss(p, B, S, regularize, stoch_batch), 
			p, ADAM(0.1); cb = callback, maxiters = 10)
	finalp = opt_result.u
	
	println()
	print(opt_result)
	println()
	print("Final param = ", typeof(finalp))
	println()
	println("Loss calculated from final param = ", 
				loss(finalp, B, S, regularize, stoch_batch)[1])
	return w(opt_result.u,S)
end

main(S=5, B=256)
main(S=7, B=256)

goerch · November 16, 2021, 8:25pm

First of all: excellent MWE!

How long do I have to wait? OK, just finished. One more test…

Confirmed. Fails for me, too, on Julia 1.6.3 with

ERROR: LoadError: Need an adjoint for constructor StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}. Gradient is of type Vector{Nothing}
Stacktrace:
  [1] error(s::String)  
    @ Base .\error.jl:33
  [2] (::Zygote.Jnew{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}, Nothing, false})(Δ::Vector{Nothing})

Time to file an issue against Zygote?

P.S.: only random observation: Tsit5()
P.P.S.: the only thing I’d check before is if the gradient is valid.

evolbio · November 16, 2021, 10:03pm

Thanks very much for the quick reply. It fails similarly for a different solver and a different optimizer.

ChrisRackauckas · November 16, 2021, 10:37pm

Random idea, try Array(soln)[:,end]' and see if the error goes away. Seems like an odd omission of an adjoint. @dhairyagandhi96

evolbio · November 16, 2021, 10:46pm

Thanks for considering this. Made this change:

(i == 1) ? result = Array(soln)[:,end]' : result = vcat(result, Array(soln)[:,end]')

and got the same error result.

ChrisRackauckas · November 16, 2021, 10:49pm

Simplify and delete some things until it goes away. That’ll isolate what it is.

evolbio · November 16, 2021, 11:07pm

Commenting this line:

#output = map(x -> convert(Float64,x >= 0.5 && x <= 1.5), inputvec)

allowed code to run for larger S values. I can experiment with alternative ways to achieve the same output matrix. Is using map in that way expected to cause failure of AD?

Note: The line above should be output = ones(B) instead of output = zeros(B,S) if commenting as here.

ChrisRackauckas · November 17, 2021, 12:32am

Looks like a bug in the map adjoint.