Bizarre Possible "Piracy Bug" Stalls Julia

I have been tracking down this bizarre bug all night. I think it could be a Julia bug, but I am not certain. The funny thing is I know the solution: use the latest master for StaticArrays (which will be tagged soon). However, that doesn’t seem to be a good answer because now it seems that “the old” StaticArrays.jl release is causing Julia to stall in a part of the code which does not deal with StaticArrays at all. This “action at a distance” is reminiscent of the Optim/Roots causing type-instabilities bug from awhile ago, so I thought it would be good to try and document it as clearly as possible. However… I haven’t been able to get an MWE. Maybe this isn’t enough information to really get anywhere since I wasn’t able to find an MWE, but maybe this is actually finding something so I wanted to make sure it got documented.

Essentially, I found it because I have a test in OrdinaryDiffEq.jl that stalls when StaticArrays.jl is not checked out. By directly calling internal functions of the package to simulate every step of the test, I can “walk” directly to the bug:

using OrdinaryDiffEq, RecursiveArrayTools, DiffEqBase
f = function (t,u,du)
  du[1] = u[2]
  du[2] = -9.81
end
tspan2 = (0.0,Inf)
u0 = [50.0,0.0]
prob2 = ODEProblem(f,u0,tspan2)

condtion= function (t,u,integrator) # Event when event_f(t,u,k) == 0
  @show t
  t-4
end

affect! = function (integrator)
  terminate!(integrator)
end

terminate_callback3 = ContinuousCallback(condtion,affect!,interp_points=100)

condtion= function (t,u,integrator) # Event when event_f(t,u,k) == 0
  u[1]
end

affect! = nothing
affect_neg! = function (integrator)
  integrator.u[2] = -integrator.u[2]
end

callback = ContinuousCallback(condtion,affect!,affect_neg!,interp_points=100)


bounce_then_exit = CallbackSet(callback,terminate_callback3)

integrator = init(prob2,Vern7(),callback=bounce_then_exit)

OrdinaryDiffEq.loopheader!(integrator)
OrdinaryDiffEq.perform_step!(integrator,integrator.cache)


integrator.q11 = integrator.EEst^integrator.opts.beta1
q = integrator.q11/(integrator.qold^integrator.opts.beta2)
q = max(inv(integrator.opts.qmax),min(inv(integrator.opts.qmin),q/integrator.opts.gamma))
dtnew = integrator.dt/q
ttmp = integrator.t + integrator.dt
integrator.isout = integrator.opts.isoutofdomain(ttmp,integrator.u)
integrator.accept_step = (!integrator.isout && integrator.EEst <= 1.0) || (integrator.opts.force_dtmin && abs(integrator.dt) <= abs(integrator.opts.dtmin))

integrator.tprev = integrator.t
integrator.t = ttmp
integrator.qold = max(integrator.EEst,integrator.opts.qoldinit)
OrdinaryDiffEq.calc_dt_propose!(integrator,dtnew)
discrete_callbacks = integrator.opts.callback.discrete_callbacks
continuous_callbacks = integrator.opts.callback.continuous_callbacks
atleast_one_callback = false

continuous_modified = false
discrete_modified = false

At this point, there are two things variables that matter: integrator, and integrator.opts.callback.continuous_callbacks. The next call that the test would be doing is equivalent to:

OrdinaryDiffEq.find_first_continuous_callback(integrator,continuous_callbacks[1],(continuous_callbacks[2],)...)

However, this command stalls. What it calls is:

https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/callbacks.jl#L9

Which is the recursive calls:

# Base Case: Only one callback
function find_first_continuous_callback(integrator,callback::ContinuousCallback)
  (find_callback_time(integrator,callback)...,1,1)
end

# Starting Case: Compute on the first callback
function find_first_continuous_callback(integrator,callback::ContinuousCallback,args...)
  find_first_continuous_callback(integrator,find_callback_time(integrator,callback)...,1,1,args...)
end

function find_first_continuous_callback(integrator,tmin::Number,upcrossing::Float64,idx::Int,counter::Int,callback2)
  counter += 1 # counter is idx for callback2.
  tmin2,upcrossing2 = find_callback_time(integrator,callback2)

  if (tmin2 < tmin && tmin2 != zero(typeof(tmin))) || tmin == zero(typeof(tmin))
    return tmin2,upcrossing,counter,counter
  else
    return tmin,upcrossing,idx,counter
  end
end

function find_first_continuous_callback(integrator,tmin::Number,upcrossing::Float64,idx::Int,counter::Int,callback2,args...)
  find_first_continuous_callback(integrator,find_first_continuous_callback(integrator,tmin,upcrossing,idx,counter,callback2)...,args...)
end

The only DiffEq code this hits inside of the recursion is OrdinaryDiffEq.find_callback_time, and I can check in the REPL that

OrdinaryDiffEq.find_callback_time(integrator,continuous_callbacks[1])
OrdinaryDiffEq.find_callback_time(integrator,continuous_callbacks[2])

both of those are fine. So, even though all of the function calls inside of the recursion are fine, and none of this code uses StaticArrays, this recursion will only stall on StaticArrays v0.5.0 but not on v0.5.1 / master. Peculiar, right?

Just to be clear, here (OrdinaryDiffEq.jl master) we only touch StaticArrays very indirectly. OrdinaryDiffEq.jl depends on StaticArrays.jl through RecursiveArrayTools.jl which only defines a single dispatch that uses it:

https://github.com/JuliaDiffEq/RecursiveArrayTools.jl/blob/master/src/utils.jl#L9

function recursivecopy!{T<:SArray,N}(b::AbstractArray{T,N},a::AbstractArray{T,N})
  @inbounds for i in eachindex(a)
    b[i] = a[i]
  end
end

But this dispatch only is involved if StaticArrays are involved, which is not the case in this example.

Now what’s even more bizarre to me is that I then locally removed StaticArrays and this dispatch from RecursiveArrayTools.jl, meaning no functions I am calling have any relation to StaticArrays, and this behavior was still seen. Trying to find out what’s going on, I check:

println(Pkg.dependents("StaticArrays"))


println(Pkg.dependents("StaticArrays"))
AbstractString["EnhancedGJK", "Celeste", "AdaptiveDistanceFields", "RecursiveArrayTools", "ImageFiltering", "IntervalArithmetic", "Contour", "NRRD", "StochasticDiffEq", "ImageTransformations", "Rotations", "Poltergeist", "ApproxFun", "Sugar", "ContMechTensors", "DrakeVisualizer", "ForwardDiff", "Cliffords", "RigidBodyTreeInspector", "DynamicalBilliards", "GPUArrays", "OrdinaryDiffEq", "CoordinateTransformations", "CompScienceMeshes", "Transpiler", "Geodesy", "JuLIP", "RegionTrees", "NearestNeighbors", "RigidBodyDynamics", "Augmentor"]

The since OrdinaryDiffEq and now RecurisveArrayTools are now not dependent on StaticArrays in this test, that means that the only package which is relating the this to StaticArrays is ForwardDiff (though none of the ForwardDiff related code is used in this example!). Fully removing ForwardDiff or StaticArrays from ForwardDiff would be a very difficult task.

Guesses?

My hopeful guess is that StaticArrays is accidentally was performing some kind of piracy that changed how some very basic recursion/splatting was working. Then that would make sense why just having it in the scope causes this to fail (though other tests hit this same portion of the code and do fine…). If anyone knows if this is the case, let me know so I can consider this solved.

If that’s no the cause, then like the Optim/Roots rand type-instability bug it could be a deeper problem. I started this investigation thinking that this “action at a distance” was some kind of dispatch bug (since StaticArrays weren’t involved here), but alas I came up with nothing.

Conclusion

StaticArrays.jl is somehow making that recursion fail, but not on master and it will no longer be the case when the newest tag goes through. So yay that it’s fixed, though that kind of solution is not satisfying. But hopefully this attempt at finding out what the bug was helps somehow, but I have to give up since an MWE has been completely elusive.

Version Details

For this comparison, DiffEqBase v1.10.0, RecursiveArrayTools v0.9.0, and OrdinaryDiffEq master (which will be tagged as v2.6.0) were used. The stalling happens with StaticArrays v0.5, but not v0.5.1. This was tested on Julia v0.6-rc2.

1 Like

This seems very likely to be fixed by:
https://github.com/JuliaLang/julia/pull/21933

1 Like