I’m trying to solve a food-web with many interacting diffeq. Because animals and plants can’t have negative population sizes it would be great to restrict the solution to only positive values. Additionally some of the equations depend on power functions (u^(rand(0:0.0000001:1))) and imaginary values aren’t an option.
My code is to large for a comment, therefore only the callbacks:

function condition(out,u, t, integrator)
out .= u .- 1e-6 # if population size is less than 1e-6 the population is extinct
end
function affect!(integrator,event_index)
u = integrator.u
u[event_index] = 0.0 # if population is extinct it's set to 0
end
cb = VectorContinuousCallback(condition,affect!, N*N_I) #first callback
cb2 = PositiveDomain()#second callback: restrict the population to positive and should set negative values to 0
cbs = CallbackSet(cb,cb2) #tuple of callbacks
prob = ODEProblem(func!,Pop_0,tspan,p, callback=cbs, reltol=1e-12) #make tolerance small to avoid big jumps.
solution = solve(prob) #solve problem

When running this code I sometimes get a Warning:

DomainError with -1.871257021144186e-15
Exponentiation yielding a complex result requires a complex argument

So I get nonetheless very small negative values which destroys my whole calculation .
I already tried to write functions like du[i] = max(0,u[i])^(rand(0:0.0000001:1)) but it couldn’t solve the Problem.

Thanks for the fast respond, the u^0.5 was just an example, the values are randomly drawn between 0 and 1.
I noticed when I just using the callback cb2 = PositiveDomain() I get a solution. But I need anyway the 1e-6 part to work.

If your ODE is simple enough, you could use a substitution u = exp(z) and solve the ODE for z instead.
(I was working on a PR for ModellingToolkit to do that symbolically but didn’t find the time to finish it. But my current state already works for ODEs, see)

Are there any optimizations to be had for sqrt (or other functions) if you make a function, just for the range [0, 1]? I Googled a bit to see, in general and for Julia.

Our experimental results show that the proposed algorithms provide a fairly good trade-off between accuracy and latency after two iterations for numbers of type float, and after three iterations for numbers of type double when using fused multiply–add instructions—giving almost complete accuracy. […]
Polynomial methods of high order rely heavily on multiplications and need to store
polynomial coefficients in memory; they also require a range reduction step

range reduction step could be skipped? And polynomial simplified?

As shown in Table 1, the proposed algorithms give significantly better performance
than the library functions on the Raspberry Pi, from 3.17 to 3.62 times faster, and for SP
numbers on ESP-32, 2.34 times faster for the reciprocal square root and approximately
1.78 times faster than the sqrtf (x) function.

# sqrt without the domain checks which we don't need since we handle the checks ourselves
@inline _sqrt(x::T) where {T<:Union{Float32,Float64}} = Base.sqrt_llvm(x)

It could be benchmarking isn’t realistic as always same value, and branching then not getting in the way, but would be otherwise for the range reduction, and this thus faster in realistic use. At least there must be a reason the do this for use by other exported functions. I guess they do not want to export _sqrt as it doesn’t covert the whole range, dangerous if you wouldn’t read the fine print.

EDIT: The speedup I benchmarked actually isn’t there (on x86), it’s 40% slower than SP sqrt (while a bit faster then DP sqrt), yes, sqrt instructions on x86 likely can’t be beaten. I suppose this code can help on ARM, as the the article explains.

Right way to see the speed:

julia> @btime Base.sqrt_llvm(r) setup=(r=rand(Float32));
1.882 ns (0 allocations: 0 bytes)
julia> function Sqrt31f(x::Float32)
i = reinterpret(Int32, x)
k = i & 0x00800000;
if(k != 0)
i = 0x5ed9e893 - (i >> 1);
y = reinterpret(Float32, i)
c = x*y;
y = 2.33130789f0*c*fma(y, -c, 1.07495356f0);
else
i = 0x5f19e8fd - (i >> 1);
y = reinterpret(Float32, i)
c = x*y;
y = 0.82421863f0*c*fma(y, -c, 2.1499474f0);
end
return y
end
I also started translating alg. 5 first but didn't figure out what the array-looking part was:
julia> function Sqrt32f(x::Float32)
y = [0x5ed9d098, 0x5f19d352,
2.33139729f0, 1.07492042f0,
0.82420468f0, 2.14996147f0] # (x);
c = x*y;
r = fma(y, -c, 1.0f0);
y = fma(0.5f0*c, r, c); # modified
return y;
end
Sqrt32f (generic function with 1 method)

That can’t be right - sub-nanosecond benchmarks would mean your computer did all those calculations in less time than it takes for a single addition. Doesn’t seem possible with a CPU in the Ghz range.

What about filling an array of N values once and checking the results against a slower version?

Have you checked whether the LLVM intrinsic already calls a specific x86_64 instruction for square roots? It may be that you can’t beat hardware on this.

Thanks for the interesting talk on sqrt().
I haven’t checked everything, because I changed multiple things in my code, but what finially worked was to hint that the problem is stiff. I totally forgot about that when writing this post.

function condition(out,u, t, integrator)
out .= u .- 1e-6 # if population size is less than 1e-6 the population is extinct
end
function affect!(integrator,event_index)
u = integrator.u
u[event_index] = 0.0 # if population is extinct it's set to 0
end
alg_hints=[:stiff]
cb = VectorContinuousCallback(condition,affect!, N*N_I)
cb2 = PositiveDomain()
cbs = CallbackSet(cb,cb2)
prob = ODEProblem(func!,Pop_0,tspan,p, callback=cbs, reltol=1e-12)
solution = solve(prob) #solve problem

I will do further checks and let you know if there are still problems.
Regards,
DerOzean