Broadcasting in 0.7

ChrisRackauckas · July 13, 2018, 8:29pm

Interesting, since all of the native Julia codes in DifferentialEquations.jl are built around abstract typing. I would say Sundials is highly limited in comparison to what we’re doing. This is discussed in the blog post:

Like the Sundials NVector approach, the native Julia solvers in OrdinaryDiffEq.jl/StochasticDiffEq.jl/DelayDiffEq.jl allow for state types which are broadcasting (there was a caveat in v0.6 about the length of statements, but this is fixed in v0.7). As a result, these libraries are built around broadcastable-types which don’t necessarily have (good) indexing. GPUArrays are a type which broadcasts but doesn’t index (well, there’s a slow fallback but you can even turn that off).

Normally the approach is to make the states array-like as @mbauman suggests. This is so that way buffer arrays can be used. There are many practical examples of this. When using a solver for stiff equations, you will need to solve linear equations. If you can vec_tmp .= u the state u into a normal vector, then this can easily be used with BLAS via *. So it’s much easier to do these kind of “to array and back” workflows if the broadcast can work arrays in there. Indeed, there is a lot of work being done by @dlfivefifty and @sacha to make multi-strided Julia array types able to use BLAS effectively (and in the latter case, a native Julia BLAS). But I still find that for things like LAPACK routines (\ after a factorization) you still need an array so this finds many uses.

RecursiveArrayTools.jl has an example of array which has been updated. The ArrayPartition is a “vector” made via the concatenation of the linear indexing of arrays in a tuple. Being housed in a tuple, these arrays can be heterogeneous. But the broadcast is defined recursively

github.com

SciML/RecursiveArrayTools.jl/blob/master/src/array_partition.jl#L265-L280


      
          end
          
          ## recursive methods
          
          function recursivecopy!(A::ArrayPartition, B::ArrayPartition)
              for (a, b) in zip(A.x, B.x)
                  recursivecopy!(a, b)
              end
          end
          recursivecopy(A::ArrayPartition) = ArrayPartition(copy.(A.x))
          
          recursive_mean(A::ArrayPartition) = mean((recursive_mean(x) for x in A.x))
          
          # note: consider only first partition for recursive one and eltype
          recursive_one(A::ArrayPartition) = recursive_one(first(A.x))
          recursive_eltype(A::ArrayPartition) = recursive_eltype(first(A.x))

so in an update statement, like as seen in the integrators like our RK4

github.com

SciML/OrdinaryDiffEq.jl/blob/master/src/perform_step/fixed_timestep_perform_step.jl#L282-L291


      
          halfdt = dt / 2
          k₁ = integrator.fsalfirst
          ttmp = t + halfdt
          k₂ = f(uprev + halfdt * k₁, p, ttmp)
          k₃ = f(uprev + halfdt * k₂, p, ttmp)
          k₄ = f(uprev + dt * k₃, p, t + dt)
          u = uprev + (dt / 6) * (2 * (k₂ + k₃) + (k₁ + k₄))
          integrator.fsallast = f(u, p, t + dt)
          integrator.stats.nf += 4
          if integrator.opts.adaptive

will go through and broadcast only u.x[i] with other u.x[i], so even if the different arrays have different types in them this broadcast is still fast. One use case for this is units. The linear index of the ArrayPartition is slow if you have arrays with different units because indexing is not type stable, but this broadcasting operation is type stable. Interestingly, because of how the cache is used, we’ve noticed that even ArrayPartitions with only Float64s can be faster on broadcasted operations than standard Float64 arrays… so that’s a great showcase of broadcast used in this manner.

I’d like to see where your AugmentedState goes though. We have something similar in DiffEqJump.jl which is used to implement variable-rate Poisson jumps onto continuous problems. To do so, you solve a rootfinding problem where the jump occurs when a value hits zero, so we take the user’s ODE state and pack it into an augmented vector with pieces on the end for calculating the extra jump coordinates and then just throw the whole thing into the ODE solver. It’s defined by an integral so this is exactly like your AugmentedState case.

github.com

SciML/JumpProcesses.jl/blob/master/src/extended_jump_array.jl

"""
$(TYPEDEF)

Extended state definition used within integrators when there are
`VariableRateJump`s in a system. For detailed examples and usage information, see
the
- [Tutorial](https://docs.sciml.ai/JumpProcesses/stable/tutorials/discrete_stochastic_example/)

### Fields

$(FIELDS)

## Examples
```julia
using JumpProcesses, OrdinaryDiffEq
f(du,u,p,t) = du .= 0
rate(u,p,t) = (1+t)*u[1]*u[2]

# suppose we wish to decrease each of the two variables by one
# when a jump occurs

This file has been truncated. show original

This still needs to be updated for v0.7, but you can see how this just chains indices in order to broadcast it like a big concatenated vector.

github.com

SciML/JumpProcesses.jl/blob/master/src/extended_jump_array.jl#L42-L45


      
          ```julia
          # for 1 <= i <= N
          ueja[i] == ueja.u[i]

I always put a linear index on it too as a fallback, but just like the ArrayPartition it should be avoided. (Though one question we don’t have answered well with these is implementation hiding. Because we save the state during the solution, we end up saving the full jump variables as well and so the user doesn’t get back the same array type they put in. For your case that would be fine though since it sounds like g((x(t)) is something the user actually wants).

One extreme case of this is MultiScaleArrays.jl

These are augmented arrays to the extreme, where you put a tree structure on them. But via tree traversal you can still define a linear index. That linear index is implemented with recursive binary searches. The broadcast is a much more sensible recursive broadcast:

github.com

SciML/MultiScaleArrays.jl/blob/master/src/math.jl#L31-L42


      
              out
          end
          
          @inline function Base.copyto!(dest::AMSA, bc::Broadcast.Broadcasted{<:AMSAStyle})
              N = length(dest.nodes)
              for i in 1:N
                  copyto!(dest.nodes[i], unpack(bc, i))
              end
              copyto!(dest.values, unpack(bc, nothing))
              dest
          end

In fact, this structure builds the most complicated types we’ve successfully solved differential equations on, including an almost thousand line type definition (from this thread). Again, using tuples for units plus this broadcast overload allows it to perform just fine even though it’s probably “the most type-unstable linear index imaginable”. But, it the linear shape makes it still .= over to a vector for stiff solvers, makes it easily interact with tools for printing vectors, etc.

So I think the summary of what I am saying is, @mbauman is right in that if you can just throw a linear index through the whole thing then you should because it is a useful structure. Broadcast doesn’t need to make use of it, and many times it’s faster to avoid it and maybe the ordering can be almost nonsensical, but it’s an easy fix to a lot of things like Julia’s basic printing capabilities. In some sense, if you treat these things as all “vectors”, then broadcast between them all just means “element-wise” and using the broadcast mechanism you can define the element-wise implementation differently depending on what you need. I wouldn’t count this as abuse, I think this is a super helpful use of the broadcast system!

Topic		Replies	Views
Blog post: Loop fusion and vectorization in Julia 0.6 Internals & Design announcement , broadcast	28	8420	May 4, 2017
How to customize the new broadcasting infrastructure in v0.7 General Usage	12	2630	December 18, 2018
why doesn't `@.` always broadcast? New to Julia broadcast	12	324	July 21, 2025
Custom structs in JuliaDiffEq: In-place broadcasting results in allocation and slowdown General Usage performance	7	681	December 14, 2021
Migrating code from Matlab: Accessing arrays without allocating Performance	20	1947	November 19, 2018

Related topics