Crazy allocations using CartesianIndices

I have been switching a 2D flux computation method over to use CartesianIndices so it will work for arrays of any dimension, à la https://julialang.org/blog/2016/02/iteration/

Unfortunately, I’ve been running into a bunch of memory-allocation “gotchas” and I don’t understand what I need to do to avoid them. Here’s an example:

function test(a)
    σ = 0.
    R = CartesianIndex(size(a)[1:end-1]); o = oneunit(R)
    @time for I ∈ 2o:R
        σ += a[I,1]-a[I-o,2]^2 # actually, a more complex function...
    end
    return σ
end

But when I run this, the code is slow and allocates heavily:

julia> test(rand(256,256,2));
  0.046946 seconds (585.23 k allocations: 13.891 MiB)

Please, could someone let me know:

  1. What is going on here?
  2. How am I supposed to do this elegantly?

If you run @code_warntype test(rand(256, 256, 2)) you’ll see a lot of type-instabilities:

julia> @code_warntype test(rand(256, 256, 2))
Variables
  #self#::Core.Compiler.Const(test, false)
  a::Array{Float64,3}
  stats::Base.GC_Num
  elapsedtime::UInt64
  val::Nothing
  diff::Base.GC_Diff
  σ::Any
  R::CartesianIndex{_A} where _A
  o::CartesianIndex{_A} where _A
  @_10::Any
  I::Any

Body::Any

Those ultimately stem from size(a)[1:end-1], which indexes the size tuple with a range. The type of a CartesianIndex encodes the number of dimensions, but the length of that tuple slice is not known to the compiler (it depends on the run-time value of end-1), so the types of R and o cannot be inferred.

Fixing that removes all the allocations:

julia> function test(a)
           σ = 0.
           R = CartesianIndex((size(a, 1), size(a, 2))); o = oneunit(R)
           @time for I ∈ 2o:R
               σ += a[I,1]-a[I-o,2]^2 # actually, a more complex function...
           end
           return σ
       end
julia> test(rand(256,256,2));
  0.000243 seconds

The only issue here is that I’ve implicitly assumed that a has 3 dimensions, which your original code did not. We can make a sufficiently general version with a helper function:

function all_but_last(t::NTuple{N, T}) where {N, T}
    ntuple(i -> t[i], N - 1)
end

and use it like so:

julia> function test(a)
           σ = 0.
           R = CartesianIndex(all_but_last(size(a))); o = oneunit(R)
           @time for I ∈ 2o:R
               σ += a[I,1]-a[I-o,2]^2 # actually, a more complex function...
           end
           return σ
       end
test (generic function with 1 method)

julia> test(rand(256,256,2));
  0.000218 seconds
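
Incidentally (an aside, not from the original post): the unexported Base.front does exactly what this helper does and is likewise fully inferable, so it can stand in for all_but_last:

```julia
# Base.front drops the last element of a tuple; like the ntuple-based helper,
# the result's length is known to the compiler from the input type alone.
all_but_last(t::NTuple{N, T}) where {N, T} = ntuple(i -> t[i], N - 1)

t = (256, 256, 2)
@assert all_but_last(t) == Base.front(t) == (256, 256)
```

Being unexported, Base.front is technically internal, though it has been stable for a long time.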

The moral is that the length of a tuple is part of its type, so be wary of operations that might accidentally throw that information away, such as indexing a tuple with a range whose length the compiler cannot determine.
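
A two-line way to see this (illustrative, not from the thread):

```julia
# A tuple's length is encoded in its type; a Vector's is not.
@assert typeof((1, 2, 3)) === Tuple{Int, Int, Int}
@assert typeof(collect((1, 2, 3))) === Vector{Int}  # length is now run-time only
```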


That’s a really helpful answer!

As you say, I need the code to work regardless of dimension, and your helper function works great. However, I’m a little confused, since this same slicing was used in the blog post:

function expfiltdim(x, dim::Integer, α)
    s = similar(x)
    Rpre = CartesianIndices(size(x)[1:dim-1])
    Rpost = CartesianIndices(size(x)[dim+1:end])
    _expfilt!(s, x, α, Rpre, size(x, dim), Rpost)
end

Would helpers like this avoid the issues even though dim is a variable?

function before_dim(dim, t::NTuple{N, T}) where {N, T}
    ntuple(i -> t[i], dim - 1)
end
function after_dim(dim, t::NTuple{N, T}) where {N, T}
    ntuple(i -> t[dim+i], N - dim)
end

The example there has the same type-instability, but it doesn’t cause a problem because _expfilt! is creating a “function barrier”. Even though the type of Rpre in expfiltdim cannot be inferred, the _expfilt! function still receives a value with a concrete type, so within the body of _expfilt! there is no type-instability. Check out the section of the manual I linked for more details. You could get the same benefit by moving your loop into its own dedicated function, although I personally think fixing the instability altogether is good practice when possible.
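
A minimal sketch of that pattern (with made-up names `outer` and `kernel`, not from the thread): even though the type of the index range can’t be inferred in `outer`, the dynamic dispatch at the call to `kernel` recovers a concrete type, so the loop body compiles cleanly:

```julia
# Function barrier: `outer` is type-unstable, but `kernel` receives a value
# with a concrete type, so its loop is inferred and allocation-free.
function kernel(R)               # specializes on the concrete type of R
    n = 0
    for I in R
        n += 1                   # stand-in for real per-index work
    end
    return n
end

function outer(x, dim)
    Rpre = CartesianIndices(size(x)[1:dim-1])  # length not inferable here...
    return kernel(Rpre)                        # ...but concrete inside kernel
end

@assert outer(rand(4, 5, 6), 3) == 4 * 5  # iterates the 4×5 leading block
```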

It’s hard to say; it depends on how clever the compiler is with constant propagation. If you’re dealing with cases like this, then the function-barrier approach seems like a good idea.


By the way, I don’t see any instabilities with Julia 1.5.0-beta1.0. Which version did you test your code on?

julia> @code_warntype test(rand(256, 256, 2))
Variables
  #self#::Core.Compiler.Const(test, false)
  a::Array{Float64,3}
  stats::Base.GC_Num
  elapsedtime::UInt64
  val::Nothing
  diff::Base.GC_Diff
  σ::Float64
  R::CartesianIndex{2}
  o::CartesianIndex{2}
  @_10::Union{Nothing, Tuple{CartesianIndex{2},CartesianIndex{2}}}
  I::CartesianIndex{2}

Body::Float64
. . .

Inference was improved in 1.5; all prior versions fail to infer it. It’s worth noting that there are still some tuple-indexing patterns that aren’t inferred correctly, but I think it’s fair to say that all the most common ones are.


That sounds like it could save me a bunch of memory headaches. Is that release stable?

Since I’m trying to make this a package other people can use with other versions, I am also a bit nervous solving the problem this way.

If you want to support multiple Julia versions, you’ll have to handle it the “hard way.” Aside from ntuple, there are Base.tail, Base.front, and inlined splatting. I can’t remember the blog post on “lispy tuple programming”; does anyone have suggestions?
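
For reference, a tiny example of that “lispy” style (illustrative, not from any particular post): recurse with Base.tail so every call sees a tuple whose length is part of its type, and the compiler can unroll the whole thing:

```julia
# Sum a tuple by peeling off the head; no run-time length queries needed.
tsum(::Tuple{}) = 0
tsum(t::Tuple) = first(t) + tsum(Base.tail(t))

@assert tsum((1, 2, 3, 4)) == 10
```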


IMHO, Julia is improving fast, and for a brand-new package like yours I do not see why you should pay the price of compatibility with old versions. Of course, it is totally up to you.

I have spent some time reading your code, which is extremely elegant and concise. Coming from C++, I am still amazed that such a dimension-agnostic coding style can be obtained at zero cost. If that is the case with Julia 1.5, it should be kept like this.

I also wonder about the future adaptation to GPUs. It could be interesting to see whether constructs like @vchuravy’s https://github.com/JuliaGPU/KernelAbstractions.jl could GPUify your code efficiently (although I think that package is not yet compatible with Julia 1.5 or 1.4).


The tricks at the end of that tutorial cover a lot of cases.

Doing the exercises is recommended, and automatically confers a green belt on the user.


:grinning: I used to lead the MIT TKD club, so this is full circle.


OK, I’m going to update to 1.5 (already downloaded), since I don’t enjoy tracking down all these memory issues.

However, I have one more chestnut to offer up to the hive mind: I still can’t get the function-barrier approach to work for applying boundary conditions.

function _apply!(a, dim::Integer, left, right, A)
    for i ∈ left, j ∈ right
        a[i, size(a, dim), j, dim] = A
    end
end
function test(a::Array{T,N}, A) where {T,N}
    for dim ∈ 1:N-1
        left = CartesianIndices(size(a)[1:dim-1])
        right = CartesianIndices(size(a)[dim+1:N-1])
        @time _apply!(a, dim, left, right, A)
    end
    return a
end
julia> test(ones(20,20,20,3),0.);
  0.000299 seconds (2.80 k allocations: 93.766 KiB)
  0.000003 seconds (1 allocation: 16 bytes)
  0.000002 seconds (1 allocation: 16 bytes)

If I try the Val{dim} trick from that ipynb tutorial, I get

function _apply!(a, ::Val{dim}, left, right, A) where dim
    for i ∈ left, j ∈ right
        a[i, size(a, dim), j, dim] = A
    end
end
_apply!(a, dim::Integer, left, right, A) = _apply!(a, Val{dim}(), left, right, A)

julia> test(ones(20,20,20,3),0.);
  0.000026 seconds (4 allocations: 112 bytes)
  0.000048 seconds (5 allocations: 112 bytes)
  0.000092 seconds (4 allocations: 112 bytes)

The @code_typed is also a mess for a fairly simple idea: a[:, size(a,2), :, 2] .= A, etc. Maybe it would be better to do this with code generation?
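
For what it’s worth (a sketch, not from the thread): if the goal really is a[:, end, :, 2] .= A and so on, selectdim can express that without manual index bookkeeping, since it returns a view that can be assigned through:

```julia
a = ones(20, 20, 20, 3)
A = 0.0

# Fix the component dimension (4) at index 2, then the target dimension (2) at
# its last index; broadcasting into the resulting view mutates `a` in place.
v = selectdim(selectdim(a, 4, 2), 2, size(a, 2))
v .= A

@assert all(a[:, end, :, 2] .== 0.0)       # boundary slice was set
@assert all(a[:, 1:end-1, :, 2] .== 1.0)   # everything else untouched
```

This doesn’t solve the dim-as-a-variable inference question, but it keeps the intent readable.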

Generic splits are not handled well yet. There is Base.IteratorsMD.split, in case it’s useful:

julia> I = CartesianIndices((5,10,15,20)).indices
(Base.OneTo(5), Base.OneTo(10), Base.OneTo(15), Base.OneTo(20))

julia> pre, post = Base.IteratorsMD.split(I, Val(2));

julia> pre
(Base.OneTo(5), Base.OneTo(10))

julia> post
(Base.OneTo(15), Base.OneTo(20))
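
To connect this back to the boundary-condition code (a sketch under the assumption that split behaves as shown above; Base.IteratorsMD is internal, so it may change between versions): passing the dimension as a Val makes the split lengths compile-time constants:

```julia
using Base.IteratorsMD: split  # internal API; subject to change

# Split the axes of `a` around dimension `dim`: everything before it and
# everything after it, as CartesianIndices ready for the boundary loops.
function around(a::AbstractArray, ::Val{dim}) where {dim}
    pre, rest = split(axes(a), Val(dim - 1))
    return CartesianIndices(pre), CartesianIndices(Base.tail(rest))
end

left, right = around(ones(20, 20, 20, 3), Val(2))
@assert size(left) == (20,) && size(right) == (20, 3)
```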

I couldn’t get this to work satisfactorily for the BCs, but the rest of the solver is nicely dimension-independent. Here’s a slice through a 3D Taylor–Green vortex.
[image: TGVortex]
