Cumbersome array reshaping for broadcasting, unlike numpy

I’m still new to Julia, but one of my favorite aspects is the dotted broadcasting syntax with my own fast custom functions. However, in many cases I still find this functionality cumbersome to use compared to my usual numpy syntax, at least with my current knowledge of Julia.

Often arrays don’t have matching shapes for broadcasting because their dimensions don’t correspond. To make them fit, you can use None or np.newaxis:

a = np.random.rand(3, 5)
b = np.random.rand(4, 6)
c = a[:, None, :, None] + b[None, :, None, :]
c.shape == (3, 4, 5, 6)

In my opinion this is easy to write and read. I think in Julia I’d have to use reshape, but for that I need to know all the dimension sizes and can’t just use colons to mean the whole dimension:

sa = size(a)
sb = size(b)
c = reshape(a, (sa[1], 1, sa[2], 1)) .+ reshape(b, (1, sb[1], 1, sb[2]))

Is there a better way to do this that I’m not aware of?



I must confess that I’ve never come across this use case. What sort of things do you use it for?

So you want to insert singleton dimensions? I thought maybe I could get permutedims to help me, but that didn’t work. The obvious answer is reshape, as you already know. One simplification would be this:

c = reshape(a, (size(a,1), 1, :, 1)) .+ reshape(b, (1, size(b,1), 1, :))

Not a huge improvement, though.

1 Like

I have to admit I’m struggling to understand what the numpy code actually means. You have a matrix of size 3x5 and want to add it to a matrix size 4x6, so this isn’t standard matrix addition. What is this operation trying to do intuitively?

If you want to sum each element in a to each element in b something like

reshape([i + j for i ∈ a for j ∈ b], 3,4,5,6)

or to keep the dimensions more general

reshape([i + j for i ∈ a for j ∈ b], size(a)..., size(b)...)

could work, but I’m not sure whether that’s what you’re trying to do?

That seems to be a nice shorthand notation in numpy.
But the (4,6) array could be reshaped in various ways (into 4 dimensions). The shorthand notation will only work for a few of these cases (6?), right?

Rather than seeing it as a type of reshaping, think of it as a way of inserting singleton dimensions.

In Julia you can create such utility function yourself:

julia> replacecolons(dims::Tuple{}, s) = ()
replacecolons (generic function with 1 method)

julia> function replacecolons(dims::Tuple, s)
           if dims[1] isa Colon
               (s[1], replacecolons(Base.tail(dims), Base.tail(s))...)
           else
               (1, replacecolons(Base.tail(dims), s)...)
           end
       end
replacecolons (generic function with 2 methods)

julia> addsingleton(v, args...) = reshape(v, replacecolons(args, size(v)))
addsingleton (generic function with 1 method)

julia> addsingleton(rand(2,3), *, :, :, *)
1×2×3×1 Array{Float64,4}:
[:, :, 1, 1] =
 0.959039  0.999838

[:, :, 2, 1] =
 0.225681  0.647496

[:, :, 3, 1] =
 0.526588  0.65457

I wonder if there is a package that does this already (maybe something like TensorOperations.jl or a similar package?).

1 Like

Yes, the whole thing is meant to insert the necessary singleton dimensions so that broadcasting works. A use case could be the following:

# I have ten displacement vectors of 3 elements each:
displacements = np.random.rand(10, 3)
# I have fifteen locations, also 3 element vectors
locations = np.random.rand(15, 3)
# now, the displacements are actually supposed to be applied with four different scaling factors
scalars = np.array([0.2, 0.5, 1, 1.5])
# so what is the array holding a vector for each location displaced by each scaled displacement?
# this doesn't work because the dimensions (10, 3), (15, 3) and (4,) don't match easily:
resulting_locations_incorrect = locations + scalars * displacements
# so instead I introduce singleton dimensions for each array
resulting_locations_correct = (
    locations[None, None, :, :] + scalars[:, None, None, None] * displacements[None, :, None, :])
# the resulting shape is (4, 10, 15, 3)
# 4 scalings, 10 displacements, 15 locations, 3 vector elements
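For reference, the same computation can be written in Julia today by inserting the singleton dimensions with reshape (a sketch mirroring the numpy code above; reshape only adds size-1 dimensions here, so the element mapping is preserved):

```julia
displacements = rand(10, 3)   # ten displacement vectors of 3 elements
locations     = rand(15, 3)   # fifteen locations
scalars       = [0.2, 0.5, 1.0, 1.5]

# insert singleton dimensions so broadcasting lines the axes up as
# (scalars, displacements, locations, vector elements):
result = reshape(locations, 1, 1, 15, 3) .+
         reshape(scalars, 4, 1, 1, 1) .* reshape(displacements, 1, 10, 1, 3)

size(result)  # (4, 10, 15, 3)
```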

I like the general idea, but addsingleton is pretty long and would obfuscate longer expressions. Actually, my preferred way would be to “hack” indexing itself, to allow
arr[:new, :, :, :new]
for two singleton dimensions. How would I go about that, or would this interfere with normal array functioning?

1 Like

I use the following definition

na = [CartesianIndex()]  # for "newaxis"
arr = rand(10,10)
arr[:,na,:,na] # is 10×1×10×1

However, I find I need this in Julia much less than in numpy, thanks to array comprehension.
Still, I think this is worth having in Base, since using [CartesianIndex()] is not very intuitive.


That looks good! What do you mean by array comprehension though, maybe I can learn about a better way of doing things?

Array comprehension is the [... for ...] syntax.
For example, numpy’s x[:,None] + y[None,:] can be [x+y for x in x, y in y].
Python also has comprehensions, but they only produce python-native (1d) lists, not numpy arrays.
Come to think of it, for plotting I usually use transposition+broadcasting, e.g. heatmap(x,y,f.(x',y)), which is kind of a 2d-specific hack but short to type, and definitely in the spirit of np.newaxis rather than comprehensions. With the na definition it can be heatmap(x,y,f.(x[na,:],y)), which is a bit more explicit about the intent.
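To make that concrete, a comma-separated comprehension keeps one output dimension per iterated collection, and gives exactly the same result as the transposition-plus-broadcasting version (a minimal sketch):

```julia
x = rand(3)
y = rand(4)

# one output dimension per collection: the result is a 3×4 Matrix
M = [xi + yi for xi in x, yi in y]

size(M)       # (3, 4)
M == x .+ y'  # true: same additions as broadcasting with a transpose
```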

I would do something like this:

displacements = [rand(3) for i=1:10]
locations = [rand(3) for i=1:15]
scalars = [0.2, 0.5, 1, 1.5]

resulting_locations = [l + s*d for l in locations, d in displacements, s in scalars]

Since you are using 3D points you could use a struct or a static array for better performance. I find this conceptually more correct too, if you have a collection of points, use a collection (vector in this case) of points.

1 Like

That is great, I honestly didn’t get that this kind of syntax leaves the dimensions intact (and I read the documentation a lot). I actually tried implementing a StaticArrays version for vectors yesterday but couldn’t make it work with broadcasting at all, because a single static vector wants to be iterated over its three elements even when I think of it as one element. But with the array comprehension it could work; I’ll try it out.

I may have missed something about the problem (BTW, it would be nice to post a Julia MWE if you are asking questions about Julia, not Python), but if the only difference between the two arrays is singleton dimensions, I would drop singleton dimensions instead. See dropdims.

The issue rarely comes up in idiomatic Julia since the number of dimensions is part of the type, and manipulating it at runtime leads to type instability (effectively, programming Julia like Python brings its performance closer to Python’s. Cf. poetic justice).

If you post an MWE, perhaps an idiomatic Julia solution can be constructed.

It seems a bit semantically questionable to express adding singleton dimensions by indexing with nothing (the Julia equivalent of None) or any other special value. However, this seems to be an inverse operation to dropdims so I could imagine an adddims function that does this.
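A minimal sketch of such a hypothetical adddims (the name and keyword are assumptions, not an existing API) could be built on reshape:

```julia
# hypothetical inverse of dropdims: insert size-1 dimensions at the
# positions given by `dims` (not in Base; illustrative only)
function adddims(A::AbstractArray; dims)
    dims = Tuple(dims)
    n = ndims(A) + length(dims)
    sz = ones(Int, n)                 # start with all-singleton sizes
    sz[setdiff(1:n, dims)] .= size(A) # fill the non-singleton positions
    reshape(A, sz...)
end

adddims(rand(3, 5); dims=(2, 4))  # size (3, 1, 5, 1)
```

Note that this simple version computes the output dimensionality from runtime values, so unlike the [CartesianIndex()] trick it is not type-stable.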


If you want to have a vector treated as a scalar within a broadcasted expression, you can wrap it with Ref(x). Then the broadcasting will treat it as a single object and not try to iterate through it.
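A small illustrative sketch of that (using dot from the LinearAlgebra stdlib; the variable names are made up):

```julia
using LinearAlgebra

v  = [1.0, 2.0, 3.0]
ws = [rand(3) for _ in 1:4]

# Ref(v) makes broadcasting treat v as a single object, so dot is
# applied between v and each element of ws, giving 4 dot products:
dots = dot.(Ref(v), ws)

length(dots)  # 4
```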

1 Like

I think you would normally know the number of added dimensions in compile time, so there would be no instability. I have used None indexing a lot in numpy, but I don’t think I ever had the location of Nones computed rather than hard-coded.
In fact, AFAIU, dropdims is type-unstable, and adddims would be too, whereas indexing with [CartesianIndex()] is stable.
EDIT: dropdims is actually type-stable. I was thinking about using it with an array for the dims argument, but I see that it’s actually not supported. dims needs to be a tuple or scalar.

Why is dropdims type-unstable? If you have an Array{T,N}, and call dropdims with an n-tuple of dimensions, then the output is Array{T, N-n}.
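Indeed, with a tuple of dimensions the output type is inferable, since the tuple’s length is part of its type; a quick sketch checking this with @inferred from the Test stdlib:

```julia
using Test  # for @inferred

A = rand(3, 1, 5, 1)

# dims is an NTuple{2,Int}, so inference knows the result is Array{Float64,2}
B = @inferred dropdims(A; dims=(2, 4))

size(B)  # (3, 5)
```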

1 Like

dropdims would only be unstable if we supported calling it without a dims tuple — then it’d automatically find the singleton dimensions and remove them. That’s how other languages work, but we don’t support it by default because it’s a trap.

On the original topic, @yha wins the prize — the way to do this is with [CartesianIndex()]. There are really just two key rules of Julia’s indexing semantics that make this work:

  • The returned shape is simply the result of concatenating all the shapes of all the indices together. In this manner, you can add dimensions to the output.
  • CartesianIndex can be used to allow your indices to span 0 or multiple dimensions in the source array. In this manner, you can remove dimensions from the output.

This definition — const newaxis = [CartesianIndex()] — just “falls out” from the combination of these two rules. Note that you can even easily repeat this new dimension with multiple elements of CartesianIndex(). An array of CartesianIndex() is simply one that doesn’t “eat up” any indices in the source array but still adds a dimension of the appropriate size to the output.

julia> A = rand(3,2)
3×2 Array{Float64,2}:
 0.357162  0.0357645
 0.983917  0.567235
 0.977043  0.646179

julia> A[:, [CartesianIndex()], :]
3×1×2 Array{Float64,3}:
[:, :, 1] =
 0.357162
 0.983917
 0.977043

[:, :, 2] =
 0.0357645
 0.567235
 0.646179

julia> A[:, [CartesianIndex(), CartesianIndex()], :]
3×2×2 Array{Float64,3}:
[:, :, 1] =
 0.357162  0.357162
 0.983917  0.983917
 0.977043  0.977043

[:, :, 2] =
 0.0357645  0.0357645
 0.567235   0.567235
 0.646179   0.646179

You can also do the inverse operation — and do “pointwise” indexing between the two columns by simply broadcasting CartesianIndex. The index provided in this case is just a vector (so it only returns a vector), but it spans both dimensions in the source array (since it has CartesianIndex elements):

julia> A = rand(3,3)
3×3 Array{Float64,2}:
 0.811331  0.0344879  0.545456
 0.13033   0.505987   0.367931
 0.973931  0.634874   0.548208

julia> A[CartesianIndex.(1:3, 1:3)]
3-element Array{Float64,1}:
 0.811331
 0.505987
 0.548208

Funny you should ask! I made the notation _ for exactly this adddims purpose:

julia> using TensorCast

julia> a = rand(3,5); b=rand(4,6);

julia> @cast aa[i,_,j,_] := a[i,j]; size(aa)
(3, 1, 5, 1)

But you can also directly broadcast like this:

julia> @cast c[i,x,j,y] := a[i,j] + b[x,y]; size(c)
(3, 4, 5, 6)

Whenever I write something like [p+q for p in a, q in b] which makes a 4-index array, I find myself adding comments to remember which indices are which… writing index notation can make such comments into code instead. (This one you could also do with @einsum.) The example of this post might read

@cast resloc[α, di, loc, μ] := locations[loc,μ] + scalars[α] * displacements[di,μ]

Internally this is doing exactly the same reshaping as originally suggested:

julia> @pretty @cast aa[i,_,j,_] := a[i,j]
    local (sz_i, sz_j) = (size(a, 1), size(a, 2))
    aa = reshape(a, (sz_i, 1, sz_j, 1))

julia> @pretty  @cast c[i,x,j,y] := a[i,j] + b[x,y] # broadcasting
    local donkey = orient(a, (:, *, :, *))
    local locust = orient(b, (*, :, *, :))
    c = (+).(donkey, locust)

julia> @pretty @einsum c[i,x,j,y] := a[i,j] + b[x,y] # four loops

The function orient (defined here) also just does reshaping, but takes a code like that of JuliennedArrays.

The newaxis trick is very clever. Notice that it does copy the array, whereas reshape does not:

julia> const newaxis = [CartesianIndex()];

julia> arr = rand(1000,1000);

julia> @btime TensorCast.orient($arr, (:,*,:,*));
  35.621 ns (2 allocations: 128 bytes)

julia> @btime $arr[:, newaxis, :, newaxis];
  1.385 ms (3 allocations: 7.63 MiB)
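If the copy is a concern, the same newaxis indices can, as far as I can tell, also be used with a view, which avoids materializing the larger array (a sketch; I’m assuming SubArray accepts arrays of CartesianIndex, which it appears to):

```julia
const newaxis = [CartesianIndex()]

arr = rand(10, 10)

# a lazy, non-copying version of arr[:, newaxis, :, newaxis]
v = @view arr[:, newaxis, :, newaxis]

size(v)  # (10, 1, 10, 1)
```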