Strange behaviors in python and numpy

Numpy just assumes that you want to do broadcasting here, so it starts adding stuff to your numpy array.

In [1]: import numpy as np

In [2]: a = np.array([1,2,3])

In [3]: sum(a,a)
Out[3]: array([7, 8, 9])

In [4]: ?sum
Signature: sum(iterable, /, start=0)
Docstring:
Return the sum of a 'start' value (default: 0) plus an iterable of numbers

When the iterable is empty, return the start value.
This function is intended specifically for use with numeric values and may
reject non-numeric types.
Type:      builtin_function_or_method

In [5]: a + 1 + 2 + 3
Out[5]: array([7, 8, 9])

In [6]: a + sum(a)
Out[6]: array([7, 8, 9])
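For what it's worth, the result is explained by Python's builtin sum rather than any NumPy guess: sum(iterable, start) folds + over the iterable starting from start, and each + then broadcasts. A quick check (nothing here beyond the stdlib and numpy):

```python
import numpy as np

a = np.array([1, 2, 3])

# builtin sum(iterable, start) computes start + a[0] + a[1] + a[2],
# i.e. a + 1 + 2 + 3, with each + broadcasting the scalar over the array
assert (sum(a, a) == a + 1 + 2 + 3).all()

# np.sum is the reduction most people mean here
assert np.sum(a) == 6
```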

can we start a sub-thread about Numpy?

Nobody ever said Numpy is too buggy

Are you saying numpy might have correctness issues?

That’s not a correctness issue. a in b in Numpy means what any(a .== b) means in Julia. It’s a bad definition to be sure, but not really a correctness problem.

except it also has this behavior

In [2]: np.array([1,2,3]) == np.array([1,2])
ValueError: operands could not be broadcast together with shapes (3,) (2,)

In [3]: np.array([1,2,3]) == np.array([2])
Out[3]: array([False,  True, False])

In [12]: np.array([1,2,3]) in np.array([2])
Out[12]: True
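As far as I understand, ndarray membership is essentially implemented as a broadcast comparison followed by any, which is exactly why the shapes that error under == can still succeed under in. A small check (the equivalence is my understanding, not an official spec):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2])

# `a in b` calls b.__contains__(a), which behaves like (b == a).any();
# the comparison broadcasts (1,) against (3,) to [False, True, False]
assert (a in b) == bool((b == a).any())
assert a in b
```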

No idea whether this is correct or incorrect but it did catch me by surprise when I was porting some Python code to Julia:

>>> import numpy as np
>>> x = np.float32(0)
>>> (x ** 2).dtype
dtype('float64')
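Note this one is version-dependent, as far as I know: under NumPy's legacy value-based promotion, np.float32(0) ** 2 came out float64 as shown above; since NEP 50 (NumPy 2.0) the Python int 2 is treated as "weak" and the result stays float32. A sketch that tolerates either behaviour:

```python
import numpy as np

# Result dtype depends on the NumPy version: float64 under legacy
# value-based promotion, float32 under NEP 50 (NumPy >= 2.0).
# (The version dependence is my understanding, not from the thread.)
x = np.float32(0)
result_dtype = (x ** 2).dtype
assert str(result_dtype) in ("float32", "float64")
```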

@jling That behaviour is completely consistent. Remember that in NumPy, a == b is the same as Julia’s a .== b, and a in b is the same as Julia’s any(a .== b).

So, we get the exact same behaviour in Julia:

julia> [1,2,3] .== [1, 2]
ERROR: DimensionMismatch: arrays could not be broadcast to a common size; got a dimension with lengths 3 and 2
[ ... ]

julia> [1,2,3] .== [2]
3-element BitVector:
 0
 1
 0

julia> any([1,2,3] .== [2])
true

oh wow I actually hate this.

Like NumPy, we broadcast arrays with length 1 across a given dimension. This is useful, because we can do stuff like

julia> x = rand(4, 3);

julia> x_normed = x ./ sum(x; dims=2)
4×3 Matrix{Float64}:
 0.0099922  0.907046   0.0829616
 0.628005   0.0108374  0.361158
 0.568802   0.249942   0.181256
 0.361719   0.220452   0.417829

Which works, even though sum returns a 4 x 1 matrix.
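The NumPy counterpart works the same way: keepdims=True keeps the reduced axis at length 1, so the division broadcasts it back out across the rows.

```python
import numpy as np

x = np.random.rand(4, 3)

# keepdims=True makes the sum a (4, 1) array, which broadcasts
# against the (4, 3) array in the division
x_normed = x / x.sum(axis=1, keepdims=True)

assert x_normed.shape == (4, 3)
assert np.allclose(x_normed.sum(axis=1), 1.0)
```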

Isn’t this basically the whole point of broadcasting?

I would have thought in general

x != [x] != Ref(x)

for broadcasting, but idk, maybe I just don’t have a basic understanding of the point of broadcasting

Yeah, this is what I call “implicit” broadcasting — dimensions that happen to have length one also “extrude out” (or repeat) to match other arguments exactly like missing dimensions do. I like this visualization.

It would be a pity if we couldn’t do matrix .* vec, but had to resort to padding vec with repeat, like Matlab in the bad old days. (Matlab too broadcasts singleton dimensions these days, and I think numpy does as well. In my mind, this is what ‘broadcasting’ means.)

In my opinion, broadcasting should apply scalars to each element of a container but should not conflate singleton containers with scalars.

How would you do broadcast multiply between a matrix and a vector, then? Use repeat?

Anyway, what are the drawbacks of this conflation? I can see only advantages.

Mixing containers with their values complicates reasoning and hides errors. This could be an error, for example:

julia> [10] .* [2, 3]
2-element Vector{Int64}:
 20
 30
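NumPy makes the same call, for what it's worth; under the stricter rule being argued for, this would instead be a shape error:

```python
import numpy as np

# The length-1 array is stretched to match the length-2 array,
# exactly as in the Julia example above
result = np.array([10]) * np.array([2, 3])
assert result.tolist() == [20, 30]
```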

I think you mean

julia> m = collect(reshape(1:6, 3,2))
3×2 Matrix{Int64}:
 1  4
 2  5
 3  6

julia> v = [1, 10, 100]
3-element Vector{Int64}:
   1
  10
 100

julia> m .* v
3×2 Matrix{Int64}:
   1    4
  20   50
 300  600

We can retain that without inconsistency. I like to think of matching up each dim from each argument until one runs out; semantically,

julia> mapslices(col -> map(*, col, v), m; dims=1)
3×2 Matrix{Int64}:
   1    4
  20   50
 300  600
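The "match up each dim until one runs out" formulation also explains why the NumPy spelling differs: NumPy aligns dimensions from the back, so the vector needs an explicit trailing axis to pair with the rows. A sketch of the same computation:

```python
import numpy as np

m = np.arange(1, 7).reshape(2, 3).T     # 3x2 matrix, columns [1,2,3] and [4,5,6]
v = np.array([1, 10, 100])

# NumPy aligns trailing dimensions, so v needs shape (3, 1)
# to multiply down the columns of the 3x2 matrix
result = m * v[:, None]

assert result.tolist() == [[1, 4], [20, 50], [300, 600]]
```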

I think part of this confusion stems from the fact that scalars don’t actually fit the view of “extruding dimensions of size 1”. In that sense I have no problem accepting that [1,2,3] .+ [1] works; it’s more the fact that [1,2,3] .+ 1 also works.
However, no one wants to wrap scalars in brackets, and I think this problem is at least part of the reason that numbers are iterable in Julia: it means 5 also behaves as a container, with 5[1] == 5, and then broadcasting stays logically consistent.

In my view it’s the other way around: Numbers, Refs and 0-dimensional arrays are the ones that are behaving completely unobjectionably. They all have ndims(x) == 0 and axes(x) == (). The “proper” way to index them is with x[]; x[1] is a convenience. They fit the view of “expanding missing dimensions”. Data that has missing dimensions (or dimensions that are statically enforced to be 1, like a transposed vector’s first dimension) will never unexpectedly change how it broadcasts. You might get caught out, however, if your matrix suddenly only has one row on some nth iteration.
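The same distinction exists on the NumPy side, which may make the point concrete: a 0-d array has zero dimensions and an empty shape, is indexed with an empty tuple, and broadcasts against anything, exactly like a missing dimension.

```python
import numpy as np

# A 0-d array: zero dimensions, empty shape
x = np.array(5)
assert x.ndim == 0
assert x.shape == ()

# Indexing with the empty tuple extracts the value
assert x[()] == 5

# It broadcasts against any shape, like a missing dimension would
assert (x + np.array([1, 2])).tolist() == [6, 7]
```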

There are several ways to do broadcasting or extending operations across arrays:

  1. Like Numpy or Julia with the following rules:

    i. Dimensions are matched from the front (Julia column-major) or back (Python row-major) and missing dimensions are filled:

    rand(2, 3) .+  1      # zero-dimensional, i.e., always matches
    rand(2, 3) .+ [1, 2]  # one-dimensional and matching first dim
    

    ii. Dimensions with single elements are expanded. This is particularly handy when used together with aggregations which don’t remove dimensions, e.g.,

    X = rand(2, 3, 4)
    sum(X; dims = 2) |> size
    X ./ sum(X; dims = 2)
    

    For consistency, this also implies that [1, 2, 3] .+ [4] works the same way.

  2. Like APL or J with the following rules:

    i. Each operation has a rank, which specifies the dimension of the values it acts on; e.g., + and * have rank zero and act on elements, whereas an aggregation such as +/ (sum) has infinite rank and acts on whole arrays by inserting + between its sub-arrays sliced along the first dimension (giving a result of lower dimension):

        X =. i. 2 3 4  NB. index array of size 2 x 3 x 4
        $ X  NB. $ means size
    2 3 4
        $ +/ X  NB. sum collapses first dimension
    3 4
    

    ii. When an operation has a lower rank than the array it’s applied to, the array is sliced into sub-arrays – starting from the front and expanding missing dimensions – until the rank of the operation is reached:

        X + 1 2  NB. Works as + has rank zero and first dimension matches
        X * 3    NB. Scalar is zero-dimensional
    

    Thus, this looks like broadcasting in Julia, but works for a different reason, and length-one dimensions are not expanded, e.g., X + i. 2 1 is a length error.

    iii. The rank of an operation can be modified using " making it possible to aggregate across other than the first dimension and correct for collapsed dimensions:

    (] %"1 +/)"2 X  NB. same as X ./ sum(X; dims = 2)
    
  3. Always using explicit and possibly multiple mappings when applying scalar operations across arrays. Haskell with nested lists and map . map comes to mind or something like
    mapslices(v -> let s = sum(v); map(x -> x / s, v) end, X; dims = 2).
    This can get quite verbose quickly and I will not expand on this here …
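A small Python sketch of model 3, on a 2-d array for brevity: everything is spelled out with explicit iteration, and nothing broadcasts implicitly.

```python
import numpy as np

X = np.random.rand(2, 3)

# Explicit mapping: normalize each row by its own sum, no broadcasting.
# Verbose, but every application of / is written out.
normed = np.array([[value / sum(row) for value in row] for row in X])

# Agrees with the broadcasting version
assert np.allclose(normed, X / X.sum(axis=1, keepdims=True))
```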

All three models are internally consistent, with the first being less general than the second when composed operations apply at different sub-dimensions (ranks). The first is probably the easiest to understand, though, and a good compromise against the verbosity of the third.