Strange behaviors in python and numpy

Numpy just assumes that you want to do broadcasting here, so it just starts adding stuff to your numpy array.

In [1]: import numpy as np

In [2]: a = np.array([1,2,3])

In [3]: sum(a,a)
Out[3]: array([7, 8, 9])

In [4]: ?sum
Signature: sum(iterable, /, start=0)
Docstring:
Return the sum of a 'start' value (default: 0) plus an iterable of numbers

When the iterable is empty, return the start value.
This function is intended specifically for use with numeric values and may
reject non-numeric types.
Type:      builtin_function_or_method

In [5]: a + 1 + 2 + 3
Out[5]: array([7, 8, 9])

In [6]: a + sum(a)
Out[6]: array([7, 8, 9])
5 Likes

can we start a sub-thread about Numpy?

Nobody ever said Numpy is too buggy

7 Likes

Are you saying numpy might have correctness issues?

1 Like

That’s not correctness issues. In a in b in Numpy means what any(a .== b) means in Julia. It’s a bad definition to be sure, but not really a correctness problem.

1 Like

except it also has this behavior

In [2]: np.array([1,2,3]) == np.array([1,2])
ValueError: operands could not be broadcast together with shapes (3,) (2,)

In [3]: np.array([1,2,3]) == np.array([2])
Out[3]: array([False,  True, False])

In [12]: np.array([1,2,3]) in np.array([2])
Out[12]: True
2 Likes

No idea whether this is correct or incorrect but it did catch me by surprise when I was porting some Python code to Julia:

>>> import numpy as np
>>> x = np.float32(0)
>>> (x ** 2).dtype
dtype('float64')

@jling That behaviour is completely consistent. Remember that in NumPy, a == b is the same as Julia’s a .== b, and a in b is the same as Julia’s any(a .== b).

So, we get the exact same behaviour in Julia:

julia> [1,2,3] .== [1, 2]
ERROR: DimensionMismatch: arrays could not be broadcast to a common size; got a dimension with lengths 3 and 2
[ ... ]

julia> [1,2,3] .== [2]
3-element BitVector:
 0
 1
 0

julia> any([1,2,3] .== [2])
true
1 Like

oh wow I actually hate this.

3 Likes

Like NumPy, we broadcast arrays with length 1 across a given dimension. This is useful, because we can do stuff like

julia> x = rand(4, 3);

julia> x_normed = x ./ sum(x; dims=2)
4×3 Matrix{Float64}:
 0.0099922  0.907046   0.0829616
 0.628005   0.0108374  0.361158
 0.568802   0.249942   0.181256
 0.361719   0.220452   0.417829

Which works, even though sum returns a 4 x 1 matrix.

1 Like

Isn’t this basically the whole point of broadcasting?

1 Like

I would have thought in general

x != [x] != Ref(x)

for broadcasting, but idk, maybe I just don’t have a basic understanding of the point of broadcasting

Yeah, this is what I call “implicit” broadcasting — dimensions that happen to have length one also “extrude out” (or repeat) to match other arguments exactly like missing dimensions do. I like this visualization.

1 Like

It would be a pity if we couldn’t do matrix .* vec, but had resort to padding vec with repeat, like Matlab in the bad old days (Matlab too broadcasts singleton dimensions these days, and I think numpy as well. In my mind, this is what ‘broadcasting’ means.)

1 Like

image

In my opinion, broadcasting should apply scalars to each element of a container but should not conflate singleton containers with scalars.

How would you do broadcast multiply between a matrix and a vector, then? Use repeat?

Anyway, what are the drawbacks of this conflation? I can see only advantages.

Mixing containers with their values complicates reasoning and hides errors. This could be an error, for example:

julia> [10] .* [2, 3]
2-element Vector{Int64}:
 20
 30

I think you mean

julia> m = collect(reshape(1:6, 3,2))
3×2 Matrix{Int64}:
 1  4
 2  5
 3  6

julia> v = [1, 10, 100]
3-element Vector{Int64}:
   1
  10
 100

julia> m .* v
3×2 Matrix{Int64}:
   1    4
  20   50
 300  600

We can retain that without inconsistency. I like to think of matching up each dim from each argument until one runs out; semantically,

julia> mapslices(col -> map(*, col, v), m; dims=1)
3×2 Matrix{Int64}:
   1    4
  20   50
 300  600

I think part of this confusion stems from the fact that actually scalars don’t fit the view of “extruding dimensions of size 1”. In that sense I have no problem accepting that [1,2,3] .+ [1] works and more the fact that [1,2,3] .+ 1 also works.
However noone wants to wrap scalars in brackets and I think this problem is actually at least part of the reason that numbers are iterable in Julia because that means 5 also behaves as a container 5[1] == 5 and then broadcasting stays logically consistent.

In my view it’s the other way around: Numbers, Refs and 0-dimensional arrays are the ones that are behaving completely unobjectionably. They all have ndims(x) == 0 and axes(x) = (). The “proper” way to index them is with x[]; x[1] is a convenience. They fit the view of “expanding missing dimensions”. Data that has missing dimensions (or dimensions that are statically-enforced to be 1, like a transposed vector’s first dimension) will never unexpectedly change how it broadcasts. You might get caught out, however, if your matrix suddenly only has one row on some nth iteration.

2 Likes

There are several ways to do broadcasting or extending operations across arrays:

  1. Like Numpy or Julia with the following rules:

    i. Dimensions are matched from the front (Julia column-major) or back (Python row-major) and missing dimensions are filled:

    rand(2, 3) .+  1      # zero-dimensional, i.e., always matches
    rand(2, 3) .+ [1, 2]  # one-dimensional and matching first dim
    

    ii. Dimensions with single elements are expanded. This is particularly handy when used together with aggregations which don’t remove dimensions, e.g.,

    X = rand(2, 3, 4)
    sum(X; dims = 2) |> size
    X ./ sum(X; dims = 2)
    

    For consistency, this also implies that [1, 2, 3] .+ [4] works the same way.

  2. Like APL or J with the following rules:

    i. Each operation has a rank which specifies at which dimension of values it acts on, e.g., +, * all have rank zero and act on elements whereas aggregations such as +/ (sum) has infinite rank and acts on whole arrays by inserting + between its sub-arrays sliced along the first dimension (giving a result of lower dimension):

        X =. i. 2 3 4  NB. index array of size 2 x 3 x 4
        $ X  NB. $ means size
    2 3 4
        $ +/ X  NB. sum collapses first dimension
    3 4
    

    ii. When an operation has a lower rank than the array its applied to, the array is sliced into sub-arrays – starting from the front and expanding missing dimensions – until the rank of the operation is reached:

        X + 1 2  # Works as + has rank zero and first dimension matches
        X * 3     # Scalar is zero-dimensional
    

    Thus, this looks like broadcasting in Julia, but works for a different reason and one-dimensional dimensions are not expanded, e.g., X + i. 2 1 is a length error.

    iii. The rank of an operation can be modified using " making it possible to aggregate across other than the first dimension and correct for collapsed dimensions:

    (] %"1 +/)"2 X  NB. same as X / sum(X; dims = 2)
    
  3. Always using explicit and possible multiple mappings when applying scalar operations across arrays. Haskell with nested lists and map . map comes to mind or something like
    mapslices(v -> let s = sum(v); map(x -> x / s, v) end, X; dims = 2).
    This can get quite verbose quickly and I will not expand on this here …

All models are perfectly consistent with the first one being less general than the second one regarding composed operations applying at different sub-dimensions (ranks). It is probably easier to understand though and a good compromise to combat the verbosity of the third.

2 Likes