Tuple as argument for size(AbstractArray)

@DFN, I understand your concern about the number of questions (i.e. the tuple) corresponding to the number of answers (the result tuple). Yet my original concern was having a unified way of preserving the meaning of dimensions in your code. So, if you like, size_d(x,(2,3)) can do the job, but it needs to return a size() tuple that corresponds to the full number of dimensions that the array has.

But then shouldn’t the other dimensions have a different dummy value? 1 is an ordinary size and is ambiguous.

All dimensions which are in the original array and are not addressed in the tuple need to return 1. See my original implementation at the top. Only then can you directly continue to calculate with this “extracted” size.
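
For concreteness, a minimal sketch of the intended behaviour (using the size_d helper whose implementation appears further down in this thread):

julia> x = zeros(2, 3, 4);

julia> size_d(x, (2, 3))   # dimension 1 was not requested, so it is reported as 1
(1, 3, 4)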

Yes, I saw that. But I’m questioning whether 1 is a suitable dummy value. Maybe 0, -1 or nothing?


I am wondering about this too.

@RainerHeintzmann: it would be great to see some context, i.e. what you are doing with this.

(incidentally, I am moving this to Usage).


The use case is that you can create an array having the same structure (number of dimensions), but matching the original size only for certain dimensions.

julia> function size_d(x::AbstractArray{T}, dim::NTuple{N,Int}; keep_dims=true) where {T,N}
           if !keep_dims
               # without keep_dims, just return the requested sizes
               return map(n -> size(x, n), dim)
           end
           # start from all-singleton sizes and fill in the requested dimensions
           sz = ones(Int, ndims(x))
           for n in dim
               sz[n] = size(x, n)
           end
           return Tuple(sz)
       end
size_d (generic function with 1 method)

julia> x = zeros((2,3,4));

julia> y = randn(size_d(x, (1, 3)))
2×1×4 Array{Float64, 3}:
[:, :, 1] =
 -1.2162382021871674
  0.1835508834781801

[:, :, 2] =
 -0.6412077930758956
 -1.4558078799673377

[:, :, 3] =
 0.8132901835276862
 1.0229190762259954

[:, :, 4] =
 -1.3311393812378058
 -0.02845997684223466

julia> x .+ y
2×3×4 Array{Float64, 3}:
[:, :, 1] =
 -1.21624   -1.21624   -1.21624
  0.183551   0.183551   0.183551

[:, :, 2] =
 -0.641208  -0.641208  -0.641208
 -1.45581   -1.45581   -1.45581

[:, :, 3] =
 0.81329  0.81329  0.81329
 1.02292  1.02292  1.02292

[:, :, 4] =
 -1.33114  -1.33114  -1.33114
 -0.02846  -0.02846  -0.02846

Of course, one could easily do that for a 3D array by hand. But working with multidimensional arrays (where the number of dimensions is unknown beforehand) makes that very tricky.

Or you can do as I do, and never use Ref but instead wrap in a single-element tuple:

size.((x,), (2,3))
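
For comparison, a quick sketch of both spellings (reusing x = zeros(2, 3, 4) from above); each shields the array itself from the broadcast so that only the dimensions tuple is iterated:

julia> x = zeros(2, 3, 4);

julia> size.(Ref(x), (2, 3))
(3, 4)

julia> size.((x,), (2, 3))
(3, 4)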

This looks like an XY problem: you have already decided that you want to do this, but again I am wondering why. Creating arrays like that for the fun of it may not be the ultimate answer.

There are people here who would genuinely like to help you, but it is not possible to do that without more context.

Since this discussion moved to Usage rather than basic language features, I will try to illustrate why I think such a size_d function, which by default puts singleton sizes into the unrequested positions, is very handy. Note that the question I am addressing here is not how to implement this (after all, there is a suggestion right at the top of this thread), but why it may be a useful addition.
It allows you to work with multiple dimensions like a “sculptor”. Here is an example of cutting a 2D disc, extended sideways along the unaddressed dimension (i.e. a cylinder), out of a 3D block of data:

julia> using IndexFunArrays

julia> x = ones(5,5,5);

julia> x .* (rr(size_d(x,(2,3))) .< 2)
5×5×5 Array{Float64, 3}:
[:, :, 1] =
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0

[:, :, 2] =
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0

[:, :, 3] =
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0

[:, :, 4] =
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0

[:, :, 5] =
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0

I am pretty sure that this can also be done in various other ways in Julia, but this way is relatively agnostic to the total number of dimensions: it can easily be generalized to one more dimension (e.g. cutting an analogous 3D volume out of a 4D dataset) and it's just a nice shorthand to have.
For those who have worked with high-dimensional datasets (as mentioned: for example X, Y, Z, 5 colors, fluorescence lifetime decay, time), you want to write functions that specifically deal with only some of these dimensions but are agnostic to the size of the others, whether they exist or just have singleton size. Unexpected “squeeze” operations are a pain to deal with, which is why I (we all?) need handy shorthand methods that implement “keepdims=true” for subslicing (reductions already do this!) and also for the size operator. It is just a nice, useful little shorthand, no more and no less.
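
As a rough sketch of that generalisation (reusing the size_d helper and IndexFunArrays from the examples above), the 4D case reads almost the same:

julia> using IndexFunArrays

julia> x = ones(5, 5, 5, 5);

julia> y = x .* (rr(size_d(x, (2, 3, 4))) .< 2);  # mask depends only on dims 2 to 4

julia> size(y)
(5, 5, 5, 5)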

My humble opinion is that this is probably useful in some special cases, but maybe isn’t Base material.

I’m fully in favor of supporting tuple input for several dimensions, because it seems natural and consistent (also with Matlab). But the keep_dims keyword seems a bit offbeat, special-case, and hard to explain.


I think size(x, (1,3)) returning a tuple like (2, 4) could fit nicely in Base (not that the broadcast is prohibitive, but it makes sense). A function that does what size_d with keep_dims = true does could maybe belong in LinearAlgebra or a tensor package? I agree the keyword argument is strange; it should simply have a different function name from size and always return the (2, 1, 4) form. Not sure what that name should be. There is the issue of what to do with an input like size_d(x, (3,1)), since the output order is then less obvious and not consistent with size, but that’s not a blocker.
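
A minimal sketch of what such a tuple method could look like (size_t is just a placeholder name for illustration, not an actual Base definition):

julia> size_t(x::AbstractArray, dims::NTuple{N,Int}) where {N} = map(d -> size(x, d), dims)
size_t (generic function with 1 method)

julia> size_t(zeros(2, 3, 4), (1, 3))
(2, 4)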


Since this usage seems very closely related to the IndexFunArrays.jl package and otherwise not very common in idiomatic Julia (actually, having used Julia for years now, I am not sure I ever needed anything like this), I think it would be better if you tried to convince that package to include it.

Usually, in order to make a case for including something in Base, it has to be very generally applicable. Personally I don’t think this qualifies, but if you disagree, just put it in a package, and in the medium run you will be able to make a convincing case based on its usage statistics. Occasionally, widely useful packages are indeed incorporated into the language (usually as standard libraries, though; a recent addition is TOML).


Actually we included it :smiley:

But with a different name, selectsizes, which might be more in line with the style of selectdim.


I do agree that LinearAlgebra may also be a suitable place. Originally I thought about size since this seemed like natural behaviour to me. Since I first looked at whether size supports this behaviour, I figured that this is where others may also look. In contrast to Tamas, I do not think that this is a very special use case of the IndexFunArrays.jl package. After all, we needed it for writing this package rather than for using it. size_d or selectsizes or whatever name it ends up with is useful for all kinds of N-dimensional data processing. The same holds for a version of selectdim which does not delete singletons.


I do think we need better idioms for aligning arrays for broadcasting, but I’m still not convinced it should be shoved into size. Some things we’ve spitballed in the past include potential operators like ⟂ (for orthogonal broadcasting) or other ways of getting indexing to participate more thoroughly in broadcast.


Then, again, a use case with broader context (and not the above package) would help.

There are now 35 posts in this topic, and I am still missing that. Please don’t get me wrong: I understand you would find this useful. I am just not convinced that it is broadly useful without seeing some use cases — again, not an example of using this function, but a generic problem where we do not have existing solutions.

The only other place where one encounters “squeezing” dimensions to 1 is reduce, dropdims, and friends, as in

reduce(max, ones(3, 3, 3), dims = (1, 3))

and that is not generally considered to be especially well-designed — see the related discussions in

Alternatives like

are much nicer IMO.


Yes, reduce operations are done the way I like it, and I totally agree with @timholy that it is useful to keep the dimensions where they are, as it simplifies applying the condensed/extracted data back to the original data, as in subtracting the minimum from each slice: myarray .- minimum(myarray, dims=(1,2)). If you write any ND algorithm, it is often essential to put dimensions back where they are supposed to live. Unwanted squeeze operations are a nightmare. Luckily, selectdim is decent enough not to perform any unwanted squeeze operations:

selectdim(ones(3,1,3),1,1)
1×3 view(::Array{Float64, 3}, 1, :, :) with eltype Float64:
 1.0  1.0  1.0

but it still squeezes out the selected dimension. But playing with it a bit: wow, in agreement with the syntax of ordinary index selection, you can actually also do the same with selectdim:

selectdim(ones(3,1,3),1,1:1)
1×1×3 view(::Array{Float64, 3}, 1:1, :, :) with eltype Float64:
[:, :, 1] =
 1.0

[:, :, 2] =
 1.0

[:, :, 3] =
 1.0

So, one problem solved :slight_smile: Yet what remains is a generally easy-to-use way to know the size of any such selection without having to resort to writing your own function.
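
In the meantime, a one-line sketch that computes the same thing as the size_d above with keep_dims=true (just an ntuple illustration, not an existing function):

julia> x = zeros(2, 3, 4); dims = (1, 3);

julia> ntuple(d -> d in dims ? size(x, d) : 1, ndims(x))
(2, 1, 4)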
