Tuple as argument for size(AbstractArray)

@DFN, I understand your concern about the number of questions (i.e. the tuple) corresponding to the number of answers (the result tuple). Yet my original concern was having a unified way of preserving the meaning of dimensions in your code. So, if you like, size_d(x,(2,3)) can do the job, but it needs to return a size() tuple that corresponds to the full number of dimensions that the array has.

But then shouldn’t the other dimensions have a different dummy value? 1 is an ordinary size and is ambiguous.

All dimensions which are in the original array and are not addressed in the tuple need to return 1. See my original implementation at the top. Only then can you directly continue to calculate with this “extracted” size.
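
For concreteness, a minimal sketch of the intended behaviour (using the size_d helper whose implementation appears further down in this thread):

julia> x = zeros(2, 3, 4);

julia> size_d(x, (2, 3))   # dimension 1 was not requested, so it is reported as 1
(1, 3, 4)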

Yes, I saw that. But I’m questioning whether 1 is a suitable dummy value. Maybe 0, -1 or nothing?


I am wondering about this too.

@RainerHeintzmann: it would be great to see some context, i.e. what you are doing with this.

(incidentally, I am moving this to Usage).


The use case is that you can create an array having the same structure (number of dimensions), but matching the original size only for certain dimensions.

julia> function size_d(x::AbstractArray{T}, dim::NTuple{N,Int}; keep_dims=true) where {T,N}
           if !keep_dims
               # without keep_dims, just return the requested sizes
               return map(n -> size(x, n), dim)
           end
           # start from all-singleton sizes and fill in the requested dimensions
           sz = ones(Int, ndims(x))
           for n in dim
               sz[n] = size(x, n)
           end
           return Tuple(sz)
       end
size_d (generic function with 1 method)

julia> x = zeros((2,3,4));

julia> y = randn(size_d(x, (1, 3)))
2×1×4 Array{Float64, 3}:
[:, :, 1] =
 -1.2162382021871674
  0.1835508834781801

[:, :, 2] =
 -0.6412077930758956
 -1.4558078799673377

[:, :, 3] =
 0.8132901835276862
 1.0229190762259954

[:, :, 4] =
 -1.3311393812378058
 -0.02845997684223466

julia> x .+ y
2×3×4 Array{Float64, 3}:
[:, :, 1] =
 -1.21624   -1.21624   -1.21624
  0.183551   0.183551   0.183551

[:, :, 2] =
 -0.641208  -0.641208  -0.641208
 -1.45581   -1.45581   -1.45581

[:, :, 3] =
 0.81329  0.81329  0.81329
 1.02292  1.02292  1.02292

[:, :, 4] =
 -1.33114  -1.33114  -1.33114
 -0.02846  -0.02846  -0.02846

Of course, one could easily do that for a 3D array by hand. But working with multidimensional arrays (where the number of dimensions is unknown beforehand) makes that very tricky.

Or you can do as I do, and never use Ref but instead wrap in a single-element tuple:

size.((x,), (2,3))
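
For comparison, a quick sketch of both spellings (reusing x = zeros(2, 3, 4) from above); each shields the array itself from the broadcast so that only the dimensions tuple is iterated:

julia> x = zeros(2, 3, 4);

julia> size.(Ref(x), (2, 3))
(3, 4)

julia> size.((x,), (2, 3))
(3, 4)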

This looks like an XY problem: you have already decided that you want to do this, but again I am wondering why. Creating arrays like that for the fun of it may not be the ultimate answer.

There are people here who would genuinely like to help you, but it is not possible to do that without more context.

Since this discussion moved to Usage rather than basic language features, I will try to illustrate why I think such a size_d function, which by default puts singleton sizes into the unrequested positions, is very handy. Note that the question I am addressing here is not how to implement this (after all, there is a suggestion right at the top of this thread), but why it may be a useful addition.
It allows you to work with multiple dimensions like a “sculptor”. Here is an example of cutting a 2D disc, extended sideways along the unaddressed dimension (i.e. a cylinder), out of a 3D block of data:

julia> using IndexFunArrays

julia> x = ones(5,5,5);

julia> x .* (rr(size_d(x,(2,3))) .< 2)
5×5×5 Array{Float64, 3}:
[:, :, 1] =
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0

[:, :, 2] =
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0

[:, :, 3] =
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0

[:, :, 4] =
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0
 0.0  1.0  1.0  1.0  0.0

[:, :, 5] =
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0

I am pretty sure that this can also be done in various other ways in Julia, but this way is relatively agnostic to the total number of dimensions: it can easily be generalized to one more dimension (e.g. cutting an analogous 3D volume out of a 4D dataset) and it's just a nice shorthand to have.
For those who have worked with high-dimensional datasets (as mentioned: for example X, Y, Z, 5 colors, fluorescence lifetime decay, time), you want to write functions that specifically deal with only some of these dimensions but are agnostic to the size of the others, whether they exist or just have singleton size. Unexpected “squeeze” operations are a pain to deal with, which is why I (we all?) need handy shorthand methods that implement “keepdims=true” for subslicing (reductions already do this!) and also for the size operator. It is just a nice, useful little shorthand, no more and no less.
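
As a rough sketch of that generalisation (reusing the size_d helper and IndexFunArrays from the examples above), the 4D case reads almost the same:

julia> using IndexFunArrays

julia> x = ones(5, 5, 5, 5);

julia> y = x .* (rr(size_d(x, (2, 3, 4))) .< 2);  # mask depends only on dims 2 to 4

julia> size(y)
(5, 5, 5, 5)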

My humble opinion is that this is probably useful in some special cases, but maybe isn’t Base material.

I’m fully in favor of supporting tuple input for several dimensions, because it seems natural and consistent (also with Matlab). But the keep_dims keyword seems a bit offbeat, special-case, and hard to explain.


I think size(x, (1,3)) returning a tuple like (2, 4) could fit nicely in Base (not that the broadcast is prohibitive, but it makes sense). A function that does what size_d with keep_dims = true does could maybe belong in LinearAlgebra or a tensor package? I agree the keyword argument is strange; it should simply have a different function name from size and always return the (2, 1, 4) form. Not sure what that name should be. There is the issue of what to do with an input like size_d(x, (3,1)), since the output order is then less obvious and not consistent with size, but that’s not a blocker.
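
A minimal sketch of what such a tuple method could look like (size_t is just a placeholder name for illustration, not an actual Base definition):

julia> size_t(x::AbstractArray, dims::NTuple{N,Int}) where {N} = map(d -> size(x, d), dims)
size_t (generic function with 1 method)

julia> size_t(zeros(2, 3, 4), (1, 3))
(2, 4)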


Since this usage seems very closely related to the IndexFunArrays.jl package and otherwise not very common in idiomatic Julia (actually, having used Julia for years now, I am not sure I ever needed anything like this), I think it would be better if you tried to convince that package to include it.

Usually, in order to make a case for including something in Base, it has to be very generally applicable. Personally I don’t think this qualifies, but if you disagree, just put it in a package, and in the medium run you will be able to make a convincing case based on its usage statistics. Occasionally, widely useful packages are indeed incorporated into the language (usually as standard libraries, though; a recent addition is TOML).


Actually we included it :smiley:

But with a different name, selectsizes, which might be more in line with the style of selectdim.


I do agree that LinearAlgebra may also be a suitable place. Originally I thought about size since this seemed like natural behaviour to me. Since I first looked at whether size supports this behaviour, I figured that this is where others may also look. In contrast to Tamas, I do not think that this is a very special use case of the IndexFunArrays.jl package. After all, we needed it for writing this package rather than for using it. size_d or selectsizes or whatever name it ends up with is useful for all kinds of N-dimensional data processing. The same holds for a version of selectdim which does not delete singletons.


I do think we need better idioms for aligning arrays for broadcasting, but I’m still not convinced it should be shoved into size. Some things we’ve spitballed in the past include potential operators like ⟂ (for orthogonal broadcasting) or other ways of getting indexing to participate more thoroughly in broadcast.


Then, again, a use case with broader context (and not the above package) would help.

There are now 35 posts in this topic, and I am still missing that. Please don’t get me wrong: I understand you would find this useful. I am just not convinced that it is broadly useful without seeing some use cases — again, not an example of using this function, but a generic problem where we do not have existing solutions.

The only other place where one encounters “squeezing” dimensions to 1 is reduce, dropdims, and friends, as in

reduce(max, ones(3, 3, 3), dims = (1, 3))

and that is not generally considered to be especially well-designed — see the related discussions in

Alternatives like

are much nicer IMO.


Yes, reduce operations are done the way I like it, and I totally agree with @timholy that it is useful to keep the dimensions where they are, as it simplifies applying the condensed/extracted data back to the original data, as in subtracting the minimum from each slice: myarray .- minimum(myarray, dims=(1,2)). If you write any ND algorithm, it is often essential to put dimensions back where they are supposed to live. Unwanted squeeze operations are a nightmare. Luckily, selectdim is decent enough not to perform any unwanted squeeze operations:

selectdim(ones(3,1,3),1,1)
1×3 view(::Array{Float64, 3}, 1, :, :) with eltype Float64:
 1.0  1.0  1.0

but it still squeezes out the selected dimension. But playing with it a bit: wow, in agreement with the syntax of ordinary index selection, you can actually also do the same with selectdim:

selectdim(ones(3,1,3),1,1:1)
1×1×3 view(::Array{Float64, 3}, 1:1, :, :) with eltype Float64:
[:, :, 1] =
 1.0

[:, :, 2] =
 1.0

[:, :, 3] =
 1.0

So, one problem solved :slight_smile: Yet what remains is a generally easy-to-use way to know the size of any such selection without having to resort to writing your own function.
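
In the meantime, a one-line sketch that computes the same thing as the size_d above with keep_dims=true (just an ntuple illustration, not an existing function):

julia> x = zeros(2, 3, 4); dims = (1, 3);

julia> ntuple(d -> d in dims ? size(x, d) : 1, ndims(x))
(2, 1, 4)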
