Generator as an "Array Initializer?"

(inspired by this discussion)

Would it make sense for the Array constructor to accept a generator as an “array initializer?”

Array{Float64}(i+j for i=1:m, j=1:n)

The downside is that it’s redundant with the :typed_comprehension expression, e.g. Float64[...], but the upside is that it’s consistent with other constructors such as Tuple, and it provides an idiom for non-standard arrays to use generators (while avoiding the undef initialization with for loop idiom), such as:

using CUDA
CuArray{Float64}(i+j for i=1:m, j=1:n)

or even:

using OffsetArrays
ℱ = OffsetArray(exp(-2π*im*m*n/N) for m=0:N-1, n=0:N-1)

and I would presume OffsetArray would set up the indexing as needed by accessing generator.iter.iterators.

Presently some packages offer helper macros instead, such as:

using StaticArrays
@SArray[x for x=1:5]

but this feels rather unidiomatic, and is visually something of an oddity.

Thoughts?

8 Likes

I agree! When I first learned about the Array constructor, I found it very confusing that it couldn’t initialize the array with arbitrary values. Accepting a generator would be nice.

1 Like

This sounds like great idea.

It is really just a special case of T(x) turning x into a T, so the fact that it is not already implemented makes me think that it was probably discussed long ago, and there was some good reason not to do it. (But I cannot think of any such reason myself.)

1 Like

This is already close to the collect method.

julia> m = 10; n = 5
5

julia> collect(Float64, i+j for i in 1:m, j in 1:n)
10×5 Matrix{Float64}:
  2.0   3.0   4.0   5.0   6.0
  3.0   4.0   5.0   6.0   7.0
  4.0   5.0   6.0   7.0   8.0
  5.0   6.0   7.0   8.0   9.0
  6.0   7.0   8.0   9.0  10.0
  7.0   8.0   9.0  10.0  11.0
  8.0   9.0  10.0  11.0  12.0
  9.0  10.0  11.0  12.0  13.0
 10.0  11.0  12.0  13.0  14.0
 11.0  12.0  13.0  14.0  15.0

To avoid type piracy, we might try the following wrapper generator approach.

julia> struct GeneratorInit{T <: Base.Generator}
           gen::T
       end

julia> geninit(gen) = GeneratorInit(gen)
geninit (generic function with 1 method)

julia> function (::Type{A})(init::GeneratorInit) where {T, A <: AbstractArray{T}}
           A(collect(T, init.gen))
       end

julia> Array{Float64}(geninit(i+j for i in 1:5, j in 1:6))
5×6 Matrix{Float64}:
 2.0  3.0  4.0  5.0   6.0   7.0
 3.0  4.0  5.0  6.0   7.0   8.0
 4.0  5.0  6.0  7.0   8.0   9.0
 5.0  6.0  7.0  8.0   9.0  10.0
 6.0  7.0  8.0  9.0  10.0  11.0

Perhaps a function that accepts a CartesianIndex may be useful as well?

julia> function Array(f::Function, dims...)
           ci = CartesianIndices(dims)
           map(f, ci)
       end
Array

julia> Array(5, 6) do ci
           ci[1] + ci[2]
       end
5×6 Matrix{Int64}:
 2  3  4  5   6   7
 3  4  5  6   7   8
 4  5  6  7   8   9
 5  6  7  8   9  10
 6  7  8  9  10  11

julia> function Array{T}(f::Function, dims...) where T
           ci = CartesianIndices(dims)
           map(ci) do i
              T(f(ci))
           end
       end

julia> Array{Float64}(5, 6) do ci
           ci[1] + ci[2]
       end
5×6 Matrix{Float64}:
 2.0  3.0  4.0  5.0   6.0   7.0
 3.0  4.0  5.0  6.0   7.0   8.0
 4.0  5.0  6.0  7.0   8.0   9.0
 5.0  6.0  7.0  8.0   9.0  10.0
 6.0  7.0  8.0  9.0  10.0  11.0

I added generator initializers to ArrayInitializers#geninit (geninit git branch of ArrayInitializers.jl).

julia> using Pkg

julia> pkg"add ArrayInitializers#geninit"
...
julia> using ArrayInitializers

julia> Array(geninit(x^2 for x in 2:5))
4-element Vector{Int64}:
  4
  9
 16
 25

julia> Array{Float64}(geninit(x^2 for x in 2:5))
4-element Vector{Float64}:
  4.0
  9.0
 16.0
 25.0

julia> Array{Float64}(geninit(x^2+y for x in 2:5, y in 1:10))
4×10 Matrix{Float64}:
  5.0   6.0   7.0   8.0   9.0  10.0  11.0  12.0  13.0  14.0
 10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0
 17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0  25.0  26.0
 26.0  27.0  28.0  29.0  30.0  31.0  32.0  33.0  34.0  35.0

julia> using OffsetArrays

julia> OffsetArray(geninit(x^2 for x in 1:2:9), 6:10)
5-element OffsetArray(::Vector{Int64}, 6:10) with eltype Int64 with indices 6:10:
  1
  9
 25
 49
 81

julia> OffsetArray{UInt8}(geninit(x^2 for x in 1:2:9), 6:10)
5-element OffsetArray(::Vector{UInt8}, 6:10) with eltype UInt8 with indices 6:10:
 0x01
 0x09
 0x19
 0x31
 0x51
1 Like

You’re right, this idea is too obvious; there has to be something wrong with it :sweat_smile:

You move quick! My hesitation is that:

  1. this feels more like it should be a language feature instead of being in a package, and
  2. doesn’t this implementation allocate and fill an entire array before converting to the desired type, and therefore can be needlessly expensive?

On point (2), what inspired this idea to begin with was whether we could help avoid the undef allocation + for loop initialization idiom for non-standard arrays, but often why people might do that to begin with is because they want to avoid allocating unnecessary arrays. If we’re okay with unnecessary allocation, we can already call a constructor on a comprehension e.g. SMatrix{m,n}([i+j for i=1:m, j=1:n])

Hm, that’s an interesting idea! why shouldn’t user-defined functions be array initializers?

Although I’d probably prefer, instead of accepting a CartesianIndices object, to allow for a function signature that allows Array(5, 6) do (i,j); i+j end.

  1. Language features are really hard. This is really more of a standard API issue though, which is also very difficult to get right. As I explained before, it’s better to work on the design in a package first before trying to propose the pull request.
  2. It does, but the composition of ArrayInitializers and StaticArrays has not been worked on. I think I might make my first “glue” package to cover this.

I would provide the CartesianIndex. The CartesianIndex is just a wrapped tuple:

julia> CartesianIndex(2,3).I
(2, 3)
1 Like

The fact that the element type is known in advance makes this initialiser much simpler than the collect function. In principle, all that is needed for it to work in Base is:

function (::Type{A})(itr::Base.Generator) where {A<:AbstractArray} 
    dest = A(undef, size(itr))
    copyto!(dest, itr)
end

One might want to add some error handling (telling the user to revert to collect) instead of just throwing a MethodError when the generator doesn’t have a size, but otherwise I really don’t think it has to be more complicated than this.

2 Likes

We could use applicable and hasmethod to detect if size is applicable to the iterator within the generator.

julia> g = (x for x in 1:5)
Base.Generator{UnitRange{Int64}, typeof(identity)}(identity, 1:5)

julia> applicable(size, g.iter)
true

julia> gc = (x for x in Iterators.take(Iterators.cycle(1:5),9))
Base.Generator{Base.Iterators.Take{Base.Iterators.Cycle{UnitRange{Int64}}}, typeof(identity)}(identity, Base.Iterators.Take{Base.Iterators.Cycle{UnitRange{Int64}}}(Base.Iterators.Cycle{UnitRange{Int64}}(1:5), 9))

julia> applicable(size, gc.iter)
false

1 Like

There’s also Base.IteratorSize which seems to be there for exactly this purpose. So one could do:

function (::Type{A})(itr::Base.Generator) where {A<:AbstractArray}
    dest = _array4itr(A, itr, Base.IteratorSize(itr))
    copyto!(dest, itr)
end

_array4itr(::Type{A}, itr, ::Base.HasShape) where {A} = A(undef, size(itr))
_array4itr(::Type{A}, itr, ::Base.HasLength) where {A} = A(undef, length(itr))
_array4itr(::Type{A}, itr, ::Base.SizeUnknown) where {A} = error("Cannot initialize an array for an iterator of unknown size.")

In that case, why not just use the tuple? At least the tuple is convenient to destructure. Do we get any dispatch benefits here?

With a CartesianIndex you can index directly into a multidimensional array. You cannot do that with a pure tuple.

julia> A = rand(5,3)
5×3 Matrix{Float64}:
 0.451517   0.152694  0.454544
 0.0387113  0.987212  0.294335
 0.369965   0.567338  0.797788
 0.661118   0.422781  0.601305
 0.984192   0.611997  0.794501

julia> I = CartesianIndex(5,3)
CartesianIndex(5, 3)

julia> A[(5,3)]
ERROR: ArgumentError: invalid index: (5, 3) of type Tuple{Int64, Int64}
...

julia> A[I]
0.794500894925653

Retrieving the tuple from the CartesianIndex is pretty easy:

julia> Tuple(I)
(5, 3)

julia> I.I
(5, 3)

julia> x, y = I.I
(5, 3)

julia> x
5

julia> y
3

If I want to index directly into an existing multidimensional array though, wouldn’t I just do this?

Array(m, n) do (i, j);  other_arr[i, j]  end

If CartesianIndex was iterable this wouldn’t be a concern, but it’s not so I’d rather avoid it.

Then again, looking at PR#23719—which explains the rationale for preventing iteration as avoiding a performance trap—maybe it’s been fixed since then.

Benchmark showing CartesianIndex iteration looks ok now
julia> Base.iterate(itr::CartesianIndex, t=itr.I) = Iterators.peel(t)

julia> function mysum1(A)
           s = 0.0
           @inbounds for i in CartesianIndices(A)
               s += A[i]
           end
           s
       end
mysum1 (generic function with 1 method)

julia> function mysum2(A)
           s = 0.0
           @inbounds for i in CartesianIndices(A)
               s += A[i...]
           end
           s
       end
mysum2 (generic function with 1 method)

julia> A = rand(10,10,10,10,10,10);

# after warmup
julia> @time mysum1(A)
  0.001131 seconds (1 allocation: 16 bytes)
499454.20988074585

julia> @time mysum2(A)
  0.001158 seconds (1 allocation: 16 bytes)
499454.20988074585

julia> @btime mysum1($A)
  514.000 μs (0 allocations: 0 bytes)
499454.20988074585

julia> @btime mysum2($A)
  514.000 μs (0 allocations: 0 bytes)
499454.20988074585

Oh, this looks like a good idea. PR#48404

This may be used to create an OffsetArray as well, by providing offset axes as the loop indices.

julia> using OffsetArrays

julia> collect(Float64, i+j for i in 1:3, j in Base.IdentityUnitRange(2:4))
3×3 OffsetArray(::Matrix{Float64}, 1:3, 2:4) with eltype Float64 with indices 1:3×2:4:
 3.0  4.0  5.0
 4.0  5.0  6.0
 5.0  6.0  7.0
1 Like