Would it make sense for the Array constructor to accept a generator as an “array initializer?”
Array{Float64}(i+j for i=1:m, j=1:n)
The downside is that it’s redundant with the :typed_comprehension expression, e.g. Float64[...], but the upside is that it’s consistent with other constructors such as Tuple, and it provides an idiom for non-standard arrays to use generators (while avoiding the undef initialization with for loop idiom), such as:
using CUDA
CuArray{Float64}(i+j for i=1:m, j=1:n)
or even:
using OffsetArrays
ℱ = OffsetArray(exp(-2π*im*m*n/N) for m=0:N-1, n=0:N-1)
and I would presume OffsetArray would set up the indexing as needed by accessing generator.iter.iterators.
Presently some packages offer helper macros instead, such as:
using StaticArrays
@SArray[x for x=1:5]
but this feels rather unidiomatic, and is visually something of an oddity.
I agree! When I first learned about the Array constructor, I found it very confusing that it couldn’t initialize the array with arbitrary values. Accepting a generator would be nice.
It is really just a special case of T(x) turning x into a T, so the fact that it is not already implemented makes me think that it was probably discussed long ago, and there was some good reason not to do it. (But I cannot think of any such reason myself.)
I added generator initializers to ArrayInitializers#geninit (geninit git branch of ArrayInitializers.jl).
julia> using Pkg
julia> pkg"add ArrayInitializers#geninit"
...
julia> using ArrayInitializers
julia> Array(geninit(x^2 for x in 2:5))
4-element Vector{Int64}:
4
9
16
25
julia> Array{Float64}(geninit(x^2 for x in 2:5))
4-element Vector{Float64}:
4.0
9.0
16.0
25.0
julia> Array{Float64}(geninit(x^2+y for x in 2:5, y in 1:10))
4×10 Matrix{Float64}:
5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 13.0 14.0
10.0 11.0 12.0 13.0 14.0 15.0 16.0 17.0 18.0 19.0
17.0 18.0 19.0 20.0 21.0 22.0 23.0 24.0 25.0 26.0
26.0 27.0 28.0 29.0 30.0 31.0 32.0 33.0 34.0 35.0
julia> using OffsetArrays
julia> OffsetArray(geninit(x^2 for x in 1:2:9), 6:10)
5-element OffsetArray(::Vector{Int64}, 6:10) with eltype Int64 with indices 6:10:
1
9
25
49
81
julia> OffsetArray{UInt8}(geninit(x^2 for x in 1:2:9), 6:10)
5-element OffsetArray(::Vector{UInt8}, 6:10) with eltype UInt8 with indices 6:10:
0x01
0x09
0x19
0x31
0x51
You’re right, this idea is too obvious; there has to be something wrong with it
You move quick! My hesitation is that:
this feels more like it should be a language feature instead of being in a package, and
doesn’t this implementation allocate and fill an entire array before converting to the desired type, and therefore can be needlessly expensive?
On point (2), what inspired this idea to begin with was whether we could help avoid the undef allocation + for loop initialization idiom for non-standard arrays, but often why people might do that to begin with is because they want to avoid allocating unnecessary arrays. If we’re okay with unnecessary allocation, we can already call a constructor on a comprehension e.g. SMatrix{m,n}([i+j for i=1:m, j=1:n])
Hm, that’s an interesting idea! why shouldn’t user-defined functions be array initializers?
Although I’d probably prefer, instead of accepting a CartesianIndices object, to allow for a function signature that allows Array(5, 6) do (i,j); i+j end.
Language features are really hard. This is really more of a standard API issue though, which is also very difficult to get right. As I explained before, it’s better to work on the design in a package first before trying to propose the pull request.
It does, but the composition of ArrayInitializers and StaticArrays has not been worked on. I think I might make my first “glue” package to cover this.
I would provide the CartesianIndex. The CartesianIndex is just a wrapped tuple:
The fact that the element type is known in advance makes this initialiser much simpler than the collect function. In principle, all that is needed for it to work in Base is:
function (::Type{A})(itr::Base.Generator) where {A<:AbstractArray}
dest = A(undef, size(itr))
copyto!(dest, itr)
end
One might want to add some error handling (telling the user to revert to collect) instead of just throwing a MethodError when the generator doesn’t have a size, but otherwise I really don’t think it has to be more complicated than this.
We could use applicable and hasmethod to detect if size is applicable to the iterator within the generator.
julia> g = (x for x in 1:5)
Base.Generator{UnitRange{Int64}, typeof(identity)}(identity, 1:5)
julia> applicable(size, g.iter)
true
julia> gc = (x for x in Iterators.take(Iterators.cycle(1:5),9))
Base.Generator{Base.Iterators.Take{Base.Iterators.Cycle{UnitRange{Int64}}}, typeof(identity)}(identity, Base.Iterators.Take{Base.Iterators.Cycle{UnitRange{Int64}}}(Base.Iterators.Cycle{UnitRange{Int64}}(1:5), 9))
julia> applicable(size, gc.iter)
false
There’s also Base.IteratorSize which seems to be there for exactly this purpose. So one could do:
function (::Type{A})(itr::Base.Generator) where {A<:AbstractArray}
dest = _array4itr(A, itr, Base.IteratorSize(itr))
copyto!(dest, itr)
end
_array4itr(::Type{A}, itr, ::Base.HasShape) where {A} = A(undef, size(itr))
_array4itr(::Type{A}, itr, ::Base.HasLength) where {A} = A(undef, length(itr))
_array4itr(::Type{A}, itr, ::Base.SizeUnknown) where {A} = error("Cannot initialize an array for an iterator of unknown size.")
If I want to index directly into an existing multidimensional array though, wouldn’t I just do this?
Array(m, n) do (i, j); other_arr[i, j] end
If CartesianIndex was iterable this wouldn’t be a concern, but it’s not so I’d rather avoid it.
Then again, looking at PR#23719—which explains the rationale for preventing iteration as avoiding a performance trap—maybe it’s been fixed since then.
Benchmark showing CartesianIndex iteration looks ok now
julia> Base.iterate(itr::CartesianIndex, t=itr.I) = Iterators.peel(t)
julia> function mysum1(A)
s = 0.0
@inbounds for i in CartesianIndices(A)
s += A[i]
end
s
end
mysum1 (generic function with 1 method)
julia> function mysum2(A)
s = 0.0
@inbounds for i in CartesianIndices(A)
s += A[i...]
end
s
end
mysum2 (generic function with 1 method)
julia> A = rand(10,10,10,10,10,10);
# after warmup
julia> @time mysum1(A)
0.001131 seconds (1 allocation: 16 bytes)
499454.20988074585
julia> @time mysum2(A)
0.001158 seconds (1 allocation: 16 bytes)
499454.20988074585
julia> @btime mysum1($A)
514.000 μs (0 allocations: 0 bytes)
499454.20988074585
julia> @btime mysum2($A)
514.000 μs (0 allocations: 0 bytes)
499454.20988074585
This may be used to create an OffsetArray as well, by providing offset axes as the loop indices.
julia> using OffsetArrays
julia> collect(Float64, i+j for i in 1:3, j in Base.IdentityUnitRange(2:4))
3×3 OffsetArray(::Matrix{Float64}, 1:3, 2:4) with eltype Float64 with indices 1:3×2:4:
3.0 4.0 5.0
4.0 5.0 6.0
5.0 6.0 7.0