A way to initialize an array without specifying its eltype in advance?

Hi

Often I would like to initialize an array without specifying an eltype in advance, even though I know all elements of the array will have the same concrete type. By way of example,

function makesomearray()
    arr = []
    for i in 1:10
        element = some_computation(i) #some type stable computation
        push!(arr, element)
    end
    return arr #Array{Any, 1}
end

where element can have a fairly nontrivial type that is hard to know in advance. Of course, the above code is not performant, since arr will have type Array{Any, 1}. In order to make it type stable, I often write something like this:

function makesomearray2()
    element = some_computation(1)
    arr = [element]
    for i in 2:10
        element = some_computation(i)
        push!(arr, element)
    end
    return arr #Array{eltype(element), 1}
end

which is ugly. What I much prefer of is some kind of “lazy empty array”, which does not fix its type until a first element is pushed into it, such that the following code, almost identical to makesomearray will return the correct type.

function makesomearray3()
    arr = LazyEmptyArray[]
    for i in 1:10
        element = some_computation(i)
        push!(arr, element) # the first invocation of this should tell the compiler what type arr should be
    end
    return arr #Array{eltype(element), 1}
end

My questions are

  1. Is there already some macro or package for handling this kind of logic?
  2. If not, is there any obstacle to implementing LazyEmptyArray?

Thank you!

if you know the type (not that you have to hard-code manually):

T[]

If you see an input array and like to have an empty array with same elements types:

eltype(input_array)[]

(also checkout useful functions like similar)

Thanks for your comment! However, those strategies do not work for the situation I have in mind. I am thinking of a scenario where there is no easy way to write down the type of element in advance, and the only practical way of knowing the type of element is via typeof(element). A example might be the solution type for DifferentialEquations.jl. If you run this tutorial code, the return type is something like

typeof(sol) = 
ODESolution{Float64,1,Array{Float64,1},Nothing,Nothing,Array{Float64,1},Array{Array{Float64,1},1},ODEProblem{Float64,Tuple{Float64,Float64},false,DiffEqBase.NullParameters,ODEFunction{false,typeof(f),LinearAlgebra.UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},DiffEqBase.StandardODEProblem},Tsit5,OrdinaryDiffEq.InterpolationData{ODEFunction{false,typeof(f),LinearAlgebra.UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Array{Float64,1},Array{Float64,1},Array{Array{Float64,1},1},OrdinaryDiffEq.Tsit5ConstantCache{Float64,Float64}},DiffEqBase.DEStats}

which is practically impossible to know in advance.

then just use eltype or tyoepf. But mind in these cases, you can probably just push to Any[]. I don’t see a huge complicated Array of Struct being super performant anyways. (and you’re probably not using this array with such needs)

You have a good point that Array of complicated Struct might not be required for performance.

You suggest to use eltype or typeof. Do you mind providing a code for doing it concisely? If I trye to use typeof(element) to initialize an empty array, , I need to first obtain element somehow, and the code I write up end up looking as ugly as makesomearray2().

It all depends on how your loop works. For example, I don’t see the point of repeatedly calling solve and push results somewhere. Notice, if you map(solve, array), the array probably will be typed

You could use a comprehension. Comprehensions determine the appropriate array type as elements are added.

4 Likes

Or map: it also produces a collection with an element type that contains what actually goes in the array. For example:

julia> a = [rand(Bool) ? rand(1:10) : randstring() for _ = 1:10]
10-element Vector{Any}:
 9
  "UUa9X3To"
 6
  "5NAPmkJw"
 9
 9
 7
 1
 2
  "aCmCF7vc"

julia> map(length, a)
10-element Vector{Int64}:
 1
 8
 1
 8
 1
 1
 1
 1
 1
 8
4 Likes

@benninkrs @StefanKarpinski Thank you for your reply! I did not think of making a clever use of map for this purpose. That seems to work well if what I want is a single array.

Sometimes, however, I’d like to create multiple arrays from a single for loop. I am trying to figure out what is the best idiomatic way of writing that in Julia. What I can think of is something like the following:

using StructArray
using Unpack

# pattern 1 : anonymous function
USVs =   map(1:L) do x
              # do something with x
              return (U=U, S=S, V=V)
end
@unpack Us, Ss, Vs = StructArray(USVs)

# pattern 2 : named function
# somefunc(i) returns a vanilla tuple (U, S, V)
Us, Ss, Vs = map(somefunc, 1:L) |> StructArray |> fieldarrays

While these are reasonably concise, It relies on StructArray. I am wondering if this is the best way to accomplish this, or if there is a more idiomatic way of doing this with Julia Base. Do you guys have any advice?

Currently, use StructArray.

The same widening strategy that collect uses in Base could be ported easily to functions returning a tuple and building up a tuple of arrays, but AFAIK so far no one has bothered to do it.

1 Like

@tkf has implemented a couple of these things. There’s Empty in BangBang.jl, which is pretty much the idea you described, and InitialValues.jl for a more general idea.

3 Likes

Some relevant discussion towards the end of Skipping parts of a for loop in the first iteration

1 Like

Here is a way to do this with Transducers.jl:

julia> using Transducers

julia> foldxl(
           ProductRF(push!!, push!!, push!!),
           ((Int(x), string(x), Symbol(x)) for x in 'a':'c'),
       )
([97, 98, 99], ["a", "b", "c"], [:a, :b, :c])

julia> typeof(ans)
Tuple{Vector{Int64}, Vector{String}, Vector{Symbol}}

See also Multiple outputs usage patterns in Transducers.jl


Of course, it is totally fine to do this with normal loop:

julia> a = b = c = Union{}[]
       for x in 'a':'c'
           a = push!!(a, Int(x))
           b = push!!(b, string(x))
           c = push!!(c, Symbol(x))
       end

julia> a, b, c
([97, 98, 99], ["a", "b", "c"], [:a, :b, :c])

julia> typeof(ans)
Tuple{Vector{Int64}, Vector{String}, Vector{Symbol}}

However, Julia’s for loop sometimes does not produce efficient code for type-changing accumulators as above. This is where FLoops.jl is useful:

julia> using FLoops

julia> @floop begin
           a = b = c = Union{}[]
           for x in 'a':'c'
               a = push!!(a, Int(x))
               b = push!!(b, string(x))
               c = push!!(c, Symbol(x))
           end
       end
4 Likes