Type-stable Array of NamedTuples

I often want to do the following for a large n: perform some calculations in each iteration and collect a named tuple holding that iteration’s results:

function f(n::Int)
  v = Vector()
  for i in 1:n 
    ...
    push!(v, (a=i, b=1.23, c=false, d=:foo, ...))
  end
  v
end

As written, the code is less efficient than it could be because v will have type Array{Any,1}. The type instability can be fixed by explicitly specifying the type of the NamedTuple, but that is cumbersome as the tuple’s size grows.
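For reference, a minimal sketch of the explicit-type fix mentioned above, assuming the row has exactly the four fields shown (the elided fields are left out). The names `Row` and `f_typed` are made up for illustration:

```julia
# Explicit element type for the rows pushed in the loop (illustrative name).
const Row = NamedTuple{(:a, :b, :c, :d), Tuple{Int, Float64, Bool, Symbol}}

function f_typed(n::Int)
    v = Vector{Row}()   # concretely typed, unlike Vector()
    for i in 1:n
        push!(v, (a=i, b=1.23, c=false, d=:foo))
    end
    v
end
```

Every additional field has to be repeated in both the name list and the type list, which is the cumbersome part.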

Is there any way to tell Julia that I only want to push a single type to the Array so that it can either

  • automatically infer the type from the push!, or
  • wait to initialize the Array until the first push! call so that the type will be known at that time?

Check out

https://github.com/JuliaComputing/IndexedTables.jl

IndexedTables provides the backend for JuliaDB, which you can also check out:

https://github.com/JuliaComputing/JuliaDB.jl

Some type instabilities can’t be fully resolved. That’s fine: just put the affected piece of code behind a function barrier. See:

https://docs.julialang.org/en/v1/manual/performance-tips/#kernel-functions-1
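A rough sketch of what a function barrier could look like here, assuming the downstream work is a simple sum over one field. The names `sum_a` and `process` are hypothetical, and `identity.(v)` is just one way to copy a `Vector{Any}` into a vector with a narrower element type:

```julia
# Hypothetical kernel: compiled separately for the concrete type of `rows`.
function sum_a(rows)
    total = 0
    for r in rows
        total += r.a
    end
    total
end

function process(n::Int)
    v = Vector()            # Vector{Any}, as in the original question
    for i in 1:n
        push!(v, (a=i, b=1.23, c=false, d=:foo))
    end
    rows = identity.(v)     # copy into a vector with a narrowed element type
    sum_a(rows)             # barrier: the hot loop runs inside a specialized method
end
```

The loop that fills `v` stays type-unstable, but the expensive part runs inside `sum_a`, where the element type is known.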

Maybe push!! from BangBang.jl could help? Not sure though.

You can use typeof to get the type of a value and construct a vector of that type:

function foo(n::Int)
  v = nothing
  for i in 1:n
    row = (a=i, b=1.23, c=false, d=:foo)
    if v === nothing
      # Allocate once, using the type of the first row.
      v = Vector{typeof(row)}(undef, n)
    end
    # The vector is preallocated, so assign by index instead of push!.
    v[i] = row
  end
  v  # note: this is `nothing` when n == 0
end

But it is better if you know the element type at initialization.

You can use StructArrays.jl if you need to work on vectors of individual fields.

There is also map syntax:

map(v -> f(v), 1:n)

and array comprehension:

[f(v) for v in 1:n]

See this discussion: “Ridiculous idea: types from the future”.

Does this scale if the tuple size grows?

Do you mean that:
a) the tuple can have a different size between iterations within one function call, or
b) the tuple is big for some particular functions, but its size remains constant within one call?

I mean b), if I understood correctly. Do you think that if I had a large number of elements in the tuple, say 100, initializing a vector with typeof() would be cumbersome? Maybe there is an automatic way to do this too.
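One possible way to keep the typeof approach manageable for wide tuples is to build the row in a single helper, so the field list is written only once. This is a sketch under that assumption; `make_row` and `foo_rows` are hypothetical names, and the four fields stand in for however many the real row has:

```julia
# Hypothetical row builder: the only place the fields are listed.
make_row(i) = (a=i, b=1.23, c=false, d=:foo)

function foo_rows(n::Int)
    first_row = make_row(1)                     # built once to learn the row type
    v = Vector{typeof(first_row)}(undef, n)
    n >= 1 && (v[1] = first_row)
    for i in 2:n
        v[i] = make_row(i)
    end
    v
end
```

Because the vector type comes from `typeof(first_row)`, adding fields to `make_row` needs no change in `foo_rows`.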

Consider splitting one big tuple into a group of smaller tuples or structures, like

t1 = (a, b, c, d, e)
s1 = MyStruct(f,g,h)
t = (t1, s1)

Or split your data and processing into different vectors and functions.

Also, tuples are not very efficient for a very large number of elements.

If you need to fill one big table, there are packages such as DataFrames or JuliaDB.

Perfect, that’s exactly why I linked to IndexedTables.jl above :)

As discussed over in “Ridiculous idea: types from the future” (which @mohamed82008 mentioned), there is a pretty nice way to do this with map:

function f(n)
  map(1:n) do i
    (a=i, b=1.23, c=false, d=:foo)
  end
end

It’s shorter than the original code while being type-stable:

julia> f(5)
5-element Array{NamedTuple{(:a, :b, :c, :d),Tuple{Int64,Float64,Bool,Symbol}},1}:
 (a = 1, b = 1.23, c = 0, d = :foo)
 (a = 2, b = 1.23, c = 0, d = :foo)
 (a = 3, b = 1.23, c = 0, d = :foo)
 (a = 4, b = 1.23, c = 0, d = :foo)
 (a = 5, b = 1.23, c = 0, d = :foo)

julia> @code_warntype f(5)
Variables
  #self#::Core.Compiler.Const(f, false)
  n::Int64
  #3::getfield(Main, Symbol("##3#4"))

Body::Array{NamedTuple{(:a, :b, :c, :d),Tuple{Int64,Float64,Bool,Symbol}},1}

map is indeed a good solution. I hadn’t realized that map would allow updating of state as it iterates over a vector, but it does work fine.
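A small sketch of what updating state inside map might look like, assuming a running total is the state (`running_totals` is an illustrative name). One caveat: a captured variable that is reassigned inside the closure gets boxed, so inference inside the closure may be worse than in the stateless version, even though the returned vector still ends up with a concrete element type here:

```julia
function running_totals(n::Int)
    total = 0                  # state shared across iterations
    map(1:n) do i
        total += i             # reassigning a captured variable (it gets boxed)
        (i=i, total=total)
    end
end
```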

BTW, I’d mark this topic as solved, but for some reason the button no longer shows up for me.

EDIT:
However, map restricts one to returning a single result for each iteration. It’s not possible to push multiple or zero results per iteration (without using nested arrays).

You can return a tuple of multiple (or zero) results from each iteration and then call Iterators.flatten on the result.
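A minimal sketch of the flatten idea, with one deviation that is my assumption rather than part of the suggestion: each iteration returns a small vector instead of a tuple, so that all inner containers share one type regardless of how many results they hold. The name `expand` and the `i % 3` rule are purely illustrative:

```julia
function expand(n::Int)
    nested = map(1:n) do i
        # Emit zero, one, or two results for iteration i.
        [(a=i, k=k) for k in 1:(i % 3)]
    end
    collect(Iterators.flatten(nested))   # one flat vector of rows
end
```

For example, `expand(4)` yields one row for i = 1, two for i = 2, none for i = 3, and one for i = 4.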
