Container of heterogeneous types

I have to due with objects with different types, and they’re contained in either a Vector of abstract parent, or a Tuple as follows. Unfortunately, accessing both of them cause warntype :disappointed_relieved:

a = Vector{Real}(undef, 2)
a[1] = 1
a[2] = 2.0
b = (1, 2.0)

function f(x)
    for i in 1:length(x)
        g(x[i])
    end
end

function g(y::Int64)
    println(y + 1)
end

function g(y::Float64)
    println(y + 2.0)
end

julia> f(a)
2
4.0

julia> f(b)
2
4.0

julia> typeof(a)
Array{Real,1}

julia> typeof(b)
Tuple{Int64,Float64}

julia> isconcretetype(typeof(a))
true

julia> isconcretetype(typeof(b))
true

julia> @code_warntype f(a)
Body::Nothing
│    %15 = (Base.arrayref)(true, x, %13)::Real

julia> @code_warntype f(b)
Body::Nothing
│    %4  = (Base.getfield)(x, %2, true)::Union{Float64, Int64}

question: any way to avoid the warntype?

maybe we need to generate the function by macro???

many thanks!

Is it just two different types, Float64 and Int? In that case, did you try

a = Vector{Union{Int, Float64}}(undef, 2)

?

BTW, and easier way to create the example arrays are

a = Real[1, 2.0]
a = Union{Int, Float64}[1, 2.0]

no, there’re a lot of (parametric) types.

btw, defining

does not help:

julia> @code_warntype f(a)
│    %15 = (Base.arrayref)(true, x, %13)::Union{Float64, Int64}

Sure it helps, it went from ::Real to ::Union{Float64, Int64}. That’s a huge difference. What sort of output are you looking for?

Anyway, if it’s many different types, then maybe it cannot be fixe. But what’s the problem, is this a performance bottleneck? Does this matter?

1 Like

the idea is we use metaprogramming to “expand” the abstract collection into a struct with concrete fields, also needs to “expand” the function to use the fields one by one (instead of the for loop):

struct generatedtype
    x1::Int64
    x2::Float64
end

function generatedfun(x)
    g(x.x1)
    g(x.x2)
end

I guess it could be done in Julia. Anyone could help with the macro? thanks.

oh? Union is much faster than abstract type like Real??

now, the problem comes to “how to create a Tuple dramatically”? I mean, using Vector{abstracttype} allows me to do like:

a = Vector{Real}(undef, 2)
for i in 1:2
    a[i] = somefun(i)    # gives anything <: Real
end

but now, a Vector{abstracttype} is worse than a Tuple… again, how to create the Tuple???
(the creation of the container is ok to be slow)

yes it is. The function f() needs to be called many many times after the object is created.

Yes. Small unions (2-3 or maybe more) are now very fast in Julia, due to compiler improvements. You should definitely use that over abstract containers, if you can.

Putting your stuff into a struct vs having it in a tuple/namedtuple is not going to make matters better. Because if you just iterate over the fields of a type the loop will still be over heterogenous types.

If small Unions do not work, you have to resort to lisp-y recursion on tuples. Note however, that that will typically be costly in compile time. How long are your tuples?

1 Like

the tuple can be very long (variable length to be decided in run time)

compilation time is not a problem.

what’s “lisp y recursion”?

looping over heterogeneous container not work, that why I’m thinking of using macro to generate type stable struct and function.

There is a package named StructArrays, that sounds a little bit like what you want, maybe you could check their source code

And do you process each one (of the same type) only once or many times?

This could possibly be a good application for https://github.com/tkoolen/TypeSortedCollections.jl.

using TypeSortedCollections

a = TypeSortedCollection{Tuple{Vector{Int}, Vector{Float64}}}()
push!(a, 1)
push!(a, 2.0)

function g(y::Int64)
    println(y + 1)
end

function g(y::Float64)
    println(y + 2.0)
end

foreach(a) do xi
    g(xi)
end
4 Likes

many many times.

a silly question… installing TypeSortedCollections requires DebuggerFramework, which I could not install:

(v1.1) pkg> add DebuggerFramework
  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
 Resolving package versions...
ERROR: Unsatisfiable requirements detected for package DebuggerFramework [67417a49]:
 DebuggerFramework [67417a49] log:
 ├─possible versions are: 0.1.0-0.1.2 or uninstalled
 ├─restricted to versions 0.1.2 by an explicit requirement, leaving only versions 0.1.2
 └─restricted by julia compatibility requirements to versions: uninstalled — no versions left

any help? thanks.

do a ] up first?

1 Like

TypeSortedCollections has no dependencies other than Julia 1. Maybe you added a different package that requires DebuggerFramework recently?

without luck trying TypeSortedCollections (until now), I think of a recursive solution to due with heterogeneous container.

Let’s phase the problem first: there’s a parametric type MyParamType, and a Vector{MyParamType}. Noted that this vector is heterogeneous.

struct MyParamType{T<:Real}
    x::T
end

heter = [MyParamType(1), MyParamType(2.2), MyParamType(3), MyParamType(4.4) ]

julia> heter = [MyParamType(1), MyParamType(2.2), MyParamType(3), MyParamType(4.4) ]
4-element Array{MyParamType,1}:
 MyParamType{Int64}(1)
 MyParamType{Float64}(2.2)
 MyParamType{Int64}(3)
 MyParamType{Float64}(4.4)

now, a function that directly accesses the element of this vector would cause warntype:

function f(a::Vector{MyParamType} )
    for i in 1:length(a)
        println(a[i].x)
    end
end

julia> f(heter)
1
2.2
3
4.4

julia> @code_warntype f(heter)
│    %15 = (Base.arrayref)(true, a, %13)::MyParamType
│    %16 = (Base.getfield)(%15, :x)::Real

now, define a new parametric type for recursion:

struct Recursive{S, T}
    part1::S
    part2::T
end

function Recursive(v::Vector{MyParamType})
    if length(v) == 2
        return Recursive(v[1], v[2])
    else
        return Recursive(v[1], Recursive(v[2:end]) )
    end
end

then pass the heterogeneous vector into the outer constructor to create the recursion object:

recur = Recursive(heter)

julia> recur.part1
MyParamType{Int64}(1)
julia> recur.part2
Recursive{MyParamType{Float64},Recursive{MyParamType{Int64},MyParamType{Float64}}}(MyParamType{Float64}(2.2), Recursive{MyParamType{Int64},MyParamType{Float64}}(MyParamType{Int64}(3), MyParamType{Float64}(4.4)))
julia> recur.part2.part1
MyParamType{Float64}(2.2)
julia> recur.part2.part2
Recursive{MyParamType{Int64},MyParamType{Float64}}(MyParamType{Int64}(3), MyParamType{Float64}(4.4))
julia> recur.part2.part2.part1
MyParamType{Int64}(3)
julia> recur.part2.part2.part2
MyParamType{Float64}(4.4)

(note that the call of outer constructor is itself warntype. But it’s fine.)

now, the (time-sensitive) function f() is re-written as g() in a recursive manner:

function g(b::Recursive)
    println(b.part1.x)

    if b.part2 isa MyParamType
        println(b.part2.x)
    else
        g(b.part2)
    end
end

julia> g(recur)
1
2.2
3
4.4

@code_warntype g(recur)         # fine :D

finally! there’s no warntype!!! :grinning:

any comments?

Considering lisp-y tuples, see here JuliaCon 2018 | A practical introduction to metaprogramming in Julia | Andy Ferris - YouTube, including the downsides (which are too big for your case).

following my recursive example, now include the test of TypeSortedCollections:

import TypeSortedCollections
const TSC = TypeSortedCollections.TypeSortedCollection

tsc = TSC(heter, true)

function h(v::TSC)
    total = 0.0
    let total = total
    foreach(function fun(a) total += a.x end, v)
    # println(total)
    end
end

then, f() (that directly accesses heterogeneous elements) and g() (that uses recursion) are slightly modified as follows:

function f(a::Vector{MyParamType} )
    total = 0.0
    for i in 1:length(a)
        total += a[i].x
    end
    # println(total)
end

function g(b::Recursive)
    total = Float64(b.part1.x)
    if b.part2 isa MyParamType
        return total + b.part2.x
    else
        return total + g(b.part2)
    end
    # println(total)
end

finally, their run time:

julia> @btime f(heter)
  159.109 ns (6 allocations: 96 bytes)
julia> @btime g(recur)
  19.245 ns (1 allocation: 16 bytes)
julia> @btime h(tsc)
  131.617 ns (8 allocations: 128 bytes)

note that only f(heter) causes warntype.

now, in this example, recursion is much faster. While TypeSortedCollections helps to avoid warntype, seems that it has a lot of memory allocation overhead.

Despite your effort to work around the closure performance issue using a let block, your h function still has a Core.Box in it. Try with

function h(v::TSC)
    total = Ref(0.0)
    foreach(function fun(a) total[] += a.x end, v)
    # println(total)
    return total[]
end

Results on my machine with this definition:

julia> @btime h(tsc)
  20.223 ns (1 allocation: 16 bytes)
10.600000000000001

julia> @btime h($tsc)
  5.579 ns (0 allocations: 0 bytes)
10.600000000000001

julia> @btime g(recur)
  17.234 ns (1 allocation: 16 bytes)
10.600000000000001

julia> @btime g($recur)
  3.227 ns (0 allocations: 0 bytes)
10.600000000000001

By the way, you could also use mapreduce for this (surprisingly slightly slower):

@btime mapreduce(a -> a.x, +, $tsc; init=0.0)
  8.369 ns (0 allocations: 0 bytes)
10.600000000000001

And note, as @mauro3 has been hinting at, that you’ll run into issues with long tuples, which are avoided by using a tuple of Vectors as TypeSortedCollections does. For example, try the following in a separate Julia session to avoid losing your work:

heter = [rand(Bool) ? MyParamType(rand()) : MyParamType(rand(Int)) for i = 1 : 50]
Recursive(heter)

Just creating the Recursive brings the compiler to its knees with its incredibly complicated type. You can do a lot better by just using Tuples and lispy tuple recursion, for which the Julia compiler has special optimizations and appropriate limits in place, but again the compilation time will scale with the number of elements, whereas with the approach used by TypeSortedCollections it’d scale with the number of different types.

3 Likes