Looping over different types with common behavior


#1

I consider two methods presented in the example below. As expected, the second one is better since it uses only concrete types.

ABSTRACT COLLECTION
  34.190 ms (999490 allocations: 15.25 MiB)
CONCRETE COLLECTION
  1.307 ms (0 allocations: 0 bytes)

Is there any way to have a good performance with a more elegant code?

using BenchmarkTools

abstract type HasInt end
struct A <: HasInt
    a::Int
    b::Float64
end
struct B <: HasInt
    a::Int
    b::Float64
    c::Float64
end
struct C <: HasInt
    a::Int
    b::Float64
    c::Float64
    d::Float64
end

value(obj::A) = obj.a
value(obj::B) = obj.a + 1
value(obj::C) = obj.a + 2

function createabstractcollection()
    srand(77777);
    arr = Vector{HasInt}()
    for i in 1:1000000
        t = rand(1:3)
        if t == 1
            push!(arr, A(i, 7.0))
        elseif t == 2
            push!(arr, B(i, 7.0, 7.0))
        else
            push!(arr, C(i, 7.0, 7.0, 7.0))
        end
    end
    return arr    
end

function sumabstractcollection(arr)    
    s = 0
    for obj in arr
        s += value(obj)
    end
    return s    
end

struct concretecollection
    A_vector::Vector{A}
    B_vector::Vector{B}
    C_vector::Vector{C}
end
concretecollection() = concretecollection(Vector{A}(), Vector{B}(), Vector{C}())

function createconcretecollection()
    srand(77777);
    arr = concretecollection()
    for i in 1:1000000
        t = rand(1:3)
        if t == 1
            push!(arr.A_vector, A(i, 7.0))
        elseif t == 2
            push!(arr.B_vector, B(i, 7.0, 7.0))
        else
            push!(arr.C_vector, C(i, 7.0, 7.0, 7.0))
        end
    end
    return arr    
end

function sumconcretecollection(arr)    
    s = 0
    for obj in arr.A_vector
        s += value(obj)
    end
    for obj in arr.B_vector
        s += value(obj)
    end
    for obj in arr.C_vector
        s += value(obj)
    end
    return s
end

function test()
    println("ABSTRACT COLLECTION")
    absarr = createabstractcollection()   
    @btime sumabstractcollection($absarr)
    
    println("CONCRETE COLLECTION")
    concrarr = createconcretecollection()
    @btime sumconcretecollection($concrarr)
end

test()

#2

Maybe using a tuple of concrete vectors instead of concretecollection might make it more elegant. For example, Your sumconcretecollection could be rewritten as:

function createconcretecollection()
    srand(77777);
    arr = (Vector{A}(), Vector{B}(), Vector{C}())
    for i in 1:1000000
        t = rand(1:3)
        if t == 1
            push!(arr[1], A(i, 7.0))
        elseif t == 2
            push!(arr[2], B(i, 7.0, 7.0))
        else
            push!(arr[3], C(i, 7.0, 7.0, 7.0))
        end
    end
    return arr    
end

function sumconcretecollection(arr::Tuple)
    s = 0
    for el in arr, obj in el
        s += value(obj)
    end
    return s
end

Edit: Well, my code actually gives the worst speed :sweat_smile:


#3

Another option is to manually dispatch for a few types in the abstract case.

function sumabstractcollectiondispatch(arr)
    s = 0
    for obj in arr
        if isa(obj, A)
            s += value(obj::A)
        elseif isa(obj, B)
            s += value(obj::B)
        elseif isa(obj, C)
            s += value(obj::C)
        else
            s += value(obj)
        end
    end
    return s
end

#4

You can store a concretely-typed vector of FunctionWrappers instead (essentially the same as storing a function pointer in C). Or if you want a slightly higher-level abstraction, you can try out my (unregistered) Interfaces package (which just stores a struct containing function pointers for your methods of interest, similar to the way virtual methods in C++ would work).


#5

You could also use TypeSortedCollections.jl for this (I’m the author). It basically automates your createconcretecollection/sumconcretecollection approach:

julia> using TypeSortedCollections

julia> tsc = TypeSortedCollection(createabstractcollection());

julia> @btime mapreduce(value, +, 0, $tsc)
  1.204 ms (0 allocations: 0 bytes)
500001499788

#6

I really wish there was something like Rust Enums with fields in Julia https://doc.rust-lang.org/beta/rust-by-example/custom_types/enum.html together with some nice pattern-matching syntax to do handle these cases efficiently.

It is so far the single language feature that I found in other languages which I am missing now and then when using Julia.


#7

That does seem nice.

Do note that performance of the naive version is much better with Julia 0.7 because of optimizations for small unions of bitstypes:

ABSTRACT COLLECTION
  8.286 ms (0 allocations: 0 bytes)
CONCRETE COLLECTION
  1.038 ms (0 allocations: 0 bytes)

but you’re still giving up quite a bit in this case, probably due to simd.

Edit: TypeSortedCollections on Julia 9965e7cab1b599da00f493fc0481ef7eadd29a52 for good measure (currently broken on latest master due to https://github.com/JuliaLang/julia/issues/27871):

julia> tsc = TypeSortedCollection(createabstractcollection());

julia> @btime mapreduce(value, +, 0, $tsc)
  1.047 ms (0 allocations: 0 bytes)

#8

Thank you all for your suggestions. @tkoolen, Indeed TypeSortedCollections.jl seems to be a very good solution if the array isn’t updated often. If I understood correctly, I need to build a new TypeSortedCollection each time it gets updated before I loop on it, which is time consuming.


#9

Well, there are push! and append! methods to modify an existing TypeSortedCollection, but the concrete types that are accepted must be specified by the user or automatically determined from an input collection when the TypeSortedCollection is constructed. I.e., you could also do:

using TypeSortedCollections
tsc = TypeSortedCollection{Tuple{Vector{A}, Vector{B}, Vector{C}}}()
append!(tsc, createabstractcollection())
push!(tsc, A(14, 8.0))

If you want to add a value that has a new type, then you do need to construct a new TypeSortedCollection that also supports that type.


#10

Thank you very much. I will now go and read more the details of the package to use it properly.