Enum versus abstract typing

Hi,

I am developing a code aiming at a good balance between speed, resources and generalizability.
In my original code that was using only concrete assets, mainly, the code was relatively fast, about 68ms. Aiming to generalize it, I introduced more abstract typing and, however, the computational time increased to about 98ms; with increased allocation of resources (41.6MiB with respect to the previous 38.5).

In the original code I had 5 different types of enums, which in the new formulation have been replaced by 5 abstract types with subtypes.
In the functions, I have to instantiate several vectors of the abstract types (which were enums previously) and I believe that the increased computational requirements are related to overhead in the initialization of such arrays.

For example, I developed the following test:

@enum xxx x = 1 y = 2

function prova2()

    a = Vector{xxx}(undef, 10000000)

    for i = 1:10000000
        a[i] = x
    end
end

abstract type aaa end
struct bbb <: aaa end
struct ccc <: aaa end

function prova3()

    a = Vector{aaa}(undef, 10000000)

    c = bbb()

    for i = 1:10000000
        a[i] = c
    end
end

@btime prova2()
@btime prova2()

@btime prova3()
@btime prova3()

The output on my computer is the following, which suggests that the generalizable implementation is about 50% more time consuming than the enum-version, most likely because enum are stored as Int32 while the struct version as Int64 (my machine is 64 bit)
9.229 ms (2 allocations: 38.15 MiB)
9.117 ms (2 allocations: 38.15 MiB)
19.251 ms (2 allocations: 76.29 MiB)
18.841 ms (2 allocations: 76.29 MiB)

Can you give me some suggestions on how to proceed to keep an efficient formulation while keeping a generalizable structure if possible?
Should I use enums or an abstract type approach?

Thank you,
Davide

If you can pass the type of variable you are willing to create, it is much faster:

function prova5(T)
    a = Vector{T}(undef, 10000000)
    c = T()
    for i = 1:10000000
        a[i] = c
    end
end

(this is 3 times faster and allocates much less than the enum option).

1 Like

By the way, I think that the increased computational time is associated to type instabilities, in both cases. (I was wrong, as shown below)

2 Likes

Thank you very much for your reply, but unlikely that does not solve my problem.
In my enum implementation, the vector a would assume only a number of possible values (usually less than 3-4).

I propose another example in which I initialize the vector a to bbb() or ccc() whether the index i is odd (as an example).

abstract type aaa end
struct bbb <: aaa end
struct ccc <: aaa end

function prova6()
    a = Vector{aaa}(undef, 10000000)
    for i = 1:10000000
        a[i] = (mod(i, 2) == 1) ? bbb() : ccc()
    end
end

I am not familiar with @enum (I find it quite strange, to be truth). But do you want the positions of the a vector to contain types or something more specific, as numbers? I mean, the enum example and the other examples do not result in the same thing filling the vector a.

every element of a should represent a condition of the system, which can be represented either as a number, as an enum or as a type. At the bottom, there is the updated example using enum.

The use of enum seems quite similar to the C-equivalent version. I assume that behind enum there is a Int type or something like that.
With julia, I would like to keep the speed of that implementation with eventually a better generalization capability as I have seen done by using abstraction. However, I see that performances drop.

in my formulation 30 ms are little, but I plan to significantly scale in dimensions and doubling the computational requirements could mean delay a lot.

@enum xxx x y

function prova7()
    a = Vector{xxx}(undef, 10000000)
    for i = 1:10000000
        a[i] = (mod(i, 2) == 1) ? x : y
    end
end

The right choice will depend on what exactly you want to do with your vector, but an enum seems like a good choice here given the information provided so far. In particular, you might find Performance Tips · The Julia Language useful for some discussion about the trade-offs between using and abusing the type system.

3 Likes

Take a look at these examples:

Using simply a Int64 vector:

julia> function prova9()
           a = Vector{Int64}(undef, 10000000)
           for i = 1:10000000
               a[i] = (mod(i, 2) == 1) ? 1 : 2
           end
       end
prova9 (generic function with 1 method)

julia> @btime prova9()
  33.540 ms (2 allocations: 76.29 MiB)

Using your @enum:

julia> @enum xxx x = 1 y = 2

julia> function prova8()
           a = Vector{xxx}(undef, 10000000)
           for i = 1:10000000
               a[i] = (mod(i, 2) == 1) ? x : y
           end
       end
prova8 (generic function with 1 method)

julia> @btime prova8()
  18.336 ms (2 allocations: 38.15 MiB)

Enum is faster than a simple 64bit Integer vector for that. Actually it is equivalent to the function if it creates a 32bit integer array. The differences are therefore, not in the fact that the types are abstract, or in type instabilities (thus, what I said before was wrong).

Edit: Then, with Int8 is even faster:

julia> function prova11()
           a = Vector{Int8}(undef, 10000000)
           for i = 1:10000000
               a[i] = (mod(i, 2) == 1) ? 1 : 2
           end
       end
prova11 (generic function with 1 method)

julia> @btime prova11()
  12.542 ms (2 allocations: 9.54 MiB)

which makes me wander if a custom implementation of a sort-of enum macro supported on a smaller integer would be the fastest option.

1 Like

@enum already supports this. You can add a ::Int8 to the @enum declaration to choose the underlying integer type:

julia> @enum xxx::Int8 x = 1 y = 2

julia> function prova8()
           a = Vector{xxx}(undef, 10000000)
           for i = 1:10000000
               a[i] = (mod(i, 2) == 1) ? x : y
           end
       end
prova8 (generic function with 1 method)

julia> @btime prova8()
  11.949 ms (2 allocations: 9.54 MiB)
4 Likes