Enum versus abstract typing

Hi,

I am developing a code aiming at a good balance between speed, resources and generalizability.
In my original code that was using only concrete assets, mainly, the code was relatively fast, about 68ms. Aiming to generalize it, I introduced more abstract typing and, however, the computational time increased to about 98ms; with increased allocation of resources (41.6MiB with respect to the previous 38.5).

In the original code I had 5 different types of enums, which in the new formulation have been replaced by 5 abstract types with subtypes.
In the functions, I have to instantiate several vectors of the abstract types (which were enums previously) and I believe that the increased computational requirements are related to overhead in the initialization of such arrays.

For example, I developed the following test:

@enum xxx x = 1 y = 2

function prova2()

    a = Vector{xxx}(undef, 10000000)

    for i = 1:10000000
        a[i] = x
    end
end

abstract type aaa end
struct bbb <: aaa end
struct ccc <: aaa end

function prova3()

    a = Vector{aaa}(undef, 10000000)

    c = bbb()

    for i = 1:10000000
        a[i] = c
    end
end

@btime prova2()
@btime prova2()

@btime prova3()
@btime prova3()

The output on my computer is the following, which suggests that the generalizable implementation is about 50% more time consuming than the enum-version, most likely because enum are stored as Int32 while the struct version as Int64 (my machine is 64 bit)
9.229 ms (2 allocations: 38.15 MiB)
9.117 ms (2 allocations: 38.15 MiB)
19.251 ms (2 allocations: 76.29 MiB)
18.841 ms (2 allocations: 76.29 MiB)

Can you give me some suggestions on how to proceed to keep an efficient formulation while keeping a generalizable structure if possible?
Should I use enums or an abstract type approach?

Thank you,
Davide

If you can pass the type of variable you are willing to create, it is much faster:

function prova5(T)
    a = Vector{T}(undef, 10000000)
    c = T()
    for i = 1:10000000
        a[i] = c
    end
end

(this is 3 times faster and allocates much less than the enum option).

By the way, I think that the increased computational time is associated to type instabilities, in both cases. (I was wrong, as shown below)

Thank you very much for your reply, but unlikely that does not solve my problem.
In my enum implementation, the vector a would assume only a number of possible values (usually less than 3-4).

I propose another example in which I initialize the vector a to bbb() or ccc() whether the index i is odd (as an example).

abstract type aaa end
struct bbb <: aaa end
struct ccc <: aaa end

function prova6()
    a = Vector{aaa}(undef, 10000000)
    for i = 1:10000000
        a[i] = (mod(i, 2) == 1) ? bbb() : ccc()
    end
end

I am not familiar with @enum (I find it quite strange, to be truth). But do you want the positions of the a vector to contain types or something more specific, as numbers? I mean, the enum example and the other examples do not result in the same thing filling the vector a.

every element of a should represent a condition of the system, which can be represented either as a number, as an enum or as a type. At the bottom, there is the updated example using enum.

The use of enum seems quite similar to the C-equivalent version. I assume that behind enum there is a Int type or something like that.
With julia, I would like to keep the speed of that implementation with eventually a better generalization capability as I have seen done by using abstraction. However, I see that performances drop.

in my formulation 30 ms are little, but I plan to significantly scale in dimensions and doubling the computational requirements could mean delay a lot.

@enum xxx x y

function prova7()
    a = Vector{xxx}(undef, 10000000)
    for i = 1:10000000
        a[i] = (mod(i, 2) == 1) ? x : y
    end
end

The right choice will depend on what exactly you want to do with your vector, but an enum seems like a good choice here given the information provided so far. In particular, you might find Performance Tips · The Julia Language useful for some discussion about the trade-offs between using and abusing the type system.

Take a look at these examples:

Using simply a Int64 vector:

julia> function prova9()
           a = Vector{Int64}(undef, 10000000)
           for i = 1:10000000
               a[i] = (mod(i, 2) == 1) ? 1 : 2
           end
       end
prova9 (generic function with 1 method)

julia> @btime prova9()
  33.540 ms (2 allocations: 76.29 MiB)

Using your @enum:

julia> @enum xxx x = 1 y = 2

julia> function prova8()
           a = Vector{xxx}(undef, 10000000)
           for i = 1:10000000
               a[i] = (mod(i, 2) == 1) ? x : y
           end
       end
prova8 (generic function with 1 method)

julia> @btime prova8()
  18.336 ms (2 allocations: 38.15 MiB)

Enum is faster than a simple 64bit Integer vector for that. Actually it is equivalent to the function if it creates a 32bit integer array. The differences are therefore, not in the fact that the types are abstract, or in type instabilities (thus, what I said before was wrong).

Edit: Then, with Int8 is even faster:

julia> function prova11()
           a = Vector{Int8}(undef, 10000000)
           for i = 1:10000000
               a[i] = (mod(i, 2) == 1) ? 1 : 2
           end
       end
prova11 (generic function with 1 method)

julia> @btime prova11()
  12.542 ms (2 allocations: 9.54 MiB)

which makes me wander if a custom implementation of a sort-of enum macro supported on a smaller integer would be the fastest option.

@enum already supports this. You can add a ::Int8 to the @enum declaration to choose the underlying integer type:

julia> @enum xxx::Int8 x = 1 y = 2

julia> function prova8()
           a = Vector{xxx}(undef, 10000000)
           for i = 1:10000000
               a[i] = (mod(i, 2) == 1) ? x : y
           end
       end
prova8 (generic function with 1 method)

julia> @btime prova8()
  11.949 ms (2 allocations: 9.54 MiB)