I am developing a code aiming at a good balance between speed, resources and generalizability.
In my original code that was using only concrete assets, mainly, the code was relatively fast, about 68ms. Aiming to generalize it, I introduced more abstract typing and, however, the computational time increased to about 98ms; with increased allocation of resources (41.6MiB with respect to the previous 38.5).
In the original code I had 5 different types of enums, which in the new formulation have been replaced by 5 abstract types with subtypes.
In the functions, I have to instantiate several vectors of the abstract types (which were enums previously) and I believe that the increased computational requirements are related to overhead in the initialization of such arrays.
For example, I developed the following test:
@enum xxx x = 1 y = 2
function prova2()
a = Vector{xxx}(undef, 10000000)
for i = 1:10000000
a[i] = x
end
end
abstract type aaa end
struct bbb <: aaa end
struct ccc <: aaa end
function prova3()
a = Vector{aaa}(undef, 10000000)
c = bbb()
for i = 1:10000000
a[i] = c
end
end
@btime prova2()
@btime prova2()
@btime prova3()
@btime prova3()
The output on my computer is the following, which suggests that the generalizable implementation is about 50% more time consuming than the enum-version, most likely because enum are stored as Int32 while the struct version as Int64 (my machine is 64 bit)
9.229 ms (2 allocations: 38.15 MiB)
9.117 ms (2 allocations: 38.15 MiB)
19.251 ms (2 allocations: 76.29 MiB)
18.841 ms (2 allocations: 76.29 MiB)
Can you give me some suggestions on how to proceed to keep an efficient formulation while keeping a generalizable structure if possible?
Should I use enums or an abstract type approach?
Thank you very much for your reply, but unlikely that does not solve my problem.
In my enum implementation, the vector a would assume only a number of possible values (usually less than 3-4).
I propose another example in which I initialize the vector a to bbb() or ccc() whether the index i is odd (as an example).
abstract type aaa end
struct bbb <: aaa end
struct ccc <: aaa end
function prova6()
a = Vector{aaa}(undef, 10000000)
for i = 1:10000000
a[i] = (mod(i, 2) == 1) ? bbb() : ccc()
end
end
I am not familiar with @enum (I find it quite strange, to be truth). But do you want the positions of the a vector to contain types or something more specific, as numbers? I mean, the enum example and the other examples do not result in the same thing filling the vector a.
every element of a should represent a condition of the system, which can be represented either as a number, as an enum or as a type. At the bottom, there is the updated example using enum.
The use of enum seems quite similar to the C-equivalent version. I assume that behind enum there is a Int type or something like that.
With julia, I would like to keep the speed of that implementation with eventually a better generalization capability as I have seen done by using abstraction. However, I see that performances drop.
in my formulation 30 ms are little, but I plan to significantly scale in dimensions and doubling the computational requirements could mean delay a lot.
@enum xxx x y
function prova7()
a = Vector{xxx}(undef, 10000000)
for i = 1:10000000
a[i] = (mod(i, 2) == 1) ? x : y
end
end
The right choice will depend on what exactly you want to do with your vector, but an enum seems like a good choice here given the information provided so far. In particular, you might find Performance Tips · The Julia Language useful for some discussion about the trade-offs between using and abusing the type system.
julia> function prova9()
a = Vector{Int64}(undef, 10000000)
for i = 1:10000000
a[i] = (mod(i, 2) == 1) ? 1 : 2
end
end
prova9 (generic function with 1 method)
julia> @btime prova9()
33.540 ms (2 allocations: 76.29 MiB)
Using your @enum:
julia> @enum xxx x = 1 y = 2
julia> function prova8()
a = Vector{xxx}(undef, 10000000)
for i = 1:10000000
a[i] = (mod(i, 2) == 1) ? x : y
end
end
prova8 (generic function with 1 method)
julia> @btime prova8()
18.336 ms (2 allocations: 38.15 MiB)
Enum is faster than a simple 64bit Integer vector for that. Actually it is equivalent to the function if it creates a 32bit integer array. The differences are therefore, not in the fact that the types are abstract, or in type instabilities (thus, what I said before was wrong).
Edit: Then, with Int8 is even faster:
julia> function prova11()
a = Vector{Int8}(undef, 10000000)
for i = 1:10000000
a[i] = (mod(i, 2) == 1) ? 1 : 2
end
end
prova11 (generic function with 1 method)
julia> @btime prova11()
12.542 ms (2 allocations: 9.54 MiB)
which makes me wander if a custom implementation of a sort-of enum macro supported on a smaller integer would be the fastest option.
@enum already supports this. You can add a ::Int8 to the @enum declaration to choose the underlying integer type:
julia> @enum xxx::Int8 x = 1 y = 2
julia> function prova8()
a = Vector{xxx}(undef, 10000000)
for i = 1:10000000
a[i] = (mod(i, 2) == 1) ? x : y
end
end
prova8 (generic function with 1 method)
julia> @btime prova8()
11.949 ms (2 allocations: 9.54 MiB)