I have a discrete event simulation model in which all outcome values are integers. Currently, it defaults to Int64. Let’s say I want that to be Int32 (or even Int16), as outcome values must always be well under 2e9. Int arrays track the state outcomes. There are two somewhat complex state transition processes (probably overly complex; I’ll work to simplify them) as well as some code that introduces special event scenarios.
Questions:
1. For Int matrices of 25 x 8 x 5 (1,000 elements), should I expect much of a performance improvement from Int32 on macOS with an Intel i7? The matrices aren’t large, but there are lots of update events during the course of a simulation run. A very trivial benchmark of addition shows that adding an Int32 constant to an Int32 matrix of 2.5e6 elements takes about 1/3 the time of the same operation in Int64, and memory allocations are 1/2. Even though the absolute times are small, there are a lot of operations, as these state updates are pretty much all that the simulation does. This suggests changing to Int32 could be worthwhile.
2. Are there any possible shortcuts for converting everything to use Int32, short of carefully looking at every assignment statement to see where integer constants or externally read data could “pollute” the state matrices and promote them to Int64?
3. Assuming there is a meaningful gain and I get work item 2 completed (even if I have to check over everything and scatter changes all over the place; this will be a good exercise for cleaning up the code), do you have any recommendations on how I could make it easy to change everything with an input parameter, so that I can switch from Int32 to Int64 and back again based on the needs of a particular case? My thought would be to have a function variable such as “dotype” that is a constructor for either Int32 or Int64 and use it wherever there are integer literals. It can be set once as dotype = Int32 or dotype = Int64 as part of the setup code. In quite a few cases I have named constants rather than integer literals, and these can easily be set to the needed type in the setup code. Likewise I can have “myzeros”, which can be set equal to the function for zeros(Int32, args) or zeros(Int64, args).
Answering (2)–(3): I am not sure there is a way that avoids looking at the code, unless you just replace Int and Int64 with Int32 and wait for InexactErrors to pop up (when the code tries to convert something that does not fit). It seems to me that the easy switch is this: every struct you use should have a type parameter for the integer type used inside it; methods should not enforce a specific integer type but instead use the integer type of the parameters passed to them (and if a method takes no parameter of the correct integer type, but needs to return some struct with elements of the correct integer type, then pass the integer type itself as a parameter to the method).
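A minimal sketch of that idea, assuming a hypothetical struct `SimState` and a hypothetical update function `record!` (neither is from the original model): the integer type is a parameter of the struct, the constructor receives the type as an argument, and the method picks the type up from the struct it is given.

```julia
# Hypothetical container for the state counts; T is the integer type in use.
struct SimState{T<:Integer}
    counts::Array{T,3}
end

# The integer type is passed explicitly because no other argument carries it.
SimState(::Type{T}, dims...) where {T<:Integer} = SimState{T}(zeros(T, dims...))

# The method never names Int32 or Int64; it reuses T from its argument.
function record!(s::SimState{T}, n, i, j, k) where {T}
    s.counts[i, j, k] += T(n)   # T(n) throws InexactError if n does not fit in T
    return s
end

s32 = SimState(Int32, 25, 8, 5)
s64 = SimState(Int64, 25, 8, 5)
record!(s32, 3, 1, 1, 1)
record!(s64, 3, 1, 1, 1)
```

Switching the whole run between Int32 and Int64 then comes down to the one place where the state is constructed.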
Yup. Good advice. None of my functions specify Intxx for any argument. Either Int or nothing at all (nearly all of the inbound parameters will be the needed type).
I am about to start on this and will report back. I don’t think there is anything hard, just tedious, with no particular shortcuts. It will be a good code review.
Scalar operations should be equally fast with Int32 and Int64.
If your CPU has the AVX2 instruction set (any Mac from the last few years should have it), then it can do SIMD integer operations. This means a single instruction can work on multiple integers at a time – as many as fit in a CPU register. Twice as many Int32s fit as Int64s: eight versus four in a 256-bit AVX2 register.
Things like adding an integer to an integer matrix should use these SIMD instructions automatically if possible.
CPUs with AVX2 can convert between Int32 and Float64 with SIMD instructions. If they have AVX512, they can also convert between Int64 and Float64, but not without it.
Multiplication is much faster with Int32.
Many operations are limited more by memory bandwidth than the CPU’s computing ability. This is another place where matrices of Int32 can help by consuming half the memory.
Note that Float64 can exactly represent integers up to +/- 2^53, and Float32 can exactly represent integers up to +/- 2^24. Many SIMD operations on these floats are faster (especially multiplication), and they don’t require AVX2. It’s worth considering using floats, depending on what you need.
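The exact-integer limits of the float types can be checked directly. This small sketch uses only Base functions; the variable names are just for illustration.

```julia
# Largest magnitude up to which every integer is exactly representable.
limit64 = maxintfloat(Float64)   # 2^53
limit32 = maxintfloat(Float32)   # 2^24

# Just below the limit, integer arithmetic in floats is still exact.
exact_below = Float64(2^53 - 1) == 2^53 - 1

# At the limit, adding 1 is lost to rounding.
lost64 = Float64(2^53) + 1 == Float64(2^53)
lost32 = Float32(2^24) + 1 == Float32(2^24)
```

So Float32 is only safe for counts below about 16.7 million, while Float64 comfortably covers anything that fits in an Int32.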
Multiplication is
Although bit operations, or operations that can be done with bit twiddling (like multiplication by known powers of 2 → bitshifts).
If typical state updates in your code are similar to adding a single number to an array and allocating a new array for the results, I would first try to consolidate your updates to work more in-place and do multiple operations in each pass over the data.
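As a rough sketch of what consolidating could look like, assuming hypothetical arrays `arrivals` and `departures` (the real model's updates are not shown in this thread): the first function allocates a new array per statement, while the second does both operations in one fused pass and writes into the existing array.

```julia
# Allocating style: each statement builds a brand-new array.
function step_alloc(state, arrivals, departures)
    state = state .+ arrivals
    state = state .- departures
    return state
end

# In-place style: a single fused loop, result stored back into `state`.
function step!(state, arrivals, departures)
    state .= state .+ arrivals .- departures
    return state
end

state      = zeros(Int32, 25, 8, 5)
arrivals   = fill(Int32(3), 25, 8, 5)
departures = fill(Int32(1), 25, 8, 5)

step!(state, arrivals, departures)   # no temporary arrays are created
```

Comparing the two with `@allocated` or `@time` on the real update code should show whether allocation is a significant part of the run time.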
In general, a good approach is to write generic code that looks at the type of the container and uses it to construct other types as needed. For example:
function myupdate!(a::AbstractVector{T}) where {T}
    a .+= T(2) .* a .+ T(1)
end
will use the type T of the elements of a for the types of the literal constants. This is much more flexible than hard-coding a type or using a global setting.
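To illustrate, here is the same function called on vectors of both integer types, followed by a case where a plain literal in a non-in-place broadcast widens the result. The variable names are only for this example.

```julia
function myupdate!(a::AbstractVector{T}) where {T}
    a .+= T(2) .* a .+ T(1)   # a becomes 3a + 1, computed entirely in type T
    return a
end

a32 = Int32[1, 2, 3]
a64 = Int64[1, 2, 3]
myupdate!(a32)   # element type stays Int32
myupdate!(a64)   # element type stays Int64

# A bare literal is an Int, so a non-in-place broadcast produces a wider array.
widened = Int32[1, 2, 3] .+ 1
```

This is the kind of "pollution" asked about in question 2: it only changes the array type when the result is bound to a new array; in-place assignment with `.=` or `.+=` converts back to the destination's element type.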
I take an input of the desired type, either Int32 or Int64. I then create the arrays to be of the indicated type and use a ref variable to the type as T_int[] where I need to be explicit about a type (defining a struct that contains all of the pre-allocated arrays, for instance) and when things are created as zeros(T_int[], n), etc.
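A sketch of how that setup might look, with the names `T_int`, `make_state` and `bump!` assumed for illustration. One thing to be aware of: the compiler cannot know what type a `Ref{DataType}` holds, so code that reads `T_int[]` is not type-stable at that point. Passing the resulting array into a function that is parametric on the element type (a "function barrier") lets the inner loop be compiled for the concrete type.

```julia
# Hypothetical run-time setting for the integer type, as described above.
const T_int = Ref{DataType}(Int64)

# The element type of the result is only known at run time here.
make_state(n) = zeros(T_int[], n)

# Function barrier: specialized on the concrete element type T of `a`.
function bump!(a::AbstractArray{T}) where {T<:Integer}
    for i in eachindex(a)
        a[i] += one(T)
    end
    return a
end

T_int[] = Int32        # choose the type once during setup
a = make_state(4)
bump!(a)
```

If `T_int[]` is read inside hot loops rather than only during setup, that alone could account for some extra allocations, independent of Int32 versus Int64.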
The problem is that on a Mac, there is not that much performance to be gained by using Int32, as Int64 matches the machine's native word size. What happens is that Int32 is around 5% slower and has 5% more allocations than Int64. It’s not much of a difference, but it means it’s not worth the work to do it.
The culprit that causes the extra allocations and implicit conversions is that summing an array of Int32 to a scalar results in a scalar that is Int64: the type is ALWAYS promoted. This forces the Int32 code to do some conversions. I put in explicit conversions of sum results to the preferred type. This gets the time and allocations of the Int32 version to match Int64, but at the cost of the conversions. Better than implicit conversion, but still not worth the effort compared to just using Int64.
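The promotion is easy to reproduce. The sketch below shows the widened result of `sum`, an explicit conversion back, and `reduce(+, x)`, which as far as I know keeps the element type but wraps around silently on overflow instead of widening. The variable names are for illustration only.

```julia
x = Int32[10, 20, 30]

s_wide   = sum(x)                       # widened to Int (Int64 on a 64-bit machine)
s_conv   = convert(eltype(x), sum(x))   # back to Int32; InexactError if it does not fit
s_narrow = reduce(+, x)                 # stays Int32, but overflow wraps silently
```

Which one is appropriate depends on whether the sums are guaranteed to fit in the narrow type; `convert` at least fails loudly when they do not.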
@stevengj’s suggestion is excellent. I had already pre-allocated all of the larger arrays and create/allocate them as the desired type. But, this doesn’t solve the automatic type promotion that sum does for scalar results. Still, his is the right way to do this with minimal effort and less impact on the code.
So, this turned out to be one of those examples Knuth described as unproductive optimization.
I apologise if this is an insulting thing to say. You can change between types in Julia rather easily, should you need to experiment, without having to do global search and replace:
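One common way to do this, sketched here as an assumption rather than as the exact code that was meant, is a single type alias (the name `SimInt` is hypothetical) that every allocation and literal refers to:

```julia
# Change this one line to switch the whole model between Int32 and Int64.
const SimInt = Int32

state = zeros(SimInt, 25, 8, 5)
state[1, 1, 1] += SimInt(5)

threshold = SimInt(100)   # named constants take the alias type as well
```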
Trying your approach but always get that T is undefined:
function plus!(val, condition, agegrp, lag, locale; dat=openmx::Dict{Int64,Array{T,N} where T,N}) where {T}
    @assert (length(locale) == 1 || typeof(locale) <: NamedTuple) "locale must be a single Int or NamedTuple"
    dat[locale][lag, condition, agegrp] += T(val)
end
The potential performance benefits of using narrower integers (when applicable) stem from (1) better cache utilization (for memory-bound problems processing large arrays) and (2) wider SIMD operations (for problems that can benefit from SIMD).
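Both effects can be put in numbers for the 25 x 8 x 5 arrays discussed in this thread. The helper `lanes` is hypothetical and assumes 256-bit (AVX2) registers.

```julia
# Number of elements of type T that fit in one 256-bit register (assumed width).
lanes(::Type{T}) where {T} = 256 ÷ (8 * sizeof(T))

for T in (Int16, Int32, Int64)
    a = zeros(T, 25, 8, 5)
    println(T, ": ", sizeof(a), " bytes, ", lanes(T), " elements per register")
end
```

Even as Int64, one such array is only 8,000 bytes, which typically fits in a level-1 data cache, so the cache argument matters much less here than it did in the 2.5e6-element benchmark.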
You are right, you can actually do something like this:
julia> f(; v :: Vector{T} = Int[]) where T = T
f (generic function with 1 method)
julia> f()
Int64
julia> f(; v = Float64[])
Float64
Not entirely sure why it is not working in your case, maybe you should use the T and N only in the where after the header and not inside the Dict definition? The syntax with the where inside the Dict does not feel right.
You can also leave the type of dat undeclared and, inside the body, use:
T = eltype(valtype(dat)) # Element type of the arrays stored as the Dict's values (eltype of a Dict is a Pair type, so valtype is needed first).
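Putting that together, a possible working version of `plus!` is sketched below. It makes `dat` a required keyword argument instead of defaulting to a global, and it drops the assertion on `locale` for brevity; both are simplifications made for this example. `valtype` is used because the element type of a Dict is a Pair of key and value types, so the array type has to be taken from the value side first.

```julia
function plus!(val, condition, agegrp, lag, locale; dat)
    T = eltype(valtype(dat))   # e.g. Int32 for a Dict{Int64,Array{Int32,3}}
    dat[locale][lag, condition, agegrp] += T(val)
    return dat
end

# Hypothetical state: one 25 x 8 x 5 array per locale id.
openmx = Dict{Int64,Array{Int32,3}}(1 => zeros(Int32, 25, 8, 5))

plus!(7, 2, 3, 4, 1; dat = openmx)   # adds 7 at [lag=4, condition=2, agegrp=3]
```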