Why does converting floats to integers take so much time? When I run the loop below, it takes about 14ms. When I remove the Int(), it takes only 1.9ns, if I’m not mistaken. Any idea how to improve the speed?
@btime for x=1:1000000
Rule number 1 of benchmarking is to never do so in the global scope. Rule number 2 is to benchmark something that makes sure the compiler doesn’t cheat. My guess is that in the 1st case, the compiler is able to just perform
and note that integer magnitudes > maxintfloat(Float64) cannot be assured to convert correctly.
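To make the maxintfloat remark concrete, here is a small sketch (the variable names are mine, not from the thread, and it assumes a 64-bit Julia where Int is Int64). It shows where Float64 stops representing every integer exactly, and how Int() reacts to values it cannot convert.

```julia
# maxintfloat(Float64) is 2^53: up to this magnitude every integer
# is exactly representable as a Float64.
limit = maxintfloat(Float64)

exact = Int(limit) == 2^53                 # conversion is still exact at the limit
collapsed = Float64(2^53 + 1) == limit     # 2^53 + 1 has no Float64 of its own

# Int() throws instead of silently rounding or wrapping.
fractional_fails = try
    Int(2.5)
    false
catch err
    err isa InexactError
end

too_large_fails = try
    Int(1e19)                              # 1e19 exceeds typemax(Int64)
    false
catch err
    err isa InexactError
end

println((exact, collapsed, fractional_fails, too_large_fails))
```

Beyond the limit the conversion itself can still succeed, but the Float64 may already differ from the integer you had in mind.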
So you mean I should wrap it inside a function?
Julia requires use of functions for good performance. Also, be careful when benchmarking that you are timing the code you mean to. If Julia can see a way to ignore running your code and replace it with faster code that gives the same answer, the compiler is free to do so.
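As a rough sketch of what "wrap it in a function" could look like here: the helper name sum_floors is made up for illustration, and the loop body is taken from the expression quoted later in the thread. Accumulating the results and returning the total gives the compiler less room to skip iterations.

```julia
# Hypothetical helper: the loop lives inside a function, and every
# iteration contributes to the returned value.
function sum_floors(n)
    total = 0
    for x in 1:n
        total += floor(Int, (x - 1/x) / sqrt(3))
    end
    return total
end

sum_floors(10)                          # first call triggers compilation
t = @elapsed sum_floors(1_000_000)      # rough wall-clock time in seconds
println("1_000_000 iterations took about ", round(t * 1e3, digits = 3), " ms")

# With BenchmarkTools loaded, a more careful measurement would be:
#     using BenchmarkTools
#     n = 1_000_000
#     @btime sum_floors($n)
```

@elapsed is only a single, noisy measurement; it is used here so the snippet runs without extra packages.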
It’s also good to get used to some rough estimations when benchmarking. A single instruction, e.g. + on two primitive types like Float64, takes a few nanoseconds. If you have an array of 1_000_000 Float64s, it will take roughly 1_000_000 x “a few nanoseconds”, which is around a few milliseconds. You are doing even more in your example, so 14ms seems fine.
My main point is: if you’re doing some calculations on 1_000_000 numbers and measure a few nanoseconds, then either the compiler figured out a shortcut, or you are benchmarking something completely different. Current CPUs, limited by their clock speed of a few GHz, cannot do 1_000_000 instructions in ~2ns, no matter what kind of instructions these are.
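The back-of-the-envelope estimate can be written out as a few lines of arithmetic. The per-operation cost of 3 ns is an assumed, illustrative value for "a few nanoseconds", not a measurement.

```julia
n = 1_000_000        # number of elements, as in the thread
ns_per_op = 3        # assumed cost per operation in nanoseconds (illustrative)

total_ms = n * ns_per_op / 1e6      # nanoseconds -> milliseconds
required_rate = n / 2e-9            # operations per second needed to finish in 2 ns

println("estimated loop time: ", total_ms, " ms")
println("rate needed for a 2 ns total: ", required_rate, " operations per second")
```

The estimate lands in the millisecond range, while finishing in 2 ns would need on the order of 10^14 operations per second, roughly five orders of magnitude beyond a few GHz.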
p.s. a petahertz CPU could do that
Your timings seem pretty reasonable. 14ms is what you should expect for this loop. You can try a slight tweak that is more idiomatic and ever so slightly faster:
foo(x) = Int(floor((x-1/x)/sqrt(3))) # this is your code
bar(x) = floor(Int, (x-1/x)/sqrt(3)) # this is what you should do
Benchmarks vary a bit, but here’s one:
julia> @btime foo(x) setup=(x=rand(1:10^6));
16.067 ns (0 allocations: 0 bytes)
julia> @btime bar(x) setup=(x=rand(1:10^6));
12.922 ns (0 allocations: 0 bytes)
Do that a million times, and you’ve got 12-16ms.
Your benchmarking code, however, is not optimal. It’s in global scope, and only the last iteration of the loop is actually kept, which gives the compiler leeway to ditch the first 999999 iterations. There’s no point in looping like that: the @benchmark macro already runs many iterations to get statistics, so don’t do that yourself. You’re just getting in the way of the benchmarking.
Edit: Here’s another alternative:
baz(x) = floor(Int, (x^2-1) / (sqrt(3)*x))
julia> @btime baz(x) setup=(x=rand(1:10^6));
10.619 ns (0 allocations: 0 bytes)
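Before swapping one variant for another, it may be worth checking that they return the same values. The sketch below compares the three definitions from this thread over the same range the benchmarks sample from; the check itself and the names xs and all_agree are my addition.

```julia
foo(x) = Int(floor((x - 1/x) / sqrt(3)))
bar(x) = floor(Int, (x - 1/x) / sqrt(3))
baz(x) = floor(Int, (x^2 - 1) / (sqrt(3) * x))

# Compare all three on every integer the benchmarks above can draw.
xs = 1:10^6
all_agree = all(x -> foo(x) == bar(x) == baz(x), xs)
println("all three agree on ", xs, ": ", all_agree)
```

baz is algebraically the same expression, but it rounds differently in floating point and computes x^2 in integer arithmetic, so for much larger inputs the agreement should be rechecked rather than assumed.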