How many flops does it take to compute a square root?

This (perhaps surprisingly) isn't a well-defined question. It clearly is 1 flop (`sqrt`), and it's even a single CPU instruction on most CPUs. However, it is generally going to be slower than an add or multiply; you should expect it to be roughly half as fast as a div.
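A quick way to see the relative cost yourself is a micro-benchmark. This is a sketch using BenchmarkTools.jl; the absolute timings are hardware-dependent, and the point is only the comparison between the three broadcasts:

```julia
using BenchmarkTools

x = rand(1000)

# Compare per-element cost of sqrt against a multiply and a divide.
# Interpolate x with $ so benchmarking doesn't measure global-variable access.
@btime sqrt.($x);   # square root
@btime $x .* $x;    # multiply (typically the fastest of the three)
@btime 1.0 ./ $x;   # divide
```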

More generally, you can answer such questions with tools like GFlops.jl or, at the hardware level, with LIKWID.jl.

**GFlops.jl** (counting FLOP operations in a piece of code via Cassette.jl)

```julia
using GFlops
x = rand(1000)
@count_ops sqrt.(x) # gives 1000, i.e. 1 FLOP per element
```

(Note that GFlops.jl isn't fully reliable. For example, it doesn't count FLOPs performed outside of Julia, e.g. inside LAPACK/BLAS calls.)

**LIKWID.jl** (counts FLOPs at the hardware level, i.e. by reading performance counters inside the CPU core; only works on Linux)

```julia
julia> using LIKWID

julia> x = rand(1000);

julia> function count_FLOPs(f)
           metrics, _ = perfmon(f, "FLOPS_DP"; print=false)
           flops_per_second = first(metrics["FLOPS_DP"])["DP [MFLOP/s]"] * 1e6
           runtime = first(metrics["FLOPS_DP"])["Runtime (RDTSC) [s]"]
           return round(Int, flops_per_second * runtime)
       end
count_FLOPs (generic function with 1 method)

julia> count_FLOPs(() -> sqrt.(x))
1000 # again 1 per element
```

Compare this to, e.g., the exponential function, for which I find

```julia
julia> count_FLOPs(() -> exp.(x))
17000 # 17 FLOPs per element
```

See Counting FLOPs · LIKWID.jl for more.

So would you count it as two flops in a cost analysis, or simply leave it as one? What's the norm?

Don't count flops; what matters is time.

That's interesting. Thank you.

… or use a metric that is directly proportional to time, such as the effective throughput metric!
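One such metric is effective memory throughput: bytes moved divided by runtime. Here is a sketch for the broadcast `sqrt.(x)`; the byte count is an assumption on my part (read the input array, write the output array), and the exact definition of "bytes moved" depends on what you decide to count:

```julia
using BenchmarkTools

x = rand(1000)

t = @belapsed sqrt.($x)        # runtime in seconds
bytes_moved = 2 * sizeof(x)    # assumption: one read of x + one write of the result
effective_GBs = bytes_moved / t / 1e9  # effective throughput in GB/s
```

Unlike a raw FLOP count, this number is directly proportional to 1/time, so a faster implementation always scores higher regardless of how many flops you attribute to `sqrt`.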