How many flops does it take to compute a square root?

How many flops does it take to compute a square root?

This (perhaps surprisingly) isn’t a well defined question. it clearly is 1 flop (sqrt), and even is only 1 cpu instruction on most cpus. However, it is generally going to be slower than an add or multiply. You should expect it to be roughly half as fast as a div.

More generally, you can answer these questions with tools like GFlops.jl or, on a hardware-level, with LIKWID.jl.

GFlops.jl (counting FLOP operations in a piece of code via Cassette.jl)

using GFlops
x = rand(1000)
@count_ops sqrt.(x) # gives 1000, i.e. 1 FLOP per element

(Note that GFlops.jl isn’t really reliable. For example, it doesn’t count FLOPs outside of Julia, i.e. due to LAPACK/BLAS)

LIKWID.jl (counts the FLOPS on a hardware level, i.e. by utilizing counters inside of a CPU core, only works on Linux)

julia> using LIKWID

julia> x = rand(1000);

julia> function count_FLOPs(f)
           metrics, _ = perfmon(f, "FLOPS_DP"; print=false)
           flops_per_second = first(metrics["FLOPS_DP"])["DP [MFLOP/s]"] * 1e6
           runtime = first(metrics["FLOPS_DP"])["Runtime (RDTSC) [s]"]
           return round(Int, flops_per_second * runtime)
count_FLOPs (generic function with 1 method)

julia> count_FLOPs(() -> sqrt.(x))
1000 # again 1 per element

Compare this to e.g. the exponential function for which I find

julia> count_FLOPs(() -> exp.(x))
17000 # 17 FLOPs per element

See Counting FLOPs · LIKWID.jl for more.


So would you count it as two flops in a cost analysis, or simply leave it as one. What’s the norm?

Don’t count flops. what matters is time.

1 Like

That’s interesting. Thank you.

… or use a metric that is directly proportional to time, as for example the effective throughput metric!

1 Like