How many flops does it take to compute a square root?

This (perhaps surprisingly) isn't a well-defined question. It clearly is 1 flop (`sqrt`), and it's even a single CPU instruction on most CPUs. However, it is generally going to be slower than an add or multiply; you should expect it to be roughly half as fast as a div.
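A quick way to see the relative cost yourself is a micro-benchmark. This is a sketch using BenchmarkTools.jl; the absolute timings are hardware-dependent, and the point is only the comparison between the three broadcasts:

```julia
using BenchmarkTools

x = rand(1000)

# Compare per-element cost of sqrt against a multiply and a divide.
# Interpolate x with $ so benchmarking doesn't measure global-variable access.
@btime sqrt.($x);   # square root
@btime $x .* $x;    # multiply (typically the fastest of the three)
@btime 1.0 ./ $x;   # divide
```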

More generally, you can answer such questions with tools like GFlops.jl or, at the hardware level, with LIKWID.jl.

**GFlops.jl** (counting FLOP operations in a piece of code via Cassette.jl)

```julia
using GFlops
x = rand(1000)
@count_ops sqrt.(x) # gives 1000, i.e. 1 FLOP per element
```

(Note that GFlops.jl isn't fully reliable. For example, it doesn't count FLOPs performed outside of Julia, e.g. inside LAPACK/BLAS calls.)

**LIKWID.jl** (counts FLOPs at the hardware level, i.e. by reading performance counters inside the CPU core; only works on Linux)

```julia
julia> using LIKWID

julia> x = rand(1000);

julia> function count_FLOPs(f)
           metrics, _ = perfmon(f, "FLOPS_DP"; print=false)
           flops_per_second = first(metrics["FLOPS_DP"])["DP [MFLOP/s]"] * 1e6
           runtime = first(metrics["FLOPS_DP"])["Runtime (RDTSC) [s]"]
           return round(Int, flops_per_second * runtime)
       end
count_FLOPs (generic function with 1 method)

julia> count_FLOPs(() -> sqrt.(x))
1000 # again 1 per element
```

Compare this to, e.g., the exponential function, for which I find

```julia
julia> count_FLOPs(() -> exp.(x))
17000 # 17 FLOPs per element
```

See Counting FLOPs · LIKWID.jl for more.

So would you count it as two flops in a cost analysis, or simply leave it as one? What's the norm?

Don't count flops; what matters is time.

That's interesting. Thank you.

… or use a metric that is directly proportional to time, such as the effective throughput metric!
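One such metric is effective memory throughput: bytes moved divided by runtime. Here is a sketch for the broadcast `sqrt.(x)`; the byte count is an assumption on my part (read the input array, write the output array), and the exact definition of "bytes moved" depends on what you decide to count:

```julia
using BenchmarkTools

x = rand(1000)

t = @belapsed sqrt.($x)        # runtime in seconds
bytes_moved = 2 * sizeof(x)    # assumption: one read of x + one write of the result
effective_GBs = bytes_moved / t / 1e9  # effective throughput in GB/s
```

Unlike a raw FLOP count, this number is directly proportional to 1/time, so a faster implementation always scores higher regardless of how many flops you attribute to `sqrt`.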