I would suggest that the first thing to try is simply benchmarking with BenchmarkTools.jl, which will generally provide more reproducible and meaningful results than running something once with @time. After all, I would guess from your question that you're primarily interested in reducing the number of floating-point operations as a way to achieve faster run time, but since run time also depends on cache locality, memory throughput, branch prediction, etc., it may be more effective to simply measure the thing you actually care about.
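As a rough sketch of what that could look like, here is a comparison of two ways of computing a sum of squares. The functions `sumsq_loop` and `sumsq_generic` and the input `x` are made up for illustration; only `@btime` and `@belapsed` come from BenchmarkTools.jl.

```julia
using BenchmarkTools

# Hypothetical implementation 1: an explicit loop.
function sumsq_loop(x)
    s = zero(eltype(x))
    for v in x
        s += v * v
    end
    return s
end

# Hypothetical implementation 2: the generic reduction from Base.
sumsq_generic(x) = sum(abs2, x)

x = rand(10_000)

# Interpolate the global `x` with `$` so that the benchmark measures the
# call itself rather than the cost of accessing an untyped global.
@btime sumsq_loop($x)
@btime sumsq_generic($x)

# `@belapsed` returns the minimum measured time in seconds as a number,
# which is handy if you want to compare timings programmatically.
t_loop = @belapsed sumsq_loop($x)
t_generic = @belapsed sumsq_generic($x)
```

The two versions perform the same number of multiplications and additions, so any difference in the timings you see comes from things like SIMD vectorization and summation order rather than from the operation count.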
That said, Julia can certainly provide you with as much information as you could ever want to digest. In particular, @code_lowered, @code_typed, @code_llvm, and @code_native provide increasingly low-level depictions of exactly what a given function will do, and if you really want to investigate what is happening at each level, you certainly can. Jameson Nash gave a cool talk at the last JuliaCon about building tools to explore all of the levels of compilation and optimization in depth, which you might find interesting: https://www.youtube.com/watch?v=l0Go2S_L95M
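For a quick look at those levels, something along these lines should work. The function `axpy` is a made-up example; the macros and the `code_llvm` function come from the InteractiveUtils standard library, which the REPL loads for you but a script has to load explicitly.

```julia
using InteractiveUtils  # loaded automatically in the REPL, required in scripts

# Hypothetical example function: one multiply and one add.
axpy(a, x, y) = a * x + y

# Each macro shows the same call at a lower level than the previous one.
display(@code_lowered axpy(2.0, 3.0, 4.0))  # lowered (desugared) Julia IR
display(@code_typed axpy(2.0, 3.0, 4.0))    # IR after type inference and optimization
@code_llvm axpy(2.0, 3.0, 4.0)              # LLVM IR, printed to stdout
@code_native axpy(2.0, 3.0, 4.0)            # machine assembly, printed to stdout

# The function forms take argument types instead of values, so the output
# can be captured as a string and searched for particular instructions.
llvm_ir = sprint(code_llvm, axpy, (Float64, Float64, Float64))
```

Reading the `@code_llvm` or `@code_native` output is probably the closest you can get to counting the floating-point instructions that will actually execute, although the exact text depends on your Julia version and CPU.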