@turbo on sets of operations

@ribeiro, so I was wrong there: apparently there is a delay of ~30 ns (on my machine) when a single evaluation is performed, and that delay gets diluted when many evaluations are performed.

In this case, the correct benchmarks are probably these:

julia> a = rand(); b = rand(); c = rand(); d = rand();

julia> @btime tmpSA($a,$b,$c,$d)
  20.965 ns (0 allocations: 0 bytes)
(-0.8734155439518515, -0.5969015730781967, -0.46053090003180003, -0.6237616099727648)

julia> @btime tmpV($a,$b,$c,$d)
  10.864 ns (0 allocations: 0 bytes)
(-0.8734155439518515, -0.5969015730781967, -0.4605309000318, -0.6237616099727648)


In more recent (nightly) Julia it seems that you need to do something like:

julia> @btime tmpV($(Ref(a))[],$(Ref(b))[],$(Ref(c))[],$(Ref(d))[])
  11.132 ns (0 allocations: 0 bytes)
(-0.8734155439518515, -0.5969015730781967, -0.4605309000318, -0.6237616099727648)


to avoid artifacts from constant propagation. In 1.6.2, though, that does not seem to be necessary for this benchmark.
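
For anyone who wants to check this on their own Julia version, here is a minimal sketch comparing the two interpolation styles side by side. The function `f` is a hypothetical stand-in (the definitions of `tmpSA` and `tmpV` are not repeated here), so the timings will not match the ones above; the point is only whether the two `@btime` calls report noticeably different numbers.

```julia
using BenchmarkTools

# Hypothetical stand-in for the functions benchmarked above (an assumption,
# not the actual tmpSA / tmpV definitions).
f(a, b, c, d) = (sin(a), cos(b), sin(c) * cos(d), a + d)

a = rand(); b = rand(); c = rand(); d = rand()

# Plain interpolation: the values are spliced into the benchmark expression.
@btime f($a, $b, $c, $d)

# Ref-wrapped interpolation: each argument is read out of a Ref at run time,
# so the compiler cannot treat it as a known constant.
@btime f($(Ref(a))[], $(Ref(b))[], $(Ref(c))[], $(Ref(d))[])
```

If the first call reports a suspiciously small time (well under a nanosecond) while the second does not, the call was probably constant-folded, and the `Ref` version is the one to trust.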