I was not planning to compare anything, only to see that this is effectively the default behavior in another language I develop (please let us not give fuel to another pointless “fairness” benchmark discussion). Of course we can tune that with compiler flags.
Edit, for the record:
julia> @btime subnormal_fortran!($C,$A2,$B,$b); # no opt flags
116.997 μs (0 allocations: 0 bytes)
julia> @btime subnormal_fortranmarchnative!($C,$A2,$B,$b); # -march=native
117.876 μs (0 allocations: 0 bytes)
julia> @btime subnormal_fortranO3!($C,$A2,$B,$b); # -O3
26.371 μs (0 allocations: 0 bytes)
julia> @btime subnormal_fortranO3marchnative!($C,$A2,$B,$b); # -O3 -march=native
13.739 μs (0 allocations: 0 bytes)
julia> @btime subnormal_fortranOfast!($C,$A2,$B,$b); # -Ofast
753.076 ns (0 allocations: 0 bytes)
julia> @btime subnormal_fortranOfastmarchnative!($C,$A2,$B,$b); # -Ofast -march=native
341.280 ns (0 allocations: 0 bytes)
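For comparison on the Julia side, here is a minimal sketch of how subnormal handling can be toggled per thread with `set_zero_subnormals` from Base. The helper `halve` and the test value are made up for illustration and have nothing to do with the benchmarked kernels. That the large drop at `-Ofast` comes from flushing subnormals to zero is an assumption, not something the timings above prove.

```julia
# Hypothetical helper: one multiplication on a runtime value,
# so the result depends on the current floating-point mode.
halve(v) = v * 0.5

x = floatmin(Float64) / 2        # below the smallest normal Float64
println("issubnormal(x) = ", issubnormal(x))

# IEEE-compliant mode (the Julia default): subnormals are kept.
set_zero_subnormals(false)
println("IEEE mode:     halve(x) = ", halve(x))

# Request flush-to-zero; returns false if the hardware cannot do it.
ok = set_zero_subnormals(true)
println("flush enabled: ", ok, ", halve(x) = ", halve(x))

# Restore the default so later code on this thread is unaffected.
set_zero_subnormals(false)
```

The setting applies only to the thread that calls it, and whether the flushed result prints as exactly zero depends on the hardware, so no expected output is given here.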