You forgot to include the fastest solution, which removes the allocations entirely by sticking to scalar calculations:
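function f(N)
    cnt = 0
    for i in 1:N
        # Count the random points that land inside the unit quarter circle.
        if rand()^2 + rand()^2 <= 1
            cnt += 1
        end
    end
    return 4 * cnt / N  # estimate of π; avoids shadowing the built-in `pi`
end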
using BenchmarkTools
n = 100000
@btime $f($n) # 611.200 μs (0 allocations: 0 bytes)
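
If reproducible runs matter, the same scalar loop can take the random number generator as an argument. The following is only a sketch under a few assumptions: it needs Julia 1.7 or newer for `Xoshiro`, and the name `estimate_pi` and the seed are made up for illustration, not part of the original example. I have not benchmarked it.

```julia
using Random

# Hypothetical variant of `f` that receives the RNG explicitly (illustrative name).
function estimate_pi(rng::AbstractRNG, N::Integer)
    cnt = 0
    for _ in 1:N
        # Scalar draws only, so no temporary arrays are created.
        x, y = rand(rng), rand(rng)
        # The Bool result is added as 0 or 1.
        cnt += (x^2 + y^2 <= 1)
    end
    return 4 * cnt / N
end

rng = Xoshiro(1234)  # assumption: Julia 1.7+; the seed is arbitrary
println(estimate_pi(rng, 100_000))
```

Seeding the generator makes two runs with the same seed return the same estimate, which helps when comparing implementations.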
VectorizedRNG.jl (https://github.com/JuliaSIMD/VectorizedRNG.jl) could also be a nice asset for optimizing this example further.