Why Julia can be faster than C?

To esitimate Pi with using Monte Calo:

C version:

#include <stdlib.h>
#include <stdio.h>
#include <chrono>
#include <array>

#define N_POINTS 10000000
#define N_REPEATS 5

float estimate_pi(int n_points) {
   double x, y, radius_squared, pi;
   int within_circle=0;

   for (int i=0; i < n_points; i++) {
      x = (double)rand() / RAND_MAX;
      y = (double)rand() / RAND_MAX;

      radius_squared = x*x + y*y;
      if (radius_squared <= 1) within_circle++;
   }

   pi=(double)within_circle/N_POINTS * 4;
   return pi;
}

int main() {
    double avg_time = 0;

    srand(42);

    for (int i=0; i < N_REPEATS; i++) {
        auto begin = std::chrono::high_resolution_clock::now();
        double pi = estimate_pi(N_POINTS);
        auto end = std::chrono::high_resolution_clock::now();
        auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
        avg_time += elapsed.count() * 1e-9;
        printf("Pi is approximately %g and took %.5f seconds to calculate.\n", pi, elapsed.count() * 1e-9);
    }

    printf("\nEach loop took on average %.5f seconds to calculate.\n", avg_time / N_REPEATS);
}

Julia version 1 :

estimate_pi(n_points::Int) = begin
    s = rand(Uniform(-1, 1), n_points)
    within_circle = 0
    for i = 1 : n_points
        x, y = rand(Uniform(-1, 1), 2)
        radius_squared = x^2 + y^2
        if radius_squared <= 1
            within_circle += 1
        end
    end
    pi_estimate = 4 * within_circle / n_points
end

@time estimate_pi(100_0000)
0.035509 seconds (1000.00 k allocations: 76.294 MiB)

Julia version2 :

estimate_pi(n_points::Int) = estimate_pi(rand(Uniform(-1, 1), (n_points, 2)))
estimate_pi(data::M) = begin
    within_circle = 0
    for i = 1 : size(data, 1)
        x, y = data[i, 1], data[i, 2]
        radius_squared = x^2 + y^2
        if radius_squared <= 1
            within_circle += 1
        end
    end
    pi_estimate = 4 * within_circle / size(data, 1)
end

@time estimate_pi(100_0000);
0.003521 seconds (2 allocations: 15.259 MiB)

So, Julia is faster than C, is it the reason that something wrong in the C version code? If not, in what circumencetance we can expect that Julia can be faster than C?
And besides, Julia version2 is a little defferent to Julia version1, but the performance gap is up to 10x. I would think most people use the version2 more often? Is there a way that Julia can automatically convert the Julia version1 codes to Julia version2 codes?

Sometimes is a compiler flag removes the difference. But note that you are calling the Uniform constructor, so one would really need to know what goes on there to understand the details.

Just for the records, this simple alternative is probably faster, and easier to translate to C:

julia> function estimate_pi(n)
           within_circle = 0
           for i in 1:n
               x, y = rand(), rand()
               radius_squared = x^2 + y^2
               within_circle += ifelse(radius_squared <= 1, 1, 0)
           end
           pi_estimate = 4*within_circle/n
       end
estimate_pi (generic function with 1 method)

julia> @btime estimate_pi(10^5)
  226.446 μs (0 allocations: 0 bytes)
3.14384

(actually this is pretty much the translation of your C code, except for which is the random number generator, as mentioned below)

2 Likes

Julia also uses a faster rng algorithm by default (Xoshiro256++), which will probably show up in the differences. Part of the discussion here - Blog: Using Julia on the HPC - is about this difference when comparing to C/C++

4 Likes

One important reason is that you are calling the C code with 10x as many iterations.

9 Likes