Julia slower than Matlab & Python? No

Here is the graph of the original results with a log Y scale, for better viewing.

[Image: benchmark]


I would imagine that a single @time measurement isn’t always the best way to benchmark things, even if it is the second run.

using BenchmarkTools

julia> @btime C.^0.3
  9.425 s (2 allocations: 3.48 GiB)
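The same concern applies on the Python side of the comparison. As a minimal sketch (standard library only; the power_all workload, the benchmark helper, and the sample counts are placeholders made up for illustration, not the code from the paper), one can warm up once, collect several samples, and look at the spread rather than trusting a single measurement:

```python
import random
import statistics
import timeit


def power_all(values, exponent=0.3):
    """Hypothetical stand-in workload: raise every element to a power."""
    return [v ** exponent for v in values]


def benchmark(func, *args, repeat=7, number=10):
    """Return per-call times in seconds: `repeat` samples of `number` calls each."""
    func(*args)  # warm-up call so one-time setup costs are not measured
    timer = timeit.Timer(lambda: func(*args))
    totals = timer.repeat(repeat=repeat, number=number)
    return [total / number for total in totals]


if __name__ == "__main__":
    random.seed(0)
    data = [random.random() for _ in range(10_000)]
    samples = benchmark(power_all, data)
    print(f"min    {min(samples) * 1e3:.3f} ms")
    print(f"median {statistics.median(samples) * 1e3:.3f} ms")
    print(f"max    {max(samples) * 1e3:.3f} ms")
```

The minimum is usually the least noisy of these numbers, and it is also what @btime reports.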

I think they need to be compared on the same machine.

I tried with a smaller array, and the Julia version still takes twice as long (I used @btime for measuring):

C = np.random.rand(100,100,100)



29.2 ms ± 539 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


C = rand(100,100,100)

@btime C.^0.3

67.824 ms (4 allocations: 7.63 MiB)

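Could you try interpolating C with $? C is a non-constant global, and BenchmarkTools recommends interpolating such variables so that the global lookup is not part of the measurement: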

@btime $C.^0.3


I have no idea why you are getting this. See what I got on my machine:

In [1]: import numpy as np

In [2]: C = np.random.rand(100,100,100)

In [3]: %%timeit
   ...: C**0.3
17.5 ms ± 127 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

julia> using BenchmarkTools

julia> C = rand(100,100,100);

julia> @btime C.^0.3;
  17.387 ms (4 allocations: 7.63 MiB)

Here it is:

julia> @btime $C.^0.3;
  74.537 ms (2 allocations: 7.63 MiB)

Really strange… I am getting the same result at the office and at home.

What exactly does that mean?

    Vd_target = u(def_y, γ) + β * (θ * EVc[:, zero_ind] + (1 - θ) * EVd[:]) #calling a[:] allocates a new array

How can that be fixed?

Actually, I see a similar performance difference as @Sijun, about a factor of 2 between C and Julia. Strange. Is this perhaps some MKL/LIBM or multithreading issue (just guessing)?

There is already a PR by @tim.holy which shows how to do this: https://github.com/vduarte/benchmarkingML/pull/2


In the paper linked in the original post, they compare a few different GPU implementations but don’t mention that Julia can run on GPUs. Is there anyone around with a GPU who wants to show off CuArrays.jl? It seems like a good fit for this problem.


I don’t have an MKL build of Julia.

Many Python distributions use MKL by default. Having VML hooked up to Python would explain these differences I think.

The NumPy and Julia versions run at close to the same speed on both my MacBook and an Ubuntu workstation (default Julia 1.3 vs. miniconda3 w/ MKL).

Should have said it before: I’m on Windows 10 on this machine.

I made some modifications to Tim Holy’s PR and found that by using Strided.jl, I was able to beat Python/NumPy and Matlab for all sizes other than 151. Here are my timings on the same model MacBook as the one the authors used:

151:   326.1284828186035
351:   308.24360847473145
551:   690.5834913253784
751:   1231.9454908370972
951:   1912.0723962783813
1151:  5935.046696662903
1351:  18267.05288887024
1551:  29274.00109767914

I believe this suggests that the difference was that Python and Matlab were multithreading the broadcast operations across the two available threads, whereas Julia does not multithread broadcast unless you use something like Strided.jl.
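To make that hypothesis concrete, here is a rough sketch of what threading an elementwise operation means: split the data into one chunk per worker, apply the operation to each chunk independently, and concatenate the results in order. This is plain standard-library Python with made-up names (split_chunks, power_chunk, threaded_power); it is not how NumPy, Matlab, or Strided.jl implement it. Because of CPython's global interpreter lock, this pure-Python version shows the structure only and should not be expected to run faster than a single loop.

```python
from concurrent.futures import ThreadPoolExecutor


def split_chunks(values, n_chunks):
    """Split a list into n_chunks contiguous pieces of nearly equal size."""
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        stop = start + size + (1 if i < extra else 0)
        chunks.append(values[start:stop])
        start = stop
    return chunks


def power_chunk(chunk, exponent):
    """Elementwise power over one chunk; each chunk is independent of the others."""
    return [v ** exponent for v in chunk]


def threaded_power(values, exponent=0.3, n_threads=2):
    """Apply the elementwise power with one chunk per worker thread."""
    chunks = split_chunks(values, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map yields results in submission order, so the output order
        # matches the input order.
        results = pool.map(power_chunk, chunks, [exponent] * len(chunks))
        return [v for part in results for v in part]
```

The default of two workers mirrors the two available threads mentioned above. The point of the sketch is only that an elementwise operation has no dependencies between elements, so it can be partitioned this way.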

The second time it’s run, the timings improve. I’m not sure exactly why that is; it seems like all the JIT overhead should have been hit in the first function call, before the benchmarking loop. Nonetheless, here is the second-run timing:

julia> include("julia.jl")
151:  89.70789909362793
351:  288.47689628601074
551:  686.4475965499878
751:  1315.5532121658325
951:  2086.0692977905273
1151:  5694.838190078735
1351:  18829.65850830078
1551:  31184.902906417847

Publish soon? This is already published in JEDC: https://www.sciencedirect.com/science/article/pii/S0165188919301939

The author’s website says “Conditionally Accepted”

Apparently not conditional on checking with the Julia community first :)


Suddenly I know how to get optimized programs for free…
Step 1: Code up programs for my paper in Matlab
Step 2: Write a naive version in Julia
Step 3: Post on Julia Discourse
Step 4: …
Step 5: Publish!