Hello.
I am starting to use Julia. I like its syntax, its simplicity, and its claimed performance.
I have been using Matlab/Octave a lot at university, and now at work we use Octave for licensing reasons. However, Octave is really slow, which is how I came to read about Julia here.
I have tried some simple matrix multiplication (A*B, not the element-wise one), but it is really slow.
Octave is faster, and Python with NumPy is too.
I am using Julia version 0.6.2 on an Ubuntu notebook for the test, but the same problem occurs on Windows 10.
The Python code below takes 0.006 seconds:
import numpy
import cProfile

n = 1000
x = numpy.random.random((n, n))
y = numpy.random.random((n, n))
cProfile.run("x*y")
The first run compiles.
This shouldn’t make much of a difference here, because it’s just calling an external (already compiled) BLAS routine, and the wrapper should already be compiled in your Julia system image.
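(Aside: for operations that do hit the compiler, the first-call cost is easy to see. A minimal sketch with a throwaway function, not anything from the thread above:)
julia> f(a, b) = a .+ b;
julia> @time f(rand(10, 10), rand(10, 10));   # first call: includes JIT compilation
julia> @time f(rand(10, 10), rand(10, 10));   # second call: measures only the work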
So, which BLAS is each of them linked to? If you have an Intel processor, you can build Julia with MKL.
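An easy way to check: in Julia, versioninfo() prints the BLAS the build is linked against, and on the Python side numpy.show_config() reports the same for NumPy. For example:
julia> versioninfo()   # output includes a "BLAS:" line, e.g. libopenblas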
Unless I’m mistaken, x*y is elementwise multiplication for NumPy arrays. Also, timing things in global scope in Julia is not a good way to get an accurate estimate of realistic runtimes. The right way to do it is this:
julia> using BenchmarkTools
julia> a = rand(1000, 1000);
julia> @benchmark $a .* $a # elementwise
BenchmarkTools.Trial:
memory estimate: 7.63 MiB
allocs estimate: 2
--------------
minimum time: 2.545 ms (0.00% GC)
median time: 3.824 ms (0.00% GC)
mean time: 4.986 ms (26.48% GC)
maximum time: 78.045 ms (95.63% GC)
--------------
samples: 999
evals/sample: 1
julia> @benchmark $a * $a # matmul
BenchmarkTools.Trial:
memory estimate: 7.63 MiB
allocs estimate: 2
--------------
minimum time: 51.575 ms (0.00% GC)
median time: 57.945 ms (0.00% GC)
mean time: 60.352 ms (3.93% GC)
maximum time: 142.708 ms (58.50% GC)
--------------
samples: 83
evals/sample: 1
On the same system, this is what I see for the same operations in Python:
>>> import numpy
>>> import cProfile
>>> a = numpy.random.random((1000, 1000))
>>> cProfile.run("a * a")
2 function calls in 0.010 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.010 0.010 0.010 0.010 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
>>> cProfile.run("a.dot(a)")
3 function calls in 0.055 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.055 0.055 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.054 0.054 0.054 0.054 {method 'dot' of 'numpy.ndarray' objects}
So matrix multiplication seems to take about the same time in Julia and NumPy, whereas Julia is significantly faster (about 4x) at elementwise multiplication. I’m not sure why; perhaps the operation is cheap enough that the overhead of calling out to C is significant in Python (and doesn’t exist in Julia). Edit: no, I tried a 10,000 × 10,000 matrix and the relative performance didn’t change much (3x instead of 4x).
On my machine, both Python and Julia use eight threads, and I find that elementwise multiplication and matrix multiplication each take about the same amount of time in the two languages.
In [1]: import numpy
In [2]: import cProfile
In [3]: n = 10000;
In [4]: x=numpy.random.random((n,n))
...: y=numpy.random.random((n,n))
...:
In [5]: cProfile.run("x*y")
2 function calls in 0.391 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.391 0.391 0.391 0.391 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
In [6]: cProfile.run("x.dot(y)")
3 function calls in 22.306 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.056 0.056 22.306 22.306 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 22.250 22.250 22.250 22.250 {method 'dot' of 'numpy.ndarray' objects}
julia> n = 10000;
julia> x = rand(n,n);
julia> y = rand(n,n);
julia> x .* y;   # warm-up, so the timings below exclude compilation
julia> @time x .* y;
0.365343 seconds (30 allocations: 762.941 MiB, 10.40% gc time)
julia> @time x * y;
23.990639 seconds (6 allocations: 762.940 MiB, 0.12% gc time)
EDIT: I committed crimes by not doing a proper benchmark and by timing at top level. But in this case it doesn’t make much difference.
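If you want to be sure both sides really use the same number of BLAS threads, you can pin them explicitly. A minimal sketch, assuming OpenBLAS (on Julia 0.6 the module lives at Base.LinAlg.BLAS; for NumPy, set OMP_NUM_THREADS or OPENBLAS_NUM_THREADS in the environment before launching Python):
julia> Base.LinAlg.BLAS.set_num_threads(8)   # pin OpenBLAS to 8 threads for the matmul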
Are you still comparing elementwise multiplication in Python with matrix multiplication in Julia? A 10x increase in n is expected to give a 100x slowdown in elementwise multiply, which matches the time you’re reporting for NumPy (about 0.6 seconds). Matrix multiply is super-linear in n (roughly cubic), so you would expect a much bigger slowdown for that, which is exactly what you’re seeing in Julia.
No, a*a is matrix multiply in Julia but elementwise multiply in Python, and the latter is what you’re timing: cProfile.run("x*y"). That is why Python is faster and scales only with the number of elements.
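A quick way to see the two growth rates side by side, as a rough sketch (the first loop iteration includes some compilation time, so don’t read too much into it):
julia> for n in (1000, 2000, 4000)
           a = rand(n, n); b = rand(n, n)
           te = @elapsed a .* b   # elementwise: O(n^2) work, so 2x n -> ~4x time
           tm = @elapsed a * b    # matmul: O(n^3) work, so 2x n -> ~8x time
           println("n = $n: elementwise $te s, matmul $tm s")
       end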
If you find it immensely confusing that * does completely different and incompatible things in Python/NumPy for subtly different array types, I can sympathize.
import numpy as np
import time

n = 10000
x = np.random.random((n, n))
y = np.random.random((n, n))

t0 = time.clock()
z = np.matrix(x) * np.matrix(y)   # with np.matrix, * is true matrix multiplication
t1 = time.clock()
total = t1 - t0
print("Total (sec) =", total)
Which prints:
Total (sec) = 10.513941910715321
[Finished in 12.4s]
And Julia (JuliaPro 0.6.0, for a fair comparison):
julia> x = rand(10000,10000);
julia> y = rand(10000,10000);
julia> @time x*y;
9.327905 seconds (6 allocations: 762.940 MiB, 0.12% gc time)
Hope the comparison is clear now! Happy Julia-ing, BTW.
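For completeness, here is the tighter way to measure this, per the BenchmarkTools advice above (assuming the package is installed; the seconds parameter caps the total time budget, since each sample takes several seconds at this size):
julia> using BenchmarkTools
julia> @benchmark $x * $y seconds=30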