on purpose
Assuming people purposefully use global scope to perform an unrealistic task just to show Julia is slow, I guess this demo is applicable.
I just meant to show that if one writes type-unstable code (in this case, on purpose), performance is bad, and probably that is what “naive” may mostly mean in Julia.
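For illustration, here is a minimal sketch of what type-unstable means in this context (a toy example of my own, not code from the thread): the return type depends on a runtime value, so the compiler cannot infer a concrete type.

# Type-unstable: returns Float64 on one branch and Int on the other,
# so callers see a Union{Float64, Int64} and lose optimizations.
unstable(x) = x > 0 ? 1.0 : 0

# Type-stable: always returns Float64.
stable(x) = x > 0 ? 1.0 : 0.0

# `@code_warntype unstable(1.0)` highlights the inference problem.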
By replacing zeros with similar (which just allocates the output array without setting it to 0) and adding @inbounds, the function gives the same performance as broadcasting:
function f2(a, b)
    y = similar(a)
    @inbounds for i in 1:length(a)
        y[i] = a[i] / b[i]
    end
    y
end
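(For reference, the broadcast version that f2 is being compared against would be something like the one-liner below; this is my reconstruction, since the original definition is not quoted here.)

f1(a, b) = a ./ b    # broadcasting: allocates the output and fuses the elementwise loop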
But I agree that this is no longer naive
Coming from Python / Numpy, broadcasting seemed to be the most natural way to do “vectorized” operations in Julia.
But even in this case Julia is still faster:
In [7]: %%timeit
   ...: for i in range(0, len(y)):
   ...:     y[i] = a[i] / b[i]
   ...:
229 µs ± 2.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
julia> @btime for i in 1:length(a)
           y[i] = a[i] / b[i]
       end
  97.144 μs (4980 allocations: 93.45 KiB)
You can’t assume a user knows to vectorize in numpy but doesn’t know about broadcasting in Julia.
Exactly. This is correct. It is easy to write fast Julia code, if one is aware of the basic stuff. But there is an initial set of things to learn and get used to (as expected for everything that does not do magic).
I think we do not disagree in anything.
Naive implementation usually also includes slicing without views. Since numpy slices are equivalent to Julia views, a direct translation of numpy code can result in huge allocations and, as a result, very bad performance.
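A minimal sketch of that pitfall (function names are mine): a Julia slice on the right-hand side allocates a copy, while @views gives the numpy-like behavior.

# Each slice allocates a new array, so this makes two temporary copies:
diff_copy(a) = a[2:end] .- a[1:end-1]

# @views turns the slices into views, as numpy slicing would be:
diff_view(a) = @views a[2:end] .- a[1:end-1]

# diff_copy allocates three arrays (two slices plus the result);
# diff_view allocates only the result. Compare with
# @btime diff_copy($a) vs. @btime diff_view($a) for a large vector a.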
That’s of course correct, but it seems like this case is the most common demonstration of “Julia is slower than Python.” People tend to:
- benchmark in global scope
- without running once to compile, and
- without interpolating their globals with $
- and often with a type instability
It’s understandable why they do this. They see a neat blog with a quick Jupyter notebook, they download Julia, and want to try a couple things out. It’s probably necessary to read several different parts of the manual to get a proper, non-naive benchmark.
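For concreteness, a minimal sketch of the interpolation point (assuming BenchmarkTools is loaded):

using BenchmarkTools

a = rand(10^5)   # non-constant global

@btime sum(a)    # `a` is an untyped global: extra allocations, inflated timing
@btime sum($a)   # `$` splices the value in as a typed local: the real timing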
Naive is not a synonym for plain, nor for evil. Oxford Languages defines it as “showing a lack of experience, wisdom, or judgement”, or “natural and unaffected; innocent”. I would thus define it as having the best intentions, but little expertise.
Some Julia-specific ways of unintentionally shooting oneself in the foot were listed in the comments above. It is easy to get a slow-down by two orders of magnitude. An example of a slow-down by a factor of 70 due to unnecessary allocations was actually cited in my previous post.
With respect to more optimal Julia code, sure. But what about when compared to Python / Numpy? Your last statement was:
But that post was about C++. Now, I think everyone can agree there are more quirks in C++ than in Python or Julia.
Now I could probably write a sufficiently fast solution to the cited problem in Python. I wouldn’t. You won, Julia is the best, and inherently better than Python.
For everyone else - see above. Thank you.
A straw man fallacy is not fun. That was not my argument at all.
There have been heaps of threads like “Why is my code translated from Python so much slower in Julia?” I don’t understand the point of denying that naive Julia code can be slower than Python.
I’m yet to see a computationally heavy task (i.e., not about Julia’s startup time being slow), written either with for loops in both languages or in vectorized/broadcasting style in both, that shows Julia to be much slower than Python/Numpy.
The point is just that if one doesn’t bother to learn Julian idioms and techniques, and instead just ‘writes Python’ in Julia with surface-level syntax changes, it’s absolutely quite easy to end up with massive performance problems.
Perhaps the best examples would just be tight loops involving global variables, or creating empty, untyped arrays and then pushing to them (see the sketch below). Sure, it’s pretty easy to learn how to avoid these problems, but that’s not the point.
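A minimal sketch of the second pitfall (function names are mine):

# `[]` is a Vector{Any}: every push! stores a boxed value.
function fill_untyped(n)
    v = []
    for i in 1:n
        push!(v, i^2)
    end
    return v
end

# Declaring the element type keeps the array concrete and fast.
function fill_typed(n)
    v = Int[]
    for i in 1:n
        push!(v, i^2)
    end
    return v
end

Comparing @btime fill_untyped(10^5) with @btime fill_typed(10^5) shows the difference immediately.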
For the record, I just want to add that with these practices Python will be slow too (for a similar reason, being a dynamic language itself), and very often slower than Julia:
julia> a = rand(10^5);

julia> b = 0;

julia> @btime for x in a
           if x > 0.5
               global b += x
           end
       end
  6.738 ms (349237 allocations: 6.85 MiB)
In [16]: a = np.random.rand(10**5)

In [17]: b = 0

In [18]: %%timeit
    ...: for x in a:
    ...:     global b
    ...:     if x > 0.5:
    ...:         b += x
    ...:
13 ms ± 325 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
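For comparison, the idiomatic fix on the Julia side is just to put the loop in a function and make the accumulator a local (a sketch; the name sum_above is mine):

function sum_above(a, t)
    s = zero(eltype(a))
    for x in a
        if x > t
            s += x    # s and x are concretely typed locals
        end
    end
    return s
end

# @btime sum_above($a, 0.5) avoids the per-iteration allocations entirely.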
But I think everyone in the thread has something to take away, and I guess I should stop being annoying.
It can sometimes carry a higher cost in julia, however.
If you’d like an example, here’s an example from this thread:
In [6]: def euclidian_algorithm_division_count(a, b):
   ...:     division_count = 1
   ...:     if b > a:
   ...:         a, b = b, a
   ...:     while (c := a % b) != 0:
   ...:         a, b = b, c
   ...:         division_count += 1
   ...:     return division_count
   ...:
   ...: from random import randint

In [7]: %%timeit
   ...: N = 10**100
   ...: M = 10**4
   ...: division_count_array = []
   ...: while M > 0:
   ...:     a = randint(1, N)
   ...:     b = randint(1, N)
   ...:     division_count_array.append(euclidian_algorithm_division_count(a, b))
   ...:     M -= 1
292 ms ± 7.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
julia> function euclidean_algorithm_division_count(a, b)
           division_count = 1
           if b > a
               a, b = b, a
           end
           while (c = a % b) != 0
               a, b = b, c
               division_count += 1
           end
           return division_count
       end
euclidean_algorithm_division_count (generic function with 1 method)

julia> function main()
           N = big(10)^100
           M = 10^4
           division_count_array = []
           while M > 0
               a, b = rand(1:N, 2)
               push!(division_count_array, euclidean_algorithm_division_count(a, b))
               M -= 1
           end
       end
main (generic function with 1 method)

julia> @btime main()
  378.040 ms (5618922 allocations: 110.55 MiB)
It’s of course not very hard to make the julia version beat the Python version, but this straightforward transcription (that even uses a function) of naive Python code can still be slower in julia.
Thanks for bearing with me! This is a pretty neat pedagogical example!
Ah, looks like a BigInt issue, less interesting than I initially thought.
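To spell that out: Julia's BigInt wraps GMP integers and is immutable, so every arithmetic operation in the inner loop allocates a fresh BigInt (hence the ~5.6 million allocations above). A tiny sketch to see it in isolation:

using BenchmarkTools

x = big(10)^100
y = big(7)^80

@btime $x % $y    # each call allocates a new BigInt for the result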
Be careful with your timings though: timeit does report the mean, whereas btime reports the minimum run time.