Julia code equivalent to this small Python benchmark runs nearly 10x slower, even with higher levels of optimization enabled.
import time

def main():
    N = 100000000
    data = []
    for i in range(N):
        value = i + 1
        data.append(value)
    print('starting')
    time_start = time.time_ns()
    total = sum(data)
    time_stop = time.time_ns()
    print('finished')
    duration = time_stop - time_start
    duration = 1.0e-3 * duration
    print(f'time taken: {duration} us')
    print(f'total: {total}')

if __name__ == '__main__':
    main()
Here is the equivalent code in Julia.
function main(ARGS)
    N = 100000000
    data = Vector{Any}()
    for i in 1:N
        push!(data, i)
    end
    # throw one call away, to exclude JIT compilation from the timing
    total = sum(data)
    println("starting")
    time_start = Base.time_ns()
    total = sum(data)
    time_stop = Base.time_ns()
    println("finished")
    duration = time_stop - time_start
    duration = 1.0e-3 * duration
    println("time taken: $(duration) us")
    println("total: $(total)")
end

main(ARGS)
Python takes about 3 seconds on my machine. Julia takes about 20 seconds.
Is there any particular reason for this or is it time to admit the over-hyped claims made about Julia’s performance are blatantly false?
this code as-is on my machine takes 2 seconds, not 20. did you miss an order of magnitude?
making that Vector{Int}() instead of Vector{Any}() brings the runtime down to about 0.01s
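For reference, a sketch of what that one-line change looks like (assuming any recent Julia 1.x; the `sizehint!` call is an optional extra, not part of the original code, and the timing macro is used in place of the manual `time_ns` bookkeeping):

```julia
function main()
    N = 100_000_000
    data = Vector{Int}()      # concrete element type instead of Any
    sizehint!(data, N)        # optional: reserve capacity up front
    for i in 1:N
        push!(data, i)
    end
    total = sum(data)         # warm-up call so compilation isn't timed
    t = @elapsed total = sum(data)
    println("time taken: $(1.0e6 * t) us")
    println("total: $(total)")
end

main()
```

With a concrete element type the compiler knows every element is an `Int`, so `sum` lowers to a tight native loop instead of dispatching on each element.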
your conclusion is a bit over-inflammatory in my opinion. it is not clear if you are asking for help increasing the performance of your code (which as demonstrated Julia is of course more than capable of) or if you are trying to start an argument.
I am required to write a short benchmark demonstration which uses an array into which any type can be inserted
then I do not understand your comparison to C speeds in the original post. what would an “equivalent” benchmark look like in C? to use “an array into which any type can be inserted”
I think your benchmark would be more compelling if you found one where Julia actually is slower than Python. As I said in my first reply, the code you wrote runs much, much faster for me than the Python version. I suppose it’s possible the performance is architecture-dependent, although 10x would be a big difference; I suggest you double-check the number of zeros on your timing output.
Starting a thread with inflammatory and accusatory statements sure does get engagement, but it harms the community, because ragebait doesn’t do anyone any good. It’s also rude as hell towards other people. I suggest that, in the future, when you’re disappointed in Julia, you don’t react by lashing out, and instead begin with a little humility towards the results you’ve seen.
As has correctly been pointed out, the code you’ve posted is not meaningful, idiomatic Julia code. There is no point in declaring the vector’s element type as Any: it doesn’t buy any flexibility or genericity, or allow you to cover more types (since the function only ever pushes integers into it anyway). It does not realistically model a scenario where dynamic types are useful.
However, you’re right that when the Julia compiler has no information about types, Julia is typically slower than Python. After all, Python’s interpreter and the entire language has been optimised for exactly that scenario. That’s pretty rare, though. There is little point in writing code with zero type information.
Also, on my computer Julia is ~60% slower than Python, not 10x. And note that in your example you sum the array twice in Julia but only once in Python, doing significantly more work; if you remove the first call to sum, Julia is about 20% slower on my computer.
This forum is an extremely helpful and welcoming place. But not to hostility like you’re displaying here. I’m going to mute this thread — if you’re ever interested in chatting about benchmarks in good faith I’m happy to re-engage.
when the Julia compiler has no information about types, Julia is typically slower than Python
That’s what your benchmark shows, and here you have an experienced Julia programmer confirming that it can indeed be the case. Unless you restrict the type of data (e.g., to Vector{Union{Float64, Int}}), the code won’t run faster.
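A minimal sketch of that restriction (sizes and names are illustrative; timings will vary by machine):

```julia
# Element type restricted to a small Union rather than Any.
# Julia stores Union{Float64, Int} elements inline with a type tag,
# so sum only has to branch between two concrete cases instead of
# boxing and dynamically dispatching on every element.
data = Vector{Union{Float64, Int}}()
for i in 1:1_000_000
    push!(data, isodd(i) ? i : Float64(i))
end
total = sum(data)   # markedly faster than the same values in a Vector{Any}
```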
Julia’s performance advantage derives almost entirely from its front-end: its language semantics allow a well-written Julia program to give more opportunities to the compiler to generate efficient code and memory layouts.
The takeaway is that if you disregard such opportunities, you don’t gain the performance benefits.
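One way to see those missed opportunities directly is `@code_warntype` in the REPL (a sketch; the exact printout depends on your Julia version):

```julia
data_any = Any[1, 2, 3]
data_int = Int[1, 2, 3]

# Over Vector{Any} the inferred return type is Any (highlighted in red),
# so every element access goes through dynamic dispatch:
@code_warntype sum(data_any)

# Over Vector{Int} everything is inferred as a concrete Int, and the
# loop compiles down to straight-line native code:
@code_warntype sum(data_int)
```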
To bring this discussion back on track: in this example, the machine code generated by the Julia compiler is slow compared to the compiled C code that CPython’s interpreter dispatches to.
Python is still faster, despite all the overhead of the interpreter running, the fact that everything executes as bytecode, etc.
What this shows is that, for whatever reason, the performance of Julia is nothing like what it is claimed to be.
Make up whatever excuses and reasons you like; it does not change the results.
You’ve participated in this forum long enough to know full well that many things in Julia take time, and if you don’t make use of the performance tips, the performance hits add up. No official source claims that type-unstable Julia beats Python in performance in every conceivable way, and many forum participants have freely offered you information to the contrary. You are free to pretend otherwise and argue with an imaginary group that thinks Julia is magic, just stop doing that here.
In my opinion the main point is that, in the context of numerical computing, neither the Python nor the Julia code you provided is idiomatic. Python is probably better optimized for the approach in your script because, I would guess, Julia programmers would simply not write the code you included.
From my point of view, a fair comparison needs to use idiomatic code in both Python and Julia, which in numerical computing could be:
For Julia, I would constrain the type from Vector{Any}() to Vector{Int}() or Vector{Float64}(). In that case Julia is faster than the Python code you provided, though partly because your Python code is also not written the way one would for numerical applications.
For Python, I would probably use NumPy and define an array with dtype=np.int64 or np.float64. In these more comparable cases the performance (on my computer, with the latest stable Julia and Python 3.11) is similar, with Julia a little bit faster.
Why not?
Julia isn’t used that way for high performance code, so why should it be heavily optimized?
In Python, you pretty much don’t have an alternative to using [], so obviously it makes much more sense to optimize that use case.
I’m sure you could optimize summing over an array of Any elements in Julia, but no one has put in the work to do so.
On my PC I get a difference of 4x, which seems pretty fair for comparing something unoptimized vs an optimized implementation.