Julia is slower than Python when appending elements to untyped arrays

A Julia program equivalent to this small Python benchmark runs nearly 10x slower, even with higher optimization levels enabled.

import time

def main():
    N = 100000000
    data = []

    for i in range(N):
        value = i + 1
        data.append(value)

    print(f'starting')
    time_start = time.time_ns()
    total = sum(data)
    time_stop = time.time_ns()
    print(f'finished')
    duration = time_stop - time_start
    duration = 1.0e-3 * duration
    print(f'time taken: {duration} us')
    print(f'total: {total}')

if __name__ == '__main__':
    main()

Here is the equivalent code in Julia.

function main(ARGS)

    N = 100000000
    data = Vector{Any}()

    for i in 1:N
        push!(data, i)
    end

    # throw 1 call away, JIT
    total = sum(data)

    println("starting")
    time_start = Base.time_ns()
    total = sum(data)
    time_stop = Base.time_ns()
    println("finished")
    duration = time_stop - time_start
    duration = 1.0e-3 * duration
    println("time taken: $(duration) us")
    println("total: $(total)")
end

main(ARGS)

Python takes about 3 seconds on my machine. Julia takes about 20 seconds.

Is there any particular reason for this or is it time to admit the over-hyped claims made about Julia’s performance are blatantly false?

Close to C performance? I don’t think so.

  1. this code as-is on my machine takes 2 seconds, not 20. did you miss an order of magnitude?
  2. making that Vector{Int}() instead of Vector{Any}() brings the runtime down to about 0.01s
  3. your conclusion is a bit over-inflammatory in my opinion. it is not clear if you are asking for help increasing the performance of your code (which as demonstrated Julia is of course more than capable of) or if you are trying to start an argument.

Your Any array won’t be fast. Try:

data = Int[]

Then it’s fast. See also: Performance Tips · The Julia Language


No, because that’s not equivalent to the Python code above

Here’s a slightly simplified version of your Julia code, which is also vastly faster.

julia> const N = 100000000
100000000

julia> @time sum(1:N)
  0.000001 seconds
5000000050000000
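
The near-zero timing above is consistent with the sum of an integer range being computed from a closed-form formula rather than by visiting each element. A minimal sketch of that idea, written in Python to match the benchmark in the original post (the helper name `range_sum` is made up for illustration):

```python
def range_sum(n: int) -> int:
    """Sum of 1..n via Gauss's formula, without iterating."""
    return n * (n + 1) // 2

# Cross-check the formula against an explicit summation on a small input.
assert range_sum(1_000) == sum(range(1, 1_001))

# Same N as the benchmark; no array is built or traversed.
print(range_sum(100_000_000))  # 5000000050000000
```

This only works because the input is known to be a contiguous range. It says nothing about summing an array whose elements are stored as arbitrary objects.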

semantically it is, because in Julia values cannot have abstract types and the return of sum depends semantically only on the values of its inputs

unless you are using a definition of “equivalent” that I’m not yet familiar with, in which case you might want to clarify.


Allow me to express it this way

  • I am required to write a short benchmark demonstration which uses an array into which any type can be inserted

I am required to write a short benchmark demonstration which uses an array into which any type can be inserted

then I do not understand your comparison to C speeds in the original post. what would an “equivalent” benchmark, one that uses “an array into which any type can be inserted”, look like in C?

I think your benchmark will be more compelling if you find one where Julia is actually slower than Python anyway. As I said in my first reply, the code you wrote runs much much faster for me than Python. I suppose it’s possible the performance is architecture-dependent? although 10x would be a big difference — I suggest you double check the number of zeros on your timing output.


Starting a thread with inflammatory and accusatory statements sure does get engagement, but it’s harming the community because ragebait doesn’t do anything good for people. It’s also rude as hell towards other people. I suggest that, in the future when you’re disappointed in Julia, you don’t react by lashing out, and perhaps begin with a little humility towards the results you’ve seen.

As has correctly been pointed out, the code you’ve posted is not meaningful, idiomatic Julia code. There is no point in declaring the vector Any - it doesn’t buy any flexibility or genericness, or allow you to cover more types (since the function always pushes integers to it, anyway). It does not realistically model a scenario where dynamic types are useful.

However, you’re right that when the Julia compiler has no information about types, Julia is typically slower than Python. After all, Python’s interpreter and the entire language have been optimised for exactly that scenario. That’s pretty rare, though. There is little point in writing code with zero type information.

Also, on my computer Julia is ~60% slower than Python, not 10x. Also, in your example, you sum the array twice in Julia but only once in Python, doing significantly more work. If you remove the first call to sum, on my computer, Julia is about 20% slower.
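
To make it explicit which work a reported number covers, here is a minimal Python sketch that times exactly one call to `sum`. The helper name `time_sum` and the reduced `N` are assumptions made for illustration, not part of the original benchmark:

```python
import time

def time_sum(data):
    """Return (total, elapsed_ns) for exactly one call to sum(data)."""
    start = time.perf_counter_ns()
    total = sum(data)
    elapsed_ns = time.perf_counter_ns() - start
    return total, elapsed_ns

# Smaller N than in the original post so the sketch runs quickly.
N = 1_000_000
data = [i + 1 for i in range(N)]

total, elapsed_ns = time_sum(data)
print(f"total: {total}")
print(f"time taken: {elapsed_ns / 1e3:.1f} us")
```

Only the calls between the two timestamps count towards the printed duration. Work outside them, such as building the list or an extra warm-up call, shows up only in whole-script wall time.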

  1. This is irrelevant to the point being made, where Python has been compared with Julia
  2. This is relatively trivial for a competent C programmer

I can’t believe I have to spell this out explicitly but this is a benchmark demonstration.

Would you have been happier if I had appended a single floating point value to the end of the array?

Pretend there is one if you want.

What a way to completely miss the point being made.

This forum is an extremely helpful and welcoming place. But not to hostility like you’re displaying here. I’m going to mute this thread — if you’re ever interested in chatting about benchmarks in good faith I’m happy to re-engage.


You also missed a part of Jakob’s response:

when the Julia compiler has no information about types, Julia is typically slower than Python

That’s what your benchmark shows, and here you have an experienced Julia programmer confirming that it can indeed be the case. Unless you restrict the type of data (e.g., to Vector{Union{Float64, Int}}), the code won’t run faster.


The manual’s FAQ has addressed this question and made it clear that you cannot magically speed up Python code by literally translating it to Julia line-by-line:
https://docs.julialang.org/en/v1/manual/faq/#Why-don’t-you-compile-Matlab/Python/R/%E2%80%A6-code-to-Julia?

Quote:

Julia’s performance advantage derives almost entirely from its front-end: its language semantics allow a well-written Julia program to give more opportunities to the compiler to generate efficient code and memory layouts.

The takeaway is that if you disregard such opportunities, you don’t gain the performance benefits.


There’s no need to be so incendiary or combative. It’s not how we do things here.

https://discourse.julialang.org/faq

Nothing sabotages a healthy conversation like rudeness

If you’re unsure, ask yourself how you would feel if your post was featured on the front page of the New York Times.


To bring this discussion back on track, the machine code generated by the Julia compiler in this example is slow compared to the machine code of the CPython interpreter that runs the Python version.

Python is still faster, despite all the overhead of the interpreter running, the fact that everything executes as bytecode, etc.

What this shows is - for whatever reason - the performance of Julia is nothing like what it is claimed to be.

Make whatever excuses and reasons up, it does not change the results.

You’ve participated in this forum long enough to know full well that many things in Julia take time, and if you don’t make use of the performance tips, the performance hits add up. No official source claims that type-unstable Julia beats Python in performance in every conceivable way, and many forum participants have freely offered you information to the contrary. You are free to pretend otherwise and argue with an imaginary group that thinks Julia is magic, just stop doing that here.


There is no reason why Julia should not be faster than Python in this particular context.

In my opinion the main point is that, in the context of numerical computing, neither the Python nor the Julia code you provided is idiomatic. Python is probably more optimized for the approach your script takes because, I guess, Julia programmers would not write the code you included.

From my point of view, a fair comparison needs to use idiomatic code in both Python and Julia, which in numerical computing could be:

  • For Julia, I would constrain the type Vector{Any}() to Vector{Int}() or Vector{Float64}(). In this case Julia is faster than the code you provided for Python, but partly because your Python code is also not properly written for numerical applications.
  • For Python, I would probably use numpy and define an array with dtype=np.int64 or np.float64 etc. In these more comparable cases the performance (on my computer with the latest stable Julia and Python 3.11) is similar, with Julia a little bit faster.
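
As a rough illustration of the difference between an untyped and a typed container on the Python side, here is a small sketch. It uses the standard-library `array` module as a dependency-free stand-in for the numpy array mentioned above. That is a substitution made for this example, not what was measured in this post, and it makes no claim about relative speed:

```python
from array import array

untyped = []        # list: each slot references an arbitrary Python object
typed = array("q")  # signed 64-bit integers stored contiguously

for i in range(5):
    untyped.append(i + 1)
    typed.append(i + 1)

untyped.append(2.5)  # accepted: a list can hold any type

try:
    typed.append(2.5)  # rejected: the element type is fixed
    typed_accepts_float = True
except TypeError:
    typed_accepts_float = False

print(sum(untyped))         # 17.5
print(sum(typed))           # 15
print(typed.itemsize)       # bytes used per stored element
print(typed_accepts_float)  # False
```

The typed container gives up the ability to hold arbitrary values in exchange for a fixed, compact element layout. That is the same trade-off as choosing Vector{Int} over Vector{Any}.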

Why not?
Julia isn’t used that way for high performance code, so why should it be heavily optimized?

In Python, you pretty much don’t have an alternative to using [], so obviously it makes much more sense to optimize that use case.
I’m sure you could optimize summing over an array with Any elements in Julia, but no one has sufficiently done so.

On my PC I get a difference of 4x, which seems pretty fair for comparing something unoptimized vs an optimized implementation.
