Julia is slower than Python when appending elements to untyped arrays

If you write type-unstable code, then as jakobnissen says, “you’re right that when the Julia compiler has no information about types, Julia is typically slower than Python. After all, Python’s interpreter and the entire language has been optimised for exactly that scenario.”
When you write type-stable code (which may take you more time), Julia can approach C speed. That’s all.
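
To make that concrete, here is a minimal sketch (the sizes and containers below are my own, not the benchmark from the original post):

using BenchmarkTools

n = 10^6
untyped = Any[i for i in 1:n]   # Vector{Any}: every element is boxed, eltype tells the compiler nothing
typed   = collect(1:n)          # Vector{Int}: the compiler knows every element is a machine Int

@btime sum($untyped)   # per-element unboxing plus runtime dispatch on +
@btime sum($typed)     # a tight loop over machine integers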

4 Likes

In addition to questioning the relevance of the benchmark, I notice that on my computer your programs run at almost the same speed (Julia being even a little bit faster).

3 Likes

Because what the compiler should be doing is very, very simple. And since the code is entirely compiled, there is none of the additional overhead which Python suffers from.

1 Like

For those who are actually curious about what CPython’s built-in sum does, here’s a link, and it’s not hard to find a reason why it does well for an iterable of int. There is no truth to the implication that CPython is inherently slow because it’s “not compiled”; there is actually a decent chunk of compiled code in CPython, the language hinted at in the name. There’s nothing preventing CPython from implementing Julia’s much more efficient sum(::AbstractRange) for its built-in range, either.
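
To illustrate the range point: Julia’s sum over a range doesn’t iterate at all, it uses a closed form. The sketch below is only an illustration of the idea; Base’s actual method is more careful about overflow, step types, and empty ranges:

# Illustrative only, not Base's implementation.
# For an arithmetic range the sum is length*first + step*length*(length-1)/2,
# so the cost is independent of how long the range is.
range_sum(r::AbstractRange) = length(r) * first(r) + step(r) * (length(r) * (length(r) - 1)) ÷ 2

range_sum(1:100_000_000)   # 5000000050000000, computed without a loop
sum(1:100_000_000)         # Base's method does the equivalent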

This is false, and people have told you how in many threads. Compilers are not magic overhead disappearers.

4 Likes

This shouldn’t be so hard to understand…

If you take a language whose whole point is to be faster than Python because it has more type information, and then you take away the type information, it should be pretty obvious that the result won’t necessarily be a win.

Of course, Julia could have the same speed here if it had 20 years of optimizing untyped code under its belt, which it simply doesn’t, since it isn’t a priority.
I’m sure you’re well aware how difficult it is to improve Python’s performance, and a TON of money has been thrown at the problem of e.g. optimizing untyped arrays. If this were a simple problem, Python would easily match Julia’s speed for fully typed Julia code, and Julia / C wouldn’t be required at all.

It’s really like throwing a Cheetah into water and complaining that it isn’t swimming as fast as a fish.

1 Like

None of you are compiler engineers, so I don’t think your opinions count for much to be honest. None of you have seen an x86 instruction before.

As for posting a link to the CPython code and saying “see that’s why it’s fast”.

No, I don’t see why it is fast. Look at all the overhead with the many function calls there to sanity check various conditions during the iteration.

Looking at this code just makes me think all the more that the Julia compiler must be producing poorly performing or inefficient object code.

Put it this way, for the non-experts in this thread.

  • Python follows a reference for each element of the list, then does a runtime lookup to see how to add the type of the current value to the type of the running total
  • This is very similar to what Julia does, except that Python has some overhead because the call starts off as Python code before calling into a C-level implementation. Since Julia has all the “magic”, as others have put it (it isn’t magic), of being JIT-compiled, you would expect that if efficient machine code were being produced, with none of the overhead Python suffers from, it would be faster; but it is not (a rough Julia sketch of this loop is below)
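
For concreteness, here is a rough Julia sketch of the loop being described; this is illustrative only, not Base’s actual sum:

# Roughly the per-element work when summing a Vector{Any} (illustrative, not Base's code).
function naive_any_sum(xs::Vector{Any})
    total = xs[1]              # load a pointer to a boxed value
    for i in 2:length(xs)
        total = total + xs[i]  # the method of + is looked up at runtime and the result is re-boxed
    end
    return total
end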

None of you are compiler engineers, so I don’t think your opinions count for much to be honest. None of you have seen an x86 instruction before.

Haha :smiley: Ok, why the hell would you even assume this?

Just to prove my point, here’s a very simple sketch of what an optimized version could look like:


# A minimal "untyped" container that stores elements in one concretely typed
# vector per element type.
struct UntypedArray
    data::Dict{DataType, Vector}
end

UntypedArray() = UntypedArray(Dict{DataType, Vector}())

function Base.push!(data::UntypedArray, value::T) where T
    # Lazily create a typed vector the first time a given element type appears.
    if !haskey(data.data, T)
        data.data[T] = Vector{T}()
    end
    push!(data.data[T], value)
end

function Base.sum(data::UntypedArray)
    if length(data.data) == 1
        # Only one element type present: sum its concretely typed vector.
        sum(last(first(data.data)))
    else
        throw(ArgumentError("Cannot sum an array with more than one type"))
    end
end

function main()

    N = 100000000
    data = UntypedArray()

    for i in 1:N
        push!(data, i)
    end

    # warm-up call: run once so that JIT compilation time is not included in the timing
    total = sum(data)

    println("starting")
    time_start = Base.time_ns()
    total = sum(data)
    time_stop = Base.time_ns()
    println("finished")
    duration = time_stop - time_start
    duration = 1.0e-3 * duration
    println("time taken: $(duration) us")
    println("total: $(total)")
end

This is quite a bit faster than the Python version.

Obviously, it’s an incomplete implementation, but with a bit more time you could implement something like this in Julia pretty efficiently; it would match Python’s speed and be a complete array implementation.

I don’t know the details of the Python sum and array implementation, but I’m pretty sure they have tried hard to come up with something that works really well for untyped elements.

1 Like

As for claiming nonsense about “not providing type information so it’s slow”, this doesn’t even make sense as a comparison.

We cannot compare apples to oranges here.

If you want a strong type comparison, compare Julia to Python + Numpy or Julia to C/C++. But this has nothing to do with what I am asking about, as I already said I am working in an environment where we have objects of different types in a list.

Why derail the thread by posting this irrelevant information? This is not “more optimized”; you just chucked a bunch of vectors of different types into a dict.

What is anyone supposed to conclude from you solving a different problem to the original one posed?

@world-peace, what rails would you like this thread to ride on? What’s your goal here?

14 Likes

My point is that because the Base array implementation isn’t optimized for sum(x::Vector{Any}), one would need a new implementation that has been optimized for this use case.
The Julia one isn’t optimized for this use case, which is why it’s slow. I’m not sure why it’s such a hard pill to swallow.

The implementation is shitty and pointless, but I won’t create a decent implementation of an array type that’s better at iterating over Any elements just to prove a point in an internet argument.

If you haven’t noticed so far, it’s actually pretty normal in programming to optimize a data structure for one use case and accept penalties for other, less frequent use cases :wink:

1 Like

We’ve been telling you this the whole thread and explained why.

If you only trust the developers of compiler implementations, then why did you even bother asking a wider forum?

You’re wrong, look again. If you still don’t get it, keep reading it until you do. If you want help, you can try asking in a Python forum or find the scattered articles and Q&A posts about CPython’s sum running faster than naive for-loops.

2 Likes

Has anyone checked the timings?
I get

PS src> py .\benchpy.py
starting
finished
time taken: 36871701.0 us
total: 5000000050000000
PS src> julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.4 (2024-06-04)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> include("benchju.jl")
starting
finished
time taken: 8.02761e+06 us
total: 5000000050000000

julia> 

@world-peace I’m curious, what’s your motivation here? Do you feel misled by Julia’s speed claims? Are you angry at the community? Do you want help making your code faster? Or do you just want to complain about this wonderful OSS project, offered as a gift to the world, because a Julia issue hurt you as a child?

I bet it was closure Box-ing, wasn’t it :frowning: I’m sorry buddy, we’ve all been there.

If you think Array{Any} can and should be improved, make a PR. Most of us are happy typing out gigantic arrays, but if you need this, then help out or use a different language. We’re all just bamboozled by your tone here.

5 Likes

OK, I get it. You want an artificial benchmark that intentionally measures runtime-dispatch performance with type-unstable code. Runtime dispatch is well known not to be Julia’s strongest area (small unions are optimized; Any is not). But I don’t understand why you’re so upset about the so-called “over-hype” about performance, when no one has been hyping Julia’s performance for non-idiomatic, type-unstable code, starting from the manual itself (as I linked above). If you think any part of the Julia documentation makes misleading or unrealistic claims about performance without mentioning the usual caveats about type stability etc., you’re welcome to point it out and make a PR.
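
As a rough illustration of the small-union point (sizes and contents are arbitrary choices of mine):

using BenchmarkTools

n = 10^6
flags = rand(Bool, n)
small_union = Union{Int,Float64}[f ? 1 : 1.0 for f in flags]   # small Union: union-splitting applies
as_any      = Any[f ? 1 : 1.0 for f in flags]                  # Any: full runtime dispatch per element

@btime sum($small_union)   # typically much faster than the Any version
@btime sum($as_any)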

If you simply want a contrived example showing Julia to be not as optimized as theoretically possible when it “should” be (oh, and one shouldn’t “make excuses”), there is a much easier way: just compare programs with non-const global variables, which can run at full speeds in C and Fortran but incur a penalty in Julia, due to intentional design trade-offs.
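
For instance, a minimal sketch of that trade-off (timings will vary by machine):

using BenchmarkTools

x_nonconst = 2.0      # non-const global: its type could change, so uses of it can't be inferred
const x_const = 2.0   # const global: the compiler can rely on its type

function scale_nonconst(v)
    s = 0.0
    for y in v
        s += x_nonconst * y   # every access goes through a type-unstable global binding
    end
    return s
end

function scale_const(v)
    s = 0.0
    for y in v
        s += x_const * y      # fully inferred; compiles to a tight loop
    end
    return s
end

v = rand(10^6)
@btime scale_nonconst($v)
@btime scale_const($v)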

4 Likes

And another machine:

julia> 

julia> include("benchju.jl")
starting
finished
time taken: 4.75440e+06 us
total: 5000000050000000

julia> 
PS rc> py .\benchpy.py
starting
finished
time taken: 9026657.9 us
total: 5000000050000000
PS rc> 

So clearly not as reported in the OP.

2 Likes

The overhead of the call is negligible for N=10^8 as in the example here. So you are comparing compiled code to compiled code, but CPython’s implementation has a fast path for adding up lists of numbers that are all the same int type (so that it doesn’t need to re-box the intermediate results in new PyObjects, and doesn’t need to repeatedly look up the + function).

In principle, Julia’s sum function could also add a similar fast path for Vector{Any} where it checks that every element is Int (or whatever), but it’s probably not worth the tradeoff in code complexity in the stdlib — an idiomatic Julia program won’t use an untyped container in a critical code path.
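
Purely as a sketch of what such a fast path could look like (this is not in Base; a real version would need to handle more element types):

# Hypothetical fast path for a Vector{Any} that happens to contain only Ints.
# Not Base's implementation, just the idea described above.
function sum_any_with_fastpath(xs::Vector{Any})
    all(x -> x isa Int, xs) || return sum(xs)   # mixed element types: fall back to the generic sum
    s = 0
    for x in xs
        s += x::Int   # the assertion avoids dynamic dispatch on +; elements are still loaded from their boxes
    end
    return s
end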

(Plus, CPython is generally more optimized than Julia for the type-unstable case as has been noted above. In addition to the question of where finite engineering resources should be spent on non-idiomatic code, there are some tradeoffs in reference counting vs. garbage collection for this sort of code that is continually allocating new boxed objects.)

PS. I spent most of a lecture on precisely the example of the sum function as part of MIT’s Performance Engineering class. It is an interesting case study for why Python’s semantics limit its performance — given Vector{Any} semantics, they are doing about as well as it is possible to do, but it is not possible to provide static (compile-time) information about a homogeneous container like Vector{Int} or a NumPy array in pure Python.

21 Likes

I can corroborate the discrepancy 0.855s < 3.376s, 2.569s < 12.269s if uncharging. Not that this thread would last much longer, but I think we can conclude that the timings are system-dependent. There’s been ample discussion of implementation differences that would explain a discrepancy in either direction.

In case anybody wants to compare summation of integer ranges with actual optimization to the point it’s worth using proper benchmarking like timeit.timeit and BenchmarkTools.@btime, here’s a Python translation of sum(r::AbstractRange{<:Real}):

def sum_range(x):
    # Closed-form sum of an arithmetic progression: l*start + step*l*(l-1)/2.
    # The even factor is halved with >> so everything stays in integer arithmetic.
    l = len(x)
    return l*x.start + (
           ((x.step * (l-1)) * (l>>1)) if (l%2==0)
      else ((x.step * l) * ((l-1)>>1)) )

There’s still overhead that keeps it (206ns) from matching Julia’s speed (2ns); given that Python’s int doesn’t overflow, it’s a pretty sophisticated design that optimizes smaller values even this much. But it’s much faster than the 1.3s that sum(range(1,100000001)) takes, which, far from ideally, is even slower than summing the full list, and it also puts numpy.arange to shame. Moral of the story: use the right types and algorithms before pointing fingers at a compiler or the lack of one.
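
On the Julia side, a quick sketch of how this might be timed properly with BenchmarkTools (the exact numbers will of course depend on the machine):

using BenchmarkTools

@btime sum(1:100_000_000)                    # closed-form range method: effectively constant time
@btime sum(v) setup=(v = collect(1:10^7))    # iterating a concretely typed Vector{Int}, for comparison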

2 Likes

Sufficiently poorly written code can be arbitrarily slow in any language.

If you’d like some help, I can help you write an even slower version in C.

18 Likes

As far as I can tell, your benchmark beautifully demonstrates that the performance of Julia is exactly like what it is claimed to be.

3 Likes