Is Julia's applicable context getting narrower over time?

I heard some Python folks say that Julia’s niche will get narrower over time. The argument is that Python’s basic built-in functions are written in C, so they are faster than their Julia counterparts. For large packages, people can use C++ and its black magic to get extreme performance. Julia is preferable only for mid-sized projects, and over time this niche will get narrower as Python’s built-in functions and large packages both become more comprehensive. Is this true?

1 Like

The key problem with this argument is that “basic built-in functions of Python are written in C so they are faster than the Julia counterparts” is false.

29 Likes

You will also find plenty of examples here, especially from @ChrisRackauckas, showing that when Python is used as a glue language over C libraries, you cannot easily mix different libraries. For example, automatic differentiation through an ODE solver.

You can see this in the difference between the ecosystems. Julia tends to have relatively small, general packages that you can mix together, whereas Python packages tend to be large and tightly coupled.
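
As a minimal sketch of the kind of composition meant here, assuming OrdinaryDiffEq.jl and ForwardDiff.jl (the toy problem and tolerances are purely illustrative): because the ODE solver is generic Julia code, ForwardDiff’s dual numbers flow straight through it.

using OrdinaryDiffEq, ForwardDiff

# Solve u' = p*u on t ∈ [0, 1] and return u(1).
# Seeding the initial condition with one(p) lets Dual numbers
# propagate through the solver state.
function solve_at_one(p)
    prob = ODEProblem((u, p, t) -> p * u, one(p), (0.0, 1.0), p)
    sol = solve(prob, Tsit5(); reltol = 1e-8, abstol = 1e-8)
    return sol(1.0)
end

# u(1) = exp(p), so the derivative at p = 0.5 should be ≈ exp(0.5).
ForwardDiff.derivative(solve_at_one, 0.5)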

13 Likes

If Python’s built-ins were already fast, why did we spend 10 years making the Numpy ecosystem?

So extreme that people had to re-write Numpy functions in C++ and still ended up ~2x slower than Julia; see Bottleneck · PyPI.


Also, please don’t make a post just because “some Python guys” said something that is likely false or ill-informed. I’m sure you’ve seen Julia functions that are “not slower” than Python functions that are “fast just because they were written in C”?

2 Likes

We recently added Julia to our codebase, where we’d previously used Python only for 3 years. This was motivated by optimizing a major process that was a key part of our product and cost $15k/month to run. As you can imagine, we had already heavily optimized it in Python. Everything was using numpy and calling C libraries under the hood, but beyond that we employed many performance tricks that resulted in very ugly and unpythonic code. We had several very talented engineers look at this, including one who specializes in optimizing numeric Python code.

In a few days of work, using Julia for the first time, I was able to produce a prototype that was 14 times faster than the heavily optimized Python code. There are a lot of reasons for that, but a few of the main ones were Julia’s light-weight threading model and the fact that profiling in Julia is tremendously easier, since it’s “Julia all the way down.” A few weeks later that code is in production, and has seen another ~2x speedup for a total of ~30x. As far as I can tell, the improvements I made wouldn’t be possible to replicate in Python right now. The fact that we spent 200+ expert hours on the Python version vs. 10 hours of a beginner’s time on the Julia version also tells you a lot about the developer experience.
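
(Not our production code, of course, but as a minimal sketch of what that threading model looks like, with a made-up toy kernel standing in for the real work; start Julia with e.g. julia -t 8 to get 8 threads:)

using Base.Threads

# Toy stand-in for an expensive per-item computation.
expensive_kernel(i) = sum(sin, 1:i)

# One macro spreads the loop across all available threads;
# each iteration writes to its own slot, so no locking is needed.
results = zeros(1_000)
@threads for i in 1:1_000
    results[i] = expensive_kernel(i)
end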

So I don’t think Python is narrowing Julia’s niche – if anything, Julia’s niche is currently expanding as more and more libraries are developed.

106 Likes

This kind of thing has been discussed a lot. I think this particular argument rests on an incorrect assumption that a problem domain’s methods and challenges are fixed, so that in the infinite-time limit, somebody will generate the optimal machine code with a Python wrapper. What if that frontier is moving, because people are making research progress? If you’re instead trying to develop new methods and ideas, it’s hard to beat Julia.

14 Likes

Basic built-in functions in Julia are likely to be at least as fast as their generic C counterparts.
For example, many of Julia’s special functions are implemented in pure Julia instead of C or Fortran.

Let’s run a quick benchmark:

using BenchmarkTools

# uses the system C library's log
clog(x) = ccall(:log, Float64, (Float64,), x)
# uses LLVM's log intrinsic
llvmlog(x) = ccall(Symbol("llvm.log.f64"), llvmcall, Float64, (Float64,), x)

# the Ref/$ dance keeps the compiler from constant-folding the argument
@btime log($(Ref(1.2))[])
@btime clog($(Ref(1.2))[])
@btime llvmlog($(Ref(1.2))[])

I get:

julia> @btime log($(Ref(1.2))[])
  5.252 ns (0 allocations: 0 bytes)
0.1823215567939546

julia> @btime clog($(Ref(1.2))[])
  6.787 ns (0 allocations: 0 bytes)
0.1823215567939546

julia> @btime llvmlog($(Ref(1.2))[])
  6.359 ns (0 allocations: 0 bytes)
0.1823215567939546

Or exp:

# uses the system C library's exp
cexp(x) = ccall(:exp, Float64, (Float64,), x)
# uses LLVM's exp intrinsic
llvmexp(x) = ccall(Symbol("llvm.exp.f64"), llvmcall, Float64, (Float64,), x)

@btime exp($(Ref(1.2))[])
@btime cexp($(Ref(1.2))[])
@btime llvmexp($(Ref(1.2))[])

Producing:

julia> @btime exp($(Ref(1.2))[])
  6.266 ns (0 allocations: 0 bytes)
3.3201169227365472

julia> @btime cexp($(Ref(1.2))[])
  11.079 ns (0 allocations: 0 bytes)
3.3201169227365472

julia> @btime llvmexp($(Ref(1.2))[])
  10.734 ns (0 allocations: 0 bytes)
3.3201169227365472

For larger projects, Julia is also really good at magic performance tricks, should you need them.

17 Likes

Here is a discussion I had with someone who does not believe that Julia functions are sometimes faster than Numpy.

https://www.reddit.com/r/Python/comments/lepjb3/how_does_the_python_community_view_julia/gmhhsr0/?utm_source=reddit&utm_medium=web2x&context=3

In my tests, element-wise exp in Julia is ~20% faster than Numpy, and summing a vector is as much as 4 times faster.

Only memory allocations are much faster in Numpy than in Julia (10 µs vs. 1 ms for 1M Float64).
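
For reference, a minimal sketch of the Julia side of the comparison I ran (assuming BenchmarkTools; the Numpy side would be the corresponding %timeit calls):

using BenchmarkTools

x = rand(1_000_000)

@btime sum($x)            # vector sum (Julia uses pairwise summation)
@btime exp.($x)           # element-wise exp; allocates the result vector
@btime zeros(1_000_000)   # the allocation case mentioned above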

1 Like

Hmm, one person is posting code showing Julia is faster, which I double-checked locally, and the other is saying “I don’t buy reverse-FUD” and quoting performance numbers without any code. Who to believe :thinking: this is a hard one. Wait a second, if you search JuliaLang/julia for pull requests, you find:

Which is exactly the change, named exactly how you’d expect, merged at the time you’d expect. But nah, “reverse-FUD” :laughing:. What is this kind of thinking, where an easily searchable fact is the thing being “argued”? This is bizarre.

10 Likes

julia> using BenchmarkTools

julia> @btime Vector{Float64}(undef, 10^6);
  8.377 μs (2 allocations: 7.63 MiB)

Are you by any chance comparing initialized vs. uninitialized memory?

julia> @btime zeros(10^6);
  452.453 μs (2 allocations: 7.63 MiB)

11 Likes

Yes, you are right: I compared the creation of an initialized array (all elements set to 0) in both Numpy and Julia.
It looks like it is not the allocation but the initialization that costs the majority of the time in Julia:

Python:

%timeit a = np.empty(1_000_000)
9.55 µs ± 107 ns per loop

%timeit a = np.zeros(1_000_000)
9.94 µs ± 164 ns per loop

%timeit a = np.ones(1_000_000)
1.4 ms ± 3.02 µs per loop

Julia:

@btime Vector{Float64}(undef, 1_000_000)
  12.200 μs (2 allocations: 7.63 MiB)

@btime x = zeros(1_000_000)
  1.199 ms (2 allocations: 7.63 MiB)

@btime x = ones(1_000_000)
  1.255 ms (2 allocations: 7.63 MiB)

versioninfo()
Julia Version 1.6.0-rc1
Commit a58bdd9010 (2021-02-06 15:49 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, icelake-client)

Edit: interestingly, in Numpy initializing the array with ones is much slower than with zeros; in Julia both take the same time.

1 Like

Should zeroing the memory take this long? Something feels off here.

Numpy is probably using calloc; see e.g. here and the comments below/above it: https://github.com/JuliaLang/julia/issues/9147#issuecomment-312722941

and here: https://github.com/JuliaLang/julia/issues/130

5 Likes

That’s O(1 cycle/element), so it’s not insane, but clearly it should be possible to do better.

c - Why malloc+memset is slower than calloc? - Stack Overflow seems to suggest that filling with zeros is a special case: in particular, until you start storing nonzeros, the kernel might lie to you and mmap a page of all-zeros. Once you actually need the memory, though, it has to pause and replace the buffer with real memory, also initializing it properly. So the fast performance of Python’s zeros probably doesn’t matter much in practice.
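
(A minimal sketch of that lazy-zeroing effect, using an anonymous Mmap to get kernel-zeroed pages the way calloc does; exact timings depend on the OS and allocator:)

using Mmap

# Anonymous mmap returns pages the kernel promises are zero,
# but they are only materialized when first written.
z = Mmap.mmap(Vector{Float64}, 1_000_000)  # the "allocation" is nearly free
@time z .+= 1   # first touch: page faults and real zeroing happen here
@time z .+= 1   # second touch: the pages exist now, so this is much faster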

16 Likes

Looking into the Julia source code, both zeros and ones first allocate an uninitialized array (which may or may not already be zero) and then fill all elements with 0 or 1.

for (fname, felt) in ((:zeros, :zero), (:ones, :one))
    @eval begin
        $fname(dims::DimOrInd...) = $fname(dims)
        $fname(::Type{T}, dims::DimOrInd...) where {T} = $fname(T, dims)
        $fname(dims::Tuple{Vararg{DimOrInd}}) = $fname(Float64, dims)
        $fname(::Type{T}, dims::NTuple{N, Union{Integer, OneTo}}) where {T,N} = $fname(T, map(to_dim, dims))
        function $fname(::Type{T}, dims::NTuple{N, Integer}) where {T,N}
            a = Array{T,N}(undef, dims)
            fill!(a, $felt(T))
            return a
        end
        function $fname(::Type{T}, dims::Tuple{}) where {T}
            a = Array{T}(undef)
            fill!(a, $felt(T))
            return a
        end
    end
end

If the initialized values are actually used, the timings are roughly similar in Numpy and Julia:

%%timeit
    ...: a = np.zeros(1_000_000)
    ...: a[:] += 1
    ...:
1.78 ms ± 10.6 µs per loop

vs.

@btime begin
       x = zeros(1_000_000)
       x .+= 1
       end
  1.649 ms (2 allocations: 7.63 MiB)

Therefore I think @tim.holy is right: once the allocated zeros are actually used, the performance benefit of Numpy is gone.

10 Likes

I have spent some time reading Reddit posts in the past month. My impression is that most R users don’t believe Julia is faster than Rcpp, and most Python users don’t believe Julia is faster than Numpy.

In a Reddit thread that I can no longer find, a Julia user said that if Julia code is not as fast as C, it should be considered a bug. Is this claim an exaggeration?

Julia provides more memory protections than C/C++, so I think Julia can come close when LLVM can optimize what you are doing. The @inbounds macro helps a lot here, but there are probably some unsafe, fast things you could do in C/C++ that you can’t duplicate in Julia.
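
(A minimal sketch of what @inbounds buys you; the function names are made up for illustration:)

# Plain indexing keeps a bounds check on every access.
function sum_checked(v::Vector{Float64})
    s = 0.0
    for i in 1:length(v)
        s += v[i]
    end
    return s
end

# @inbounds elides those checks, and @simd lets LLVM reassociate the
# floating-point additions, so the loop can vectorize like C.
function sum_unchecked(v::Vector{Float64})
    s = 0.0
    @inbounds @simd for i in 1:length(v)
        s += v[i]
    end
    return s
end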

Try writing a matrix-multiplication algorithm in C/C++ that doesn’t get blown out of the water by the performance of

using LoopVectorization

function A_mul_B!(𝐂, 𝐀, 𝐁)
    # @avx (from LoopVectorization) vectorizes and unrolls the whole loop nest
    @avx for m ∈ axes(𝐀,1), n ∈ axes(𝐁,2)
        𝐂mn = zero(eltype(𝐂))
        for k ∈ axes(𝐀,2)
            𝐂mn += 𝐀[m,k] * 𝐁[k,n]
        end
        𝐂[m,n] = 𝐂mn
    end
end

And then also compare how much code you had to write. See the benchmarks at https://chriselrod.github.io/LoopVectorization.jl/stable/examples/matrix_multiplication/#Matrix-Multiplication, and a quick usage check below.
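
(Hypothetical usage, reusing the A_mul_B! definition above to check the kernel against the BLAS-backed built-in multiply:)

A, B = rand(200, 200), rand(200, 200)
C = similar(A)
A_mul_B!(C, A, B)
C ≈ A * B   # true: the kernel agrees with the built-in result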

17 Likes

In my opinion yes, but not by all that much. Well-optimized C code that uses all the tricks in the book is still difficult to match. For fun I’ve tried porting libdeflate to Julia as literally as possible, and I still haven’t done better than 2-3 times slower. Probably the best Julia wizards can improve on that, though. For more pedestrian C code you can basically expect the same performance, and you will have a much easier time adding SIMD optimizations (if applicable) or multithreading.

When I port Matlab code that has pieces of C mixed in, I never even consider keeping the C code as is (I would if it were large, but that hasn’t been the case). In fact, porting the C parts tends to be easier than porting the Matlab code, even though you have to take care of 1-based versus 0-based indexing.
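
(A hypothetical micro-example of that index translation:)

# C, 0-based:  for (int i = 0; i < n; i++) dst[i] = 2 * src[i];
src = rand(10); dst = similar(src); n = length(src)

# Julia, 1-based:
for i in 1:n
    dst[i] = 2 * src[i]
end

# or index-agnostic, which sidesteps the off-by-one question entirely:
for i in eachindex(dst, src)
    dst[i] = 2 * src[i]
end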

3 Likes