Hi guys,
Basically this: I am not a computer scientist, but I am curious. If I call a Python package from Julia, does it run faster than in Python? Or is it the same, or even slower?
Not faster, and it can be slower if the objects being passed to Python and back to Julia have to be converted somehow.
So, if I want to use SciPy functions with better performance, should I rewrite them de novo in Julia, or is it enough to use something like GitHub - AtsushiSakai/SciPy.jl: Julia interface for SciPy? At first glance it just uses PyCall, doesn't it?
A lot of the algorithms in SciPy are implemented in C, so they are not bottlenecked by Python itself. Of course, trying to implement them in Julia is a good idea, but it will probably not be easy to beat SciPy's performance, since a lot of smart people have worked on those routines for a long time.
A lot of scipy functionality is already in Julia or in a package. Is there something specific you’re looking for?
I computed two neighbor-based graphs and want to calculate the distance between them. I have Python code, but I am trying to get into Julia, hoping I could speed up the calculations. And for fun, of course.
From the SciPy package I would use cdist, hierarchy, pearsonr, the Dijkstra algorithm, and some sparse matrix stuff.
Just translating algorithms from scipy to Julia isn’t likely to speed them up in general, though it might if you’re good.
But if you have knowledge of special properties of your problem, or if you are stringing together multiple operations in a particular way, then you can write custom versions of the algorithms for your own case that use algorithmic shortcuts. Then you can get significant speedups in Julia that would be difficult to achieve in SciPy.
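For instance (a toy sketch, not anything specific to your problem): if you only need pairs of points within some radius, you can fuse the distance computation and the filtering into one loop, compare squared distances to skip the sqrt, and never materialize the full distance matrix that cdist would hand you:

# Toy sketch: index pairs of columns of X closer than r, without
# allocating the full pairwise distance matrix.
function close_pairs(X::AbstractMatrix, r)
    n = size(X, 2)
    pairs = Tuple{Int,Int}[]
    for j in 1:n, i in 1:j-1
        s = 0.0
        for k in 1:size(X, 1)
            s += (X[k, i] - X[k, j])^2
        end
        s <= r^2 && push!(pairs, (i, j))   # compare squared distances, no sqrt
    end
    return pairs
end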
Check out Distances.jl
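It covers the cdist use case, something like this (a quick sketch, assuming X and Y hold one point per column):

using Distances

X = rand(3, 100)                          # 100 points in 3 dimensions
Y = rand(3, 50)                           # 50 points in 3 dimensions
D = pairwise(Euclidean(), X, Y, dims=2)   # 100×50 matrix, like scipy's cdist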
I will, thank you.
Calling Python from Julia has a small overhead, i.e. nothing to worry about for most uses (assuming the transfer to/from Python is no-copy), as long as you do not make the call in a tight loop (similar rules apply within Python itself).
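Concretely, the overhead only bites when you cross the language boundary many times (rough sketch; exact numbers will vary):

using PyCall
np = pyimport("numpy")
x = rand(10^6)

fast(x) = np.sum(x)        # one boundary crossing: overhead is negligible

function slow(x)           # 10^6 crossings: per-call overhead dominates
    s = 0.0
    for v in x
        s += np.abs(v)
    end
    return s
end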
From PyCall’s docs:
"Multidimensional arrays exploit the NumPy array interface for conversions between Python and Julia. By default, they are passed from Julia to Python without making a copy, but from Python to Julia a copy is made; no-copy conversion of Python to Julia arrays can be achieved with the PyArray type below."
Often on the way back you're getting much less data, so even without no-copy it might not be a problem.
[EDIT: The interface is for dense arrays; I'm not sure it covers sparse arrays too, I doubt it.]
There was a discussion about “implementing numpy in Julia” a while ago which might be of interest, Stefan wrote a nice summary of why you couldn’t just “run Python in Julia” to speed things up back then: How hard would it be to implement Numpy.jl, i.e. Numpy in Julia? - #56 by StefanKarpinski
To summarize: With PyCall.jl there is a small overhead to make the call from Julia. After that, everything runs just as if you were using Python directly, because that’s all PyCall is doing, managing a Python runtime and giving you the result. Lots of Python is written in C and extension modules call out to C as well. Exactly the same binary code is executed when you call it from Julia.
In addition, PyCall sometimes automatically converts between Julia and Python data types, and offers some ways to control this. For example, Dicts are converted by default. This takes some time. But you can have Julia and NumPy share the buffer for, say, an array of Float64 to avoid copying.
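A rough sketch of that sharing, going from Julia to NumPy without a copy (assuming NumPy is available, so PyCall wraps the array rather than copying it):

using PyCall

x = rand(Float64, 10)
xo = PyObject(x)    # wraps x as a NumPy array over the same buffer, no copy
xo.fill(0.0)        # NumPy's ndarray.fill mutates the shared memory
all(iszero, x)      # true: the Julia array sees the change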
As posters above have stated, the Python ecosystem includes well-optimized routines written in C. Suppose you call them with lots of data, enough to make the call overhead negligible. You can probably beat them with pure Julia, but it might take a lot of work, because you are competing with well-engineered algorithms in an efficient, compiled language. (Maybe this will get easier as Julia compiler technology improves, e.g. by automatically performing various parallelizations?)

I've seen people, even people who have coded a fair amount in Julia, sort of believe that Julia works like magic pixie dust that makes everything faster. In one case, this caused significant communication problems with people trying to evaluate whether to optimize Julia code or switch to a Python/numba implementation. Something written in Julia isn't inherently faster or slower than something written in Python/numba/numpy/cython. The advantage of Julia is that, especially at the level of projects, it's far easier to write efficient code.
Fortunately, with Julia and Python it’s pretty easy to experiment and benchmark.
Focusing just on the overhead issue:
You can use %timeit at the IPython REPL and @btime at the Julia REPL. For example, here you see a few hundred ns of overhead. (In these examples there is little or no penalty for converting the data.)
julia> using PyCall
julia> @pyimport math
julia> using BenchmarkTools
julia> @btime math.sin(10.0)
253.915 ns (3 allocations: 48 bytes)
julia> @pyimport mpmath
julia> @btime mpmath.sin(10.0)
  7.614 μs (3 allocations: 144 bytes)
In [1]: from math import sin
In [2]: %timeit sin(10.0)
41.7 ns ± 0.137 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [3]: import mpmath
In [4]: %timeit mpmath.sin(10.0)
7.22 µs ± 125 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)