What Is a Simple Example of Julia Solving the Two Language Problem?

Hi folks,

I am giving a talk soon and something I want to highlight is the two language problem and how Julia addresses/solves this. Does anyone have a snippet of code that shows a simple Julia implementation of some function that can then be easily optimized (perhaps by types or other simple performance increases)? I would then use BenchmarkTools.jl to show the differences in timing – if that all makes sense.

Thanks folks!

~ tcp :deciduous_tree:


I am not quite finding it, but I always liked @stevengj's material on the matter.

The “Generating Vandermonde matrices” section in https://web.mit.edu/18.06/www/Fall17/1806/julia/Julia-intro.pdf is an example.
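A minimal sketch of the kind of naive Vandermonde constructor that material discusses (the exact code in the PDF may differ):

```julia
# Naive, type-generic Vandermonde constructor: the same source compiles
# to fast specialized code for each element type T.
function vander(x::AbstractVector{T}, n=length(x)) where {T}
    m = length(x)
    V = Matrix{T}(undef, m, n)
    for j in 1:n, i in 1:m
        V[i, j] = x[i]^(j - 1)   # column j holds the (j-1)-th powers
    end
    return V
end
```

The point of the example is that this simple double loop is competitive with NumPy's C implementation of `np.vander`, while remaining generic over element types.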

1 Like

I was thinking of julia-performance/1 - Is Julia fast?.ipynb at a1c77e92033c0ef3f58a360978ac2d3b08745ba8 · vchuravy/julia-performance · GitHub which I adapted from 18S096/Boxes-and-registers.ipynb at master · mitmath/18S096 · GitHub

1 Like

One thing where Julia clicked for me in this respect was loop fusion/broadcasting https://julialang.org/blog/2017/01/moredots/
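A sketch of the idea from that blog post (`x` and `y` here are just illustrative names): a chain of dotted operations fuses into a single loop with no intermediate arrays.

```julia
# Each dotted call below fuses into ONE loop over x — no temporary
# arrays are allocated for sin.(x), cos.(x), or the squares.
x = rand(1000)
y = similar(x)
y .= sin.(x) .^ 2 .+ cos.(x) .^ 2   # fused, written into y in place
```

In a vectorized language each of those operations would typically allocate and traverse a fresh array; here the compiler emits one loop, so broadcasting is both convenient and fast.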


Building on the mysum example, maybe it could be interesting to show how one can match the speed of the built-in sum by adding a simple macro annotation, and even surpass it by using LoopVectorization.jl, as in this notebook.
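A minimal sketch of that progression (the notebook's exact code may differ; LoopVectorization.jl is left out so the snippet stays dependency-free):

```julia
# A naive sum loop, and the same loop annotated with @inbounds/@simd,
# which drops bounds checks and lets the compiler vectorize the reduction.
function mysum(a)
    s = zero(eltype(a))
    for x in a
        s += x
    end
    return s
end

function mysum_simd(a)
    s = zero(eltype(a))
    @inbounds @simd for i in eachindex(a)
        s += a[i]
    end
    return s
end
```

Benchmarking both with `@btime` from BenchmarkTools.jl typically shows the annotated version matching the built-in `sum` on `Vector{Float64}`.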

1 Like

I would say that the best showcase is to take an existing example of a two-language solution, e.g. a Matlab mex file, port it to Julia, and show that you can get the same speed as the mex file and can more easily optimize it further. But this is rarely simple and very target-specific, as the audience should be familiar with the two-language solution in question.


By construction, two-language problems aren’t likely to be simple; otherwise people would just use the fast language for both parts and come out fast in micro-benchmarks.


It’s really banal, but I would say that plotting capabilities are one of the prime examples of why Julia is so good. Even if the problem is simple to implement in a fast language like C or Fortran, in the end you will likely have to export a text file or something and read it into Matlab/Python/R to see what happened.


How about: “NumPy’s indexing rules are implemented in C, but Julia’s are written in Julia”? base/abstractarray.jl becomes your complete example. (You could, e.g., count lines of code and compare to relevant portions of NumPy.)
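To illustrate, here is a hypothetical toy array type (the classic `Squares` example): defining just `size` and scalar `getindex` buys you all of Base's generic indexing machinery, which is itself written in Julia.

```julia
# A toy AbstractArray: implement only size and scalar getindex, and
# Base's generic fallbacks provide slicing, ranges, iteration, etc.
struct Squares <: AbstractVector{Int}
    n::Int
end
Base.size(s::Squares) = (s.n,)
Base.getindex(s::Squares, i::Int) = i^2

s = Squares(10)
s[3]       # scalar indexing, from our one-line method
s[2:4]     # range indexing, from generic Julia code in Base
```

Every fallback that makes `s[2:4]` work is ordinary Julia source you can read and step through, which is the contrast with NumPy's C internals.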


It is not easy to find examples that impress every audience. For some people I show this kind of thing: computing some pairwise property of the elements of a list, but only for pairs in which one element satisfies one condition and the other element another condition. This makes it at least not completely trivial to write a vectorized version of the code, and it tries to convey that the more complex or specific the calculation, the more important the out-of-the-box performance of the straightforward implementation in a language becomes.

For example, let us build a data set of Pets, “Cats” or “Dogs”, in which the data structure carries the kind of pet and its weight. (We can show off by attaching units to the weights.) We will compute the minimum difference between the weight of any dog and any cat in this list:

using Unitful

struct Pet{T}
    name::String
    weight::T
end

function min_diff_weight(pets)
    min_diff_weight = typemax(pets[1].weight)
    for i in 1:length(pets)-1
        if pets[i].name == "Dog"
            for j in i+1:length(pets)
                if pets[j].name == "Cat"
                    min_diff_weight = min(min_diff_weight, abs(pets[i].weight - pets[j].weight))
                end
            end
        end
    end
    return min_diff_weight
end

which, for 10^4 pets, takes:

julia> pets = [ Pet(rand(("Dog","Cat")), rand()u"kg") for _ in 1:10^4 ];

julia> @btime min_diff_weight($pets)
  147.122 ms (0 allocations: 0 bytes)
3.578299201389967e-8 kg

We did nothing sophisticated there; we literally wrote the double loop that does exactly and explicitly what we wanted.

Now that in Python:

import random

class Pet:
    def __init__(self, name, weight):
        self.name = name
        self.weight = weight

def min_diff_weight(pets):
    min_diff_weight = float('inf')
    for i in range(len(pets) - 1):
        if pets[i].name == "Dog":
            for j in range(i + 1, len(pets)):
                if pets[j].name == "Cat":
                    min_diff_weight = min(min_diff_weight, abs(pets[i].weight - pets[j].weight))
    return min_diff_weight

The codes are similar (although I would say Julia syntax is prettier…). But in Python this takes:

In [10]: pets = [ Pet(random.choice(("Dog","Cat")), random.random()) for _ in range(0,10_000) ]

In [11]: %timeit min_diff_weight(pets)
3.07 s ± 14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Thus, it is about 20 times slower.

It is clear that anyone can come up with faster versions of these codes, using other tools, vectorizing, adding flags, numba, JITs, etc. But with a basic knowledge of Julia you can get close to optimal performance for custom tasks like this without having to use any external tool or rethink your problem to adapt to those tools. When the problems become complex, this becomes more and more important.
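As a hypothetical illustration of such a faster version (not from the post above; note it considers all dog–cat pairs, whereas the double loop only pairs a dog with cats appearing later in the list): split the weights by kind, sort the cats, and binary-search the nearest cat for each dog, giving O(n log n) instead of O(n²).

```julia
# Hypothetical O(n log n) variant: for each dog weight, the closest cat
# weight is found with a binary search into the sorted cat weights.
function min_diff_weight_sorted(pets)
    dogs = [p.weight for p in pets if p.name == "Dog"]
    cats = sort!([p.weight for p in pets if p.name == "Cat"])
    best = typemax(eltype(dogs))
    for w in dogs
        j = searchsortedlast(cats, w)        # last cat weight ≤ w
        j ≥ 1 && (best = min(best, w - cats[j]))
        j < length(cats) && (best = min(best, cats[j+1] - w))
    end
    return best
end
```

The same trick works with Unitful weights, since `sort!`, `searchsortedlast`, and `min` are all generic.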


Does this matter? It’s not like I can realistically change them either way.

Regrettably, Julia plotting is way behind matplotlib (not because matplotlib is good or anything, just because it has been around forever and people have covered every kind of plot, albeit with lots of hacks, like the Artist object).


I guess that’s up to you. But from the standpoint of lines of code, flexibility, and generality (we have countless types of AbstractArrays; NumPy has one), yes, it does matter.

Indeed, for basically the same reason I would consider Flux a nice example … it still strikes me as odd that TensorFlow.jl is (was?) one of the largest Julia projects on GitHub in terms of LoC, i.e., just to expose the TensorFlow API you need 100k lines, whereas the whole code of Flux fits into much less (when I read it some years ago, when it was still using Tracker, it needed just 2.5k lines). Somehow Julia feels like cheating here :wink:

1 Like

I think depending on the audience, the fact that you can write CUDA kernels in Julia via CUDA.jl is pretty nice.


The team that made the famous black hole image has found a lot of benefit in a Julia implementation of what previously had separate Python and C++ implementations:


I take inspiration from a discussion underway on Slack about an R data.table function, to provide an example of how an enthusiast like me (but absolutely not an expert in programming) manages to implement, with some effort, some ideas to obtain in Julia the same result obtained with a function available in R. (In fact, I don’t know what R’s performance is on tables equivalent to the ones I tested, and I’m sure you can still improve what I got.)
More precisely, the premise was this:

Is there an equivalent to R’s data.table rolling join in DataFrames.jl? The idea is, considering a left join, to have the right table’s value columns carried forward up to the next matching key …

Below is how the ideas developed over time, with benchmarks measuring the results.

first attempt using DataFrames functions

using Dates, DataFrames, BenchmarkTools

tbl1 = [(;date) for date in Date(2022,1,1):Day(1):Date(2022,1,5)]
tbl2 = [(date=Date(2022,1,2), var1=rand()), (date=Date(2022,1,5), var1=rand())]
tbl3 = [(date=Date(2022,1,1), var2=rand()), (date=Date(2022,1,3), var2=rand())]
df1, df2, df3 = DataFrame(tbl1), DataFrame(tbl2), DataFrame(tbl3)

df123 = outerjoin(df1, df2, df3, on=:date, order=:left)
filldown(x) = accumulate((p, f) -> coalesce(f, p), x)
@btime transform(df123, [:var1, :var2] .=> filldown .=> [:var1, :var2])
#   27.200 μs (258 allocations: 14.50 KiB)
# 5×3 DataFrame


@btime leftjoin(df1, transform(sort(outerjoin(df1, df2, on=:date), :date), :var1 => filldown => :var1), on=:date)

a solution using the eclectic FlexiJoins package

tbl1 = [(;date) for date in Date(2000,1,1):Day(1):Date(2022,1,5)];
tbl2 = (date=[d for d in Date(2000,1,1):Day(1):Date(2022,1,5)], var=rand(8041));
df1 = DataFrame(deleteat!(tbl1, sort(unique(rand(1:7041, 1000)))));
df2 = deleteat!(DataFrame(tbl2), sort(unique(rand(1:7041, 4000))));

using FlexiJoins
outFJ = last.(FlexiJoins.innerjoin((A=tbl1, B=tbl2), by_pred(:date, ≥, :date), multi=(B=closest,)).B)
a solution using only basic functions

roll_ssl(c1, c2) = [searchsortedlast(c2, k) for k in c1]

@btime begin
    idx = roll_ssl(df1.date, df2.date)
    si = findfirst(>(0), idx)            # first row of df1 with a matching key
    df1.var = df2.var[max.(idx, 1)]      # carry the last matching value forward
    df1.var[1:si-1] .= NaN               # rows before the first match get NaN
end
# 243.500 μs (16 allocations: 222.64 KiB)
# 7109-element Vector{Float64}:

finally, a solution using only for/while loops, which, with some effort, allows one to obtain an improvement of at least 10 times


function roll(c1, c2)
    idx = zeros(Int, length(c1))
    j = 0
    for i in eachindex(c1)
        # advance j while the next key in c2 is still ≤ c1[i]
        while j < length(c2) && c2[j+1] <= c1[i]
            j += 1
        end
        idx[i] = j
    end
    return idx
end

@btime begin
    idx = roll(df1.date, df2.date)
    si = findfirst(>(0), idx)
    df1.var = df2.var[max.(idx, 1)]
    df1.var[1:si-1] .= NaN
end
# 26.300 μs (16 allocations: 222.64 KiB)
# 7109-element Vector{Float64}:

It would be a lot easier to show instances where the two-language problem is a problem, like with Python and Spark or Flink, but this doesn’t really answer the question, as I don’t see Julia as a drop-in replacement for these tools.

Julia could of course fit these domains with more work, but Rust-based solutions are further along, and they have plenty of momentum.

As others have said, the solved “two-language problems” are by definition nontrivial cases.

What you can do is show screenshots of the GitHub pages with the language-breakdown charts for NumPy, pandas, scikit-learn … and then show the same beautiful “100% Julia” charts for DataFrames, CSV, Flux …

Enzyme is:

Unlike (what eventually would be?) Rust Enzyme, Julia is purely a front end to Enzyme, so it’s really a two-language solution in that regard.

Like, in this particular case, if C++ adopts Enzyme, it would still be a one-language solution, and presumably performance would be better.