With Missings, Julia is slower than R

Note that until the fixes are merged, a relatively straightforward workaround is to use sort!(collect(skipmissing(Y2)), alg=QuickSort). It’s about as fast as R and twice as fast as sort(Y2). It also matches what R does by default (na.last=NA), i.e. skipping NAs (keeping NAs doesn’t affect performance).
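For anyone who wants to try the workaround on a small input first, here is a minimal sketch. The tiny vector is made up for illustration (only the name Y2 comes from the thread), and the last two lines are an optional extra for when the missings are still needed afterwards.

```julia
# Small illustrative vector; `Y2` mirrors the name used in the thread.
Y2 = [3.0, missing, 1.0, 2.0, missing]

# Drop the missings, materialize a Vector{Float64}, then sort it in place.
sorted_nomissing = sort!(collect(skipmissing(Y2)), alg=QuickSort)

# Optional: put the missings back at the end, which is where `sort(Y2)`
# places them (missing sorts after every other value).
n_missing = count(ismissing, Y2)
sorted_with_missing = vcat(sorted_nomissing, fill(missing, n_missing))
```

The speedup presumably comes from sorting a plain Vector{Float64} rather than a Vector{Union{Missing, Float64}}, but that is worth confirming with a benchmark on your own data.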

We should probably define sort(itr::SkipMissing) to call sort!(collect(itr), ...), since that’s a common need and it’s the most efficient possible definition.
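A rough sketch of what such a definition could look like. To avoid overloading Base.sort from user code, it uses a hypothetical helper name, sort_skipmissing, which is not part of Base or of any package; it is only meant to illustrate the proposal.

```julia
# Hypothetical helper (not in Base): sort the non-missing values of a
# SkipMissing iterator by collecting once and sorting that copy in place.
# Keyword arguments such as `rev` or `alg` are forwarded to `sort!`.
sort_skipmissing(itr::Base.SkipMissing; kwargs...) = sort!(collect(itr); kwargs...)

v = [3, missing, 1, 2]
ascending  = sort_skipmissing(skipmissing(v))
descending = sort_skipmissing(skipmissing(v); rev=true)
```

Since collect already allocates a fresh vector, sorting it in place adds no second copy.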

Just want to check whether sorting with missing values is faster in Julia than in R now.

It should be pretty easy to check this using @btime (from BenchmarkTools) in Julia and the rbenchmark package in R.
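If BenchmarkTools is not installed, a rough version of the Julia side can be done with Base’s @elapsed instead. This is only a sketch: min_time is a hypothetical helper name, the array is smaller than the one used elsewhere in the thread so that it runs quickly, and @btime will give more reliable numbers.

```julia
# Rough timing helper using only Base (hypothetical name, not a library API).
# Takes the minimum over a few runs, after one warm-up call so that
# compilation time is not included.
function min_time(f, x; reps=5)
    f(x)
    return minimum(@elapsed(f(x)) for _ in 1:reps)
end

Y1 = rand(1_000_000)
Y2 = ifelse.(rand(length(Y1)) .< 0.9, Y1, missing)  # roughly 10% missing

t_plain   = min_time(sort, Y1)
t_missing = min_time(sort, Y2)
t_skip    = min_time(y -> sort!(collect(skipmissing(y))), Y2)

println("sort(Y1):                        ", t_plain, " s")
println("sort(Y2):                        ", t_missing, " s")
println("sort!(collect(skipmissing(Y2))): ", t_skip, " s")
```

The timings depend on the machine and the Julia version, so no expected numbers are given here.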

This line of code no longer works.

That is with Julia 0.7. You are resurrecting a very old thread.

If you do ?replace you will see that the current syntax is

using Missings
x = allowmissing(rand(10_000))
y = replace(x-> x <= 0.01 ? x : missing, x)
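To make the behaviour of that call easier to see, here is a small sketch on a hand-written vector. It builds the Union-typed vector directly so that it runs without the Missings package, and it also shows the pair form of replace; the values are made up for illustration.

```julia
# Vector that is allowed to hold missings, built without the Missings package.
x = Vector{Union{Missing, Float64}}([0.005, 0.5, 0.002, 0.9])

# Function form: every element v is replaced by the function's return value,
# so values above 0.01 become missing and the rest are kept.
y = replace(v -> v <= 0.01 ? v : missing, x)

# Pair form: swap one specific value for another.
z = replace(y, missing => 0.0)
```

Here y keeps 0.005 and 0.002 and turns the other two entries into missing, and z then fills those entries with 0.0.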

It doesn’t look to me like things have changed:

julia> Y1 = rand(Float64, 10_000_000);

julia> Y2 = ifelse.(rand(length(Y1)) .< 0.9, Y1, missing);

julia> @btime sort($Y1);
  757.621 ms (2 allocations: 76.29 MiB)

julia> @btime sort($Y2);
  1.885 s (2 allocations: 85.83 MiB)

I find this very concerning. Missing values are very common when working with real-world data.

For sure. This should definitely be fixed. In an issue linked above it seems people had a strategy for improving this.

oh wow this is so hacky I love it…
https://github.com/JuliaLang/julia/issues/27781#issuecomment-401110632

To be honest, I think it is better to let new users understand that Julia is not always faster than R and Python. I keep seeing people get frustrated with Julia when they run into these cases.

I personally find that this community oversells Julia’s performance but undersells its syntax.

There are always multiple levels of understanding to this story. In this case, a relevant point is:

More importantly, in almost all cases the problem can be fixed in Julia by writing idiomatic Julia code, rather than having to rely on C/C++/Fortran subroutines as in Python/R.

If you compare the time R/Python people spend writing C/C++ code with the time someone in the Julia community spends casually writing PRs to make a specific use case faster, hopefully you’d see the qualitative difference and its long-term implications.
