With Missings, Julia is slower than R

nalimilan · June 27, 2018, 8:47pm

Note that until the fixes are merged, a relatively straightforward solution is to use sort!(collect(skipmissing(Y2)), alg=QuickSort). It’s about as fast as R and twice as fast as sort(Y2). It’s also what R does by default (na.last=NA), i.e. skipping NAs (keeping NAs doesn’t affect performance).

We should probably define sort(itr::SkipMissing) to call sort!(collect(itr), ...), since that’s a common need and it’s the most efficient possible definition.
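As a minimal sketch of that idea, the helper below wraps the `sort!(collect(...))` pattern. The name `sort_nonmissing` is hypothetical and chosen for illustration only. It is not a Base function, and it deliberately avoids adding a method to `sort` for a Base type.

```julia
# Hypothetical helper (not part of Base): sort only the non-missing values.
# collect(skipmissing(xs)) yields a plain Vector without Missing in its
# element type, which sort! can then sort in place.
sort_nonmissing(xs; kwargs...) = sort!(collect(skipmissing(xs)); kwargs...)

Y = [3.0, missing, 1.0, 2.0, missing]
sorted = sort_nonmissing(Y; alg=QuickSort)
```

Keyword arguments such as `alg` or `rev` are simply forwarded to `sort!`, so the helper behaves like the one-liner quoted above.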

Yifan_Liu · February 25, 2021, 5:19pm

Just want to check if missing values and sorting are faster in Julia than R now.

pdeffebach · February 25, 2021, 5:48pm

It should be pretty easy to check this using @btime in Julia and the R library rbenchmark.

Yifan_Liu · February 25, 2021, 6:41pm

This line of code no longer works.

pdeffebach · February 25, 2021, 7:13pm

That is with Julia 0.7. You are resurrecting a very old thread.

If you do ?replace you will see that the current syntax is

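using Missings
# allowmissing widens the element type to Union{Missing, Float64}
x = allowmissing(rand(10_000))
# keep values <= 0.01 and replace everything else with missing
y = replace(v -> v <= 0.01 ? v : missing, x)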
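A small deterministic sketch of the same function-first form of `replace`, using a hand-written vector instead of random data. It assumes the input already has a `Union{Missing, Float64}` element type (written out explicitly here rather than via `allowmissing`), since the result has to be able to hold `missing`.

```julia
# Input whose element type already allows missing values.
x = Union{Missing, Float64}[0.005, 0.5, 0.002, 0.9]

# replace(f, A) returns a copy of A with each element v replaced by f(v).
y = replace(v -> v <= 0.01 ? v : missing, x)

# Count how many entries were turned into missing.
n_missing = count(ismissing, y)
```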

nilshg · February 25, 2021, 7:15pm

It doesn’t look to me like things have changed:

julia> using BenchmarkTools

julia> Y1 = rand(Float64, 10_000_000);

julia> Y2 = ifelse.(rand(length(Y1)) .< 0.9, Y1, missing);

julia> @btime sort($Y1);
  757.621 ms (2 allocations: 76.29 MiB)

julia> @btime sort($Y2);
  1.885 s (2 allocations: 85.83 MiB)
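For comparison, here is a rough sketch that times the `skipmissing` workaround from earlier in the thread next to plain `sort`. It uses Base's `@elapsed` on a smaller array so it runs without extra packages. That is an assumption made for brevity, and `@btime` from BenchmarkTools would give more reliable numbers. No timings are shown because they depend on the machine and Julia version.

```julia
# Smaller input than in the session above so the sketch runs quickly.
Y1 = rand(Float64, 100_000)
Y2 = ifelse.(rand(length(Y1)) .< 0.9, Y1, missing)

# Warm up both code paths once so compilation time is not measured.
sort(Y2)
sort!(collect(skipmissing(Y2)); alg=QuickSort)

# Single-shot timings in seconds (noisy; for a rough comparison only).
t_full = @elapsed sort(Y2)
t_skip = @elapsed sort!(collect(skipmissing(Y2)); alg=QuickSort)

println("sort(Y2):                    ", t_full, " s")
println("sort!(collect(skipmissing)): ", t_skip, " s")
```

Note that the two calls are not equivalent: `sort(Y2)` keeps the missing values (placed last), while the workaround drops them.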

Yifan_Liu · February 25, 2021, 11:46pm

I find this very concerning. Missing values are very common when working with real-world data.

pdeffebach · February 26, 2021, 12:01am

For sure. This should definitely be fixed. In an issue linked above, it seems people had a strategy for improving this.

jling · February 26, 2021, 12:31am

oh wow this is so hacky I love it…
https://github.com/JuliaLang/julia/issues/27781#issuecomment-401110632

Yifan_Liu · February 26, 2021, 2:59am

To be honest, I think it is better to let new users understand that Julia is not always faster than R and Python. I keep seeing people getting frustrated about Julia when they find these cases.

I personally find that this community oversells Julia’s performance but undersells its syntax.

jling · February 26, 2021, 4:10am

There are always multiple levels of understanding to this story. In this case, a relevant point is:

More importantly, in almost all cases, the problem can be fixed by writing idiomatic Julia code, instead of having to rely on C/C++/Fortran subroutines as you see in Python/R.

If you compare the time R/Python people spend writing C/C++ code to the time someone in the Julia community spends casually writing PRs to make a specific use case faster, hopefully you’d see the qualitative difference and the long-term implications of this.
