With Missings, Julia is slower than R

gaspardelanuit · June 21, 2018, 3:17am

As shown by the code below, Julia is twice slower than R (!) to sum elements of a Float64 vector with missing values, even with Julia 0.7.
In R, missing values are encoded with sentinelle values. In Julia, they are encoded with different type.
This difference of design makes it much slower to manipulate vectors of Float64 with missing s n Julia.
Are sentinelle values to represent missing values really out of the question?

# in Julia
using Missings
x = rand(200_000_000)
y = replace(x-> x<= 0.01, x, missing)
@time sum(skipmissing(y))

# in R
x = runif(2 * 1e8)
x[x<=0.01] = NA
system.time(sum(x, na.rm = T))
``

StefanKarpinski · June 21, 2018, 3:19am

Don’t benchmark in global scope (performance tips)
Use BenchmarkTools for benchmarking
The first run includes compile time

ChrisRackauckas · June 21, 2018, 3:20am

The issue here is likely using Missing. If you have to do that, then you’re not on Julia v0.7-alpha. The optimizations to make this fast only exist on the newest alpha version.

rdeits · June 21, 2018, 4:20am

Using BenchmarkTools (as suggested by Stefan) and Julia v0.7-alpha (as suggested by Chris), I see Julia being only slightly slower than R:

Edit: And the very latest Julia nightlies are even faster than R (see below)

julia> using BenchmarkTools

julia> @btime sum(skipmissing($y))
  1.329 s (6 allocations: 96 bytes)

> system.time(sum(x, na.rm = T))
   user  system elapsed 
  0.940   0.000   0.941

That’s actually pretty good, since presumably R’s sum() is just calling a C function. Calling a single (C-implemented) function on a vector of data is pretty much the best possible case for R/NumPy/MATLAB/etc.

The important difference is that Julia’s sum(skipmissing(y)) is just calling Julia code all the way down. So if you want to implement your own kind of summation, you actually can and it will still be fast. Compare:

julia> function mysum(x)
         result = zero(eltype(x))
         for element in x
           if !ismissing(element)
             result += element
           end
         end
         result
       end
mysum (generic function with 1 method)

julia> @btime mysum($y)
  809.128 ms (0 allocations: 0 bytes)

vs R:

> mysum <- function(x) {
+   result <- 0
+   for (element in x) {
+     if (!is.na(element)) {
+       result <- result + element
+     }
+   }
+  result
+ }
> system.time(mysum(x))
   user  system elapsed 
107.094   0.000 107.093

The hand-written R sum is over 100 times slower than Julia.

So yes, if all you ever want to do is compute sums over vectors, then R is fine. But if you want to do something that doesn’t have a fast built-in C implementation, then you really want Julia.

By the way,

Are sentinelle values to represent missing values really out of the question?

No, certainly not. Julia v0.7 stores vectors of things like Union{Float64, Missing} efficiently by using a sentinel bit for each element. Or you can implement your own, which is what DataArrays does: https://github.com/JuliaStats/DataArrays.jl/blob/49004b5f82bb92ec7790805a0dda10b8c3aeb68b/src/dataarray.jl#L32-L70 and it will be fast, too.

ChrisRackauckas · June 21, 2018, 4:27am

Why is your sum implementation so much faster than the one with skipmissing though? What’s going on in the skipmissing part that’s slowing it down? I assumed it was just an iterator that skipped over the missing values which should be almost equivalent to what you have?

rdeits · June 21, 2018, 4:31am

Yeah, I was surprised by that as well. It also allocates some memory, even if I omit the skipmissing() itself:

julia> @btime sum($(skipmissing(y)))
  1.323 s (5 allocations: 80 bytes)

It looks like the implementation of SkipMissing iteration hasn’t been updated to use the new iterate() protocol as of v0.7-alpha, but it has been updated on master. Perhaps this will be faster on master.

rdeits · June 21, 2018, 4:38am

Holy cow! Yeah, it’s 4 times faster (and more than 2X faster than R) on the very latest Julia nightly:

julia> versioninfo()
Julia Version 0.7.0-alpha.214
Commit ffc9182b3a (2018-06-20 19:45 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-3920XM CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, ivybridge)

julia> @btime sum(skipmissing($y))
  381.473 ms (1 allocation: 16 bytes)

I’m impressed!

gaspardelanuit · June 21, 2018, 6:41am

Hi,
This does not change anything. I obtain these results in v0.7, even with BenchmarkTools and using local scope. See below.
To be more precise, I obtain 0.5s in Julia vs 0.2s in R

using BenchmarkTools, Compat
x = rand(200_000_000)
y = replace(x-> x<= 0.01, x, missing)
function f(y)
	@btime sum(skipmissing(y))
end
f(y)

versioninfo()
Julia Version 0.7.0-alpha.0
Commit 22590d529d (2018-05-31 00:07 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

nalimilan · June 21, 2018, 6:50am

See the two last sections of my blog post. We get an incredibly high performance on recent git master for a manual sum over Vector{Int}, a similar performance as R for a manual sum over Vector{Float64}, but a somewhat poorer performance using skipmissing. I’ve filed this issue about it yesterday.

Anyway, as @rdeits has noted, R’s sum is implemented in C, so being only twice slower than it in pure Julia is already quite good. If you write a sum in pure R, it will be much slower than that! More fundamentally, the fact that writing the loop manually gives a better performance means that we should be able to match R relatively easily, and probably beat it at least in some cases.

tkoolen · June 21, 2018, 6:57am

y is still a non-const global variable. See https://docs.julialang.org/en/stable/manual/performance-tips/#Avoid-global-variables-1 and compare to how rdeits is doing the benchmarking in With Missings, Julia is slower than R - #4 by rdeits.

DNF · June 21, 2018, 7:58am

I thought that your use of zero(eltype(x)) would be problematic, but apparently zero(Union{Missing, T}) is equivalent to zero(T).

What is the general rule governing this? (Edit: promote_type, maybe?)

StefanKarpinski · June 21, 2018, 8:01am

The general rule is that missing behaves as if it were some value in the domain T but you don’t know which one. So zero(Union{Missing, T}) = zero(T) makes sense since the zero of a missing T value is just the zero of T.

gaspardelanuit · June 25, 2018, 8:09pm

Thanks to nalimilan for being on this already!

Same issue happens with sort though. Julia takes 1.5 R to sort a Float64 vector without missing values, but this jumps to 3.5 R to sort a Float64 vector with missing values. Do you think there will be a way to make sorting roughly as fast as R?

For @rdeits, I understand that Julia is much more flexible than R, but Julia is also supposed to be a more performant language. Given the benchmarks that are on the first page on the Julia website, it is reasonable to expect Julia to be roughly on par with R functions, even the ones coded in C. More importantly, making sure that basic library functions such as sum or sort are efficient for the case of Float64 is essential, because these functions will be called very frequently.

rdeits · June 25, 2018, 8:18pm

Can you post (a) the Julia and R code you’re using, (b) the actual results, and (c) the exact versions of both? Without that information it’s impossible to say anything at all.

gaspardelanuit · June 25, 2018, 8:22pm

# Julia Version 0.7.0-beta.9
Y1 = rand(Float64, 10_000_000);
Y2 = ifelse.(rand(length(Y1)) .< 0.9, Y1, missing);
 @btime sort($Y1)
#>   764.978 ms (2 allocations: 76.29 MiB)
 @btime sort($Y2)
#>  1.797 s (4 allocations: 128.75 MiB)

# in R version 3.5.0 (2018-04-23)
Y1 = rnorm(10000000)
Y2 = ifelse(runif(length(Y1)) < 0.9, Y1, NA)
system.time(sort(Y1))
#>    user  system elapsed 
#>  0.616   0.043   0.670
system.time(sort(Y2))
#>   user  system elapsed 
#>  0.598   0.038   0.640

rdeits · June 25, 2018, 8:51pm

I get essentially no difference between Julia v0.7.0-beta.5 and R 3.4.4 when sorting Float64s on my 2018 i7 laptop:

julia> @btime sort(y) setup=(y = rand(Float64, 10_000_000))
  906.928 ms (2 allocations: 76.29 MiB)

(I’m putting the generation of y into the setup to also randomize the initial sorted-ness of the data, which substantially affects run-time).

> Y1 = rnorm(10000000)
^[system.time(sort(Y1))
   user  system elapsed 
  0.963   0.060   1.047

although I do see the missing version being substantially slower in Julia:

julia> @btime sort(y2) setup=(y = rand(Float64, 10_000_000); y2 = ifelse.(rand(length(y)) .< 0.9, y, missing))
  2.186 s (4 allocations: 128.75 MiB)

I think this actually indicates a real performance bug. Julia has an in-place sort which does not allocate memory:

julia> @btime sort!(y) setup=(y = rand(100)) evals=1
  2.414 μs (0 allocations: 0 bytes)

and the default sort(x) just copies x and then does sort!(x_copy). However, that in-place sort seems to allocate memory when it shouldn’t for arrays with Missings:

julia> @btime sort!(y2) setup=(y = rand(100); y2 = ifelse.(rand(length(y)) .< 0.9, y, missing)) evals=1
  4.157 μs (2 allocations: 624 bytes)

That could be a real (and fixable) performance bug.

nalimilan · June 25, 2018, 9:21pm

Good catch! Looking at the time profile and at allocations, I couldn’t find anything obvious. It would be worth filing an issue.

Keep up testing performance! It’s hard to think about all possible tests.

StefanKarpinski · June 25, 2018, 9:23pm

Is it possible that it’s not using the floating-point quicksort code and is using a merge sort instead?

rdeits · June 25, 2018, 9:34pm

x-ref https://github.com/JuliaLang/julia/issues/27781

GunnarFarneback · June 25, 2018, 9:38pm

Seems like it.

julia> Base.Sort.defalg(y2)
Base.Sort.MergeSortAlg()

julia> @btime sort(y2) setup=(y = rand(Float64, 10_000_000); y2 = ifelse.(rand(length(y)) .< 0.9, y, missing));
  1.635 s (4 allocations: 128.75 MiB)

julia> @btime sort(y2, alg=MergeSort) setup=(y = rand(Float64, 10_000_000); y2 = ifelse.(rand(length(y)) .< 0.9, y, missing));
  1.638 s (5 allocations: 128.75 MiB)

julia> @btime sort(y2, alg=QuickSort) setup=(y = rand(Float64, 10_000_000); y2 = ifelse.(rand(length(y)) .< 0.9, y, missing));
  1.432 s (3 allocations: 85.83 MiB)

Topic		Replies	Views
Missing or NaN General Usage	26	12446	August 1, 2018
Fastest way to count unique elements in `Vector{Union{Bool, Missing})` Performance	35	6994	January 16, 2018
Ironic observation about `sort` and `sortperm` speed for "small integers" vs R Performance sort , sortperm , r	32	4787	February 4, 2018
Why I love Julia Community announcement	34	4925	October 11, 2019
What features will I miss in Julia? New to Julia	88	13570	November 19, 2018

With Missings, Julia is slower than R

Related topics