Using BenchmarkTools (as suggested by Stefan) and Julia v0.7-alpha (as suggested by Chris), I see Julia being only slightly slower than R:
Edit: And the very latest Julia nightlies are even faster than R (see below)
```julia
julia> using BenchmarkTools

julia> @btime sum(skipmissing($y))
  1.329 s (6 allocations: 96 bytes)
```
```r
> system.time(sum(x, na.rm = T))
   user  system elapsed
  0.940   0.000   0.941
```
That’s actually pretty good, since presumably R’s `sum()` is just calling a C function. Calling a single (C-implemented) function on a vector of data is pretty much the best possible case for R/NumPy/MATLAB/etc.

The important difference is that Julia’s `sum(skipmissing(y))` is just Julia code all the way down. So if you want to implement your own kind of summation, you actually can, and it will still be fast. Compare:
```julia
julia> function mysum(x)
           result = zero(eltype(x))
           for element in x
               if !ismissing(element)
                   result += element
               end
           end
           result
       end
mysum (generic function with 1 method)

julia> @btime mysum($y)
  809.128 ms (0 allocations: 0 bytes)
```
vs R:
```r
> mysum <- function(x) {
+     result <- 0
+     for (element in x) {
+         if (!is.na(element)) {
+             result <- result + element
+         }
+     }
+     result
+ }
> system.time(mysum(x))
    user  system elapsed
 107.094   0.000 107.093
```
The hand-written R `mysum` is over 100 times slower than the Julia version.
So yes, if all you ever want to do is compute sums over vectors, then R is fine. But if you want to do something that doesn’t have a fast built-in C implementation, then you really want Julia.
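As a concrete example of that (a sketch I’m adding here, not one of the benchmarks above): a Kahan-compensated sum that also skips `missing` values is exactly the kind of one-off numeric kernel with no ready-made C implementation behind it, yet it is just a few more lines of ordinary Julia:

```julia
# Sketch: Kahan-compensated summation that skips missing values.
# A custom reduction like this has no built-in C fast path, but it
# is still just a plain Julia loop.
function kahan_mysum(x)
    result = 0.0   # running sum
    c = 0.0        # running compensation for lost low-order bits
    for element in x
        ismissing(element) && continue
        y = element - c
        t = result + y
        c = (t - result) - y  # recover the bits that t dropped
        result = t
    end
    result
end

kahan_mysum([1.0, missing, 1e-16, -1.0])  # nonzero (≈1e-16); a naive loop returns 0.0
```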
By the way:

> Are sentinel values to represent missing values really out of the question?
No, certainly not. Julia v0.7 stores vectors of element types like `Union{Float64, Missing}` efficiently, using a compact type tag for each element. Or you can implement your own representation, which is what DataArrays does: https://github.com/JuliaStats/DataArrays.jl/blob/49004b5f82bb92ec7790805a0dda10b8c3aeb68b/src/dataarray.jl#L32-L70, and it will be fast, too.
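To give a rough sense of the DataArrays approach (a raw data vector paired with a separate missingness mask), here is a minimal sketch; `MaskedVector`, its fields, and `masked_sum` are invented names for illustration, not the actual DataArrays API:

```julia
# Minimal sketch of a mask-based container for missing data, loosely
# modeled on DataArrays.jl. All names here are illustrative.
struct MaskedVector{T} <: AbstractVector{Union{T, Missing}}
    data::Vector{T}   # raw values; entries flagged in `na` are ignored
    na::BitVector     # true where the corresponding value is missing
end

Base.size(v::MaskedVector) = size(v.data)
Base.getindex(v::MaskedVector, i::Int) = v.na[i] ? missing : v.data[i]

# A missing-skipping sum over the raw storage, analogous to mysum above.
function masked_sum(v::MaskedVector{T}) where {T}
    result = zero(T)
    @inbounds for i in eachindex(v.data)
        v.na[i] || (result += v.data[i])
    end
    result
end

v = MaskedVector([1.0, 2.0, 3.0], BitVector([false, true, false]))
masked_sum(v)  # 4.0
```

Because the mask lives next to dense `Float64` storage, the inner loop is a branch plus a contiguous load, which is why this kind of hand-rolled representation can match the built-in one.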