Help with first time working on performance

performance

#1

I am new to Julia and this is my first time thinking about performance. I am working on speeding up a function (code here). The time I am getting is about 0.011 seconds when I run

@time est_surv(times, is_censored);
  0.011156 seconds (70.24 k allocations: 1.792 MB)

I was expecting it to be about twice as fast. I started to read the official performance tips and then found an article by John Myles White on [DataFrame performance](http://www.johnmyleswhite.com/notebook/2015/11/28/why-julias-dataframes-are-still-slow/). DataFrames are a large part of this function.

Can I squeeze a significant amount of performance out of this function, or is the use of DataFrames the main bottleneck?


#2

Take a look at ProfileView.jl. It’ll give you a nice visual of how much time is spent in specific parts.
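
For example, something like this (just a sketch of the typical usage, assuming times and is_censored are already defined; @profile collects the samples and ProfileView.view() draws the flame graph):

using ProfileView

Profile.clear()                          # throw away any previously collected samples
@profile est_surv(times, is_censored)    # run the function while sampling the call stack
ProfileView.view()                       # open the flame-graph visualization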


#3

A first try might be to convert the DataFrame to plain Julia arrays with the convert function:

mymat = convert(Matrix{T}, yourDataFrame)
myvec = convert(Vector, yourDataFrame[:variablename])

and then let the function work on those instead. If the runtime is the same as before, DataFrames is not the bottleneck. Here {T} is the element type of the matrix, e.g. Float64 or whatever fits your specific data set.
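
For your function it might look something like this (just a sketch; I am guessing at the column names in your DataFrame):

times = convert(Vector{Float64}, yourDataFrame[:time])         # placeholder column name
is_censored = convert(Vector{Bool}, yourDataFrame[:censored])  # placeholder column name
@time est_surv(times, is_censored)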


#4

An order of magnitude faster!!

Thanks.

julia> @time est_surv(times, is_censored);
  0.001499 seconds (3.66 k allocations: 296.750 KB)

#5

At first glance, your code does a bunch of calculations that should be linear time but are instead quadratic time. For example, the following code makes length(t) passes over the times array:

nrisk::Array{Int64,1} = [count(i->(i>=j),times) for j in t]

but could work in a single pass over the array:

tsorted = sort(times)
nrisk = Array{Int}(length(t))      # t is assumed to hold the sorted unique times
j = length(t)
nrisk[j] = 1
for i = length(times)-1:-1:1       # sweep from the largest time downward
    if tsorted[i] != tsorted[i+1]  # crossed into the next smaller unique time
        j -= 1
        nrisk[j] = nrisk[j+1]      # everything at risk later is still at risk here
    end
    nrisk[j] += 1
end

and

cumsum_delta::Array{Float64,1} = [sum(delta[1:i]) for i = 1:length(nrisk)]

is an O(n^2) algorithm that could be simply an O(n) call to the built-in cumsum function:

cumsum_delta = cumsum(delta)

Note also that pretty much all of your type declarations are basically useless for performance and just clutter the code. As long as you write type-stable code, Julia’s compiler will infer all of the types for you. Note that times = convert(Vector, times) is not type-stable; whenever you change the type, you should use a new variable name. (Use @code_warntype to make sure you haven’t made any type-stability mistakes, as described in the manual.)
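
For example, inside the function you could write something like this (a sketch, assuming the times are Float64; the rest of the body is elided):

function est_surv(times, is_censored)
    tvec = convert(Vector{Float64}, times)   # new name instead of rebinding `times`
    # ... work with tvec from here on ...
end

@code_warntype est_surv(times, is_censored)  # flags any variables inferred as Any or a Union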

You also have lots of cases where a sequence of “vectorized” operations could be merged into a single loop, eliminating several temporary arrays. For example:

log_log_var::Array{Float64,1} = [1/(log(km[i])^2)*cumsum_delta[i] for i = 1:length(km)]
log_log_sqrt::Array{Float64,1} = sqrt(log_log_var)
c_low::Array{Float64,1} = log(-log(km))-1.96*log_log_sqrt
c_high::Array{Float64,1} = log(-log(km))+1.96*log_log_sqrt
high::Array{Float64,1} = exp(-exp(c_low))
low::Array{Float64,1} = exp(-exp(c_high))

could be:

low = Array{Float64}(length(km))
high = Array{Float64}(length(km))
for i = 1:length(km)
    log_log_sqrt = 1.96 * sqrt(cumsum_delta[i] / log(km[i])^2)
    log_log_km = log(-log(km[i]))
    high[i] = exp(-exp(log_log_km - log_log_sqrt))   # corresponds to exp(-exp(c_low)) above
    low[i] = exp(-exp(log_log_km + log_log_sqrt))    # corresponds to exp(-exp(c_high)) above
end

In general, you have to unlearn some of the habits you might have learned from Matlab or Python that “built-in/vector functions = fast, loops = slow”.


#6

Thanks for your response! I saw more improvement after making the changes you suggested.

julia> for i in 1:10
           @time est_surv(times, is_censored);
       end
  0.001098 seconds (3.75 k allocations: 242.406 KB)
  0.000971 seconds (3.75 k allocations: 242.406 KB)
  0.001028 seconds (3.75 k allocations: 242.406 KB)
  0.000989 seconds (3.75 k allocations: 242.406 KB)
  0.001011 seconds (3.75 k allocations: 242.406 KB)
  0.001050 seconds (3.75 k allocations: 242.406 KB)
  0.001029 seconds (3.75 k allocations: 242.406 KB)
  0.001053 seconds (3.75 k allocations: 242.406 KB)
  0.001027 seconds (3.75 k allocations: 242.406 KB)
  0.001029 seconds (3.75 k allocations: 242.406 KB)

#7

By the way, I’d suggest using the [BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl) package to time your function. It takes care of making sure the function is compiled and runs enough trials to get an accurate estimate of the minimum and average runtimes.
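
For example (the $ interpolates the arguments so that the cost of looking up the global variables isn’t included in the measurement):

using BenchmarkTools

@benchmark est_surv($times, $is_censored)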