Help with first time working on performance



I am new to Julia and this is my first time thinking about performance. I am working on speeding-up a function (code here). The speed I am getting is 0.011 seconds when I run

@time est_surv(times, is_censored);
  0.011156 seconds (70.24 k allocations: 1.792 MB)

I was expecting the speed to be about twice as fast. I started to read the official performance-tips. I then found an article by John Myles White on [DataFrame performance] ( DataFrames are a large part of this function.

Can I squeeze a significant amount of performance out of this function or is the use of DataFrames the main bottle neck?


Take a look at ProfileView.jl. It’ll give you a nice visual of how much time is spent in specific parts.


A first try might be to convert the dataframe to julia arrays with the convert function:

mymat = convert(Matrix{T}, youDataFrame)
myvec = convert(Vector, yourDataFrame[:variablename])

and then let the function work on those instead. if the runtime is the same as before dataframes is not the bottleneck. --> {T} is the type of the matrix, e.g. Float64 or something fitting your specific data set.


An order of magnitude faster!!


julia> @time est_surv(times, is_censored);
  0.001499 seconds (3.66 k allocations: 296.750 KB)


At first glance, your code does a bunch of calculations that should be linear time but are instead quadratic time. For example, the following code makes length(t) passes over the times array:

nrisk::Array{Int64,1} = [count(i->(i>=j),times) for j in t]

but could work in a single pass over the array:

tsorted = sort(times)
nrisk = Array{Int}(length(t))
j = length(t)
nrisk[j] = 1
for i = length(times)-1:-1:1
    if tsorted[i] != tsorted[i+1]
        j -= 1
        nrisk[j] = nrisk[j+1]
    nrisk[j] += 1


cumsum_delta::Array{Float64,1} = [sum(delta[1:i]) for i = 1:length(nrisk)]

is an O(n^2) algorithm that could be simply an O(n) call to the built-in cumsum function:

cumsum_delta = cumsum(delta)

Note also that pretty much all of your type declarations are basically useless for performance and just clutter the code. As long as you write type-stable code, Julia’s compiler will infer all of the types for you. Note that times = convert(Vector, times) is not type-stable; whenever you change the type, you should use a new variable name. (Use @code_warntype to make sure you haven’t made any type-stability mistakes, as described in the manual.)

Note also that you have lots of cases where you have a sequence of “vectorized” operations that could actually be merged into a single loop, eliminating several temporary arrays. For example:

log_log_var::Array{Float64,1} = [1/(log(km[i])^2)*cumsum_delta[i] for i = 1:length(km)]
log_log_sqrt::Array{Float64,1} = sqrt(log_log_var)
c_low::Array{Float64,1} = log(-log(km))-1.96*log_log_sqrt
c_high::Array{Float64,1} = log(-log(km))+1.96*log_log_sqrt
high::Array{Float64,1} = exp(-exp(c_low))
low::Array{Float64,1} = exp(-exp(c_high))

could be:

low = Array{Float64}(length(km))
high = Array{Float64}(length(km))
for i = 1:length(km)
    log_log_sqrt = 1.96 * sqrt(cumsum_delta[i] / log(km[i])^2)
    log_log_km = log(-log(km[i]))
    low[i] = exp(-exp(log_log_km - log_log_sqrt))
    high[i] = exp(-exp(log_log_km + log_log_sqrt))

In general, you have to unlearn some of the habits you might have learned from Matlab or Python that “built-in/vector functions = fast, loops = slow”.


Thanks for your response! I saw more improvement after making the changes you suggested.

julia> for i in 1:10
           @time est_surv(times, is_censored);
  0.001098 seconds (3.75 k allocations: 242.406 KB)
  0.000971 seconds (3.75 k allocations: 242.406 KB)
  0.001028 seconds (3.75 k allocations: 242.406 KB)
  0.000989 seconds (3.75 k allocations: 242.406 KB)
  0.001011 seconds (3.75 k allocations: 242.406 KB)
  0.001050 seconds (3.75 k allocations: 242.406 KB)
  0.001029 seconds (3.75 k allocations: 242.406 KB)
  0.001053 seconds (3.75 k allocations: 242.406 KB)
  0.001027 seconds (3.75 k allocations: 242.406 KB)
  0.001029 seconds (3.75 k allocations: 242.406 KB)


By the way, I’d suggest using the package to time your function. It takes care of making sure the function is compiled and runs enough trials to get an accurate estimate of the minimum and average runtimes.