At first glance, your code does a bunch of calculations that should be linear time but are instead quadratic time. For example, the following code makes length(t) passes over the times array:
nrisk::Array{Int64,1} = [count(i->(i>=j),times) for j in t]
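but could work in a single backward pass over the sorted array (this assumes t holds the sorted distinct values of times):
tsorted = sort(times)
nrisk = Vector{Int}(undef, length(t))  # uninitialized; every entry is set below
j = length(t)
nrisk[j] = 1  # the largest time counts itself
for i = length(times)-1:-1:1
    if tsorted[i] != tsorted[i+1]
        # reached a new, smaller distinct time: carry the count forward
        j -= 1
        nrisk[j] = nrisk[j+1]
    end
    nrisk[j] += 1
end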
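To check that the single-pass version agrees with the comprehension, here is a minimal sketch for Julia 1.x. The function name nrisk_onepass and the sample data are made up for illustration, and it builds t itself as the sorted distinct values of times rather than taking it from your code:

```julia
# Sketch: number at risk at each distinct time, computed in one backward pass.
# Assumes `times` is non-empty.
function nrisk_onepass(times)
    tsorted = sort(times)
    t = unique(tsorted)                  # distinct times, ascending
    nrisk = Vector{Int}(undef, length(t))
    j = length(t)
    nrisk[j] = 1                         # the largest time counts itself
    for i in length(tsorted)-1:-1:1
        if tsorted[i] != tsorted[i+1]    # reached a new, smaller distinct time
            j -= 1
            nrisk[j] = nrisk[j+1]        # everything later is still at risk
        end
        nrisk[j] += 1
    end
    return t, nrisk
end

times = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]   # made-up sample data
t, nrisk = nrisk_onepass(times)

# Cross-check against the quadratic comprehension
@assert nrisk == [count(x -> x >= tj, times) for tj in t]
```

The sort is O(n log n), so the total cost is dominated by sorting rather than by length(t) separate scans.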
Similarly,
cumsum_delta::Array{Float64,1} = [sum(delta[1:i]) for i = 1:length(nrisk)]
is an O(n^2) algorithm (each sum re-adds every earlier element) that could simply be an O(n) call to the built-in cumsum function:
cumsum_delta = cumsum(delta)
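To see why a cumulative sum only needs one pass, here is a hand-written running sum as a rough sketch. The name running_sum and the sample values are hypothetical, and Base's cumsum may accumulate in a different order internally, so the comparison uses ≈ rather than ==:

```julia
# Sketch: each element is touched once, so the cost is O(n).
function running_sum(delta)
    out = similar(delta)
    acc = zero(eltype(delta))
    for i in eachindex(delta)
        acc += delta[i]      # reuse the previous partial sum
        out[i] = acc
    end
    return out
end

delta = [0.5, 0.25, 0.125, 0.125]   # made-up values

quadratic = [sum(delta[1:i]) for i in 1:length(delta)]  # re-sums every prefix
@assert running_sum(delta) ≈ quadratic
@assert running_sum(delta) ≈ cumsum(delta)
```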
Almost all of your type declarations do nothing for performance and just clutter the code. As long as you write type-stable code, Julia’s compiler will infer all of the types for you. Note that times = convert(Vector, times) is not type-stable; whenever you change the type, you should use a new variable name. (Use @code_warntype to make sure you haven’t made any type-stability mistakes, as described in the manual.)
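As a rough illustration of what a type-stability mistake looks like, here is a small sketch with two hypothetical functions (not taken from your code). In the first, x changes type inside the loop; in the second it does not. How much the unstable version costs depends on the Julia version, but @code_warntype flags it either way:

```julia
using InteractiveUtils   # provides @code_warntype outside the REPL

# `x` starts as an Int and becomes a Float64 after the first iteration,
# so the compiler can only infer a Union of the two for it.
function halves_unstable(n)
    x = 0
    for i in 1:n
        x += i / 2
    end
    return x
end

# Same computation, but `x` is a Float64 from the start.
function halves_stable(n)
    x = 0.0
    for i in 1:n
        x += i / 2
    end
    return x
end

# Compare the two reports; the unstable one shows a Union type for `x`.
@code_warntype halves_unstable(10)
@code_warntype halves_stable(10)
```

The same reasoning is behind the advice to bind a converted value to a new name instead of reassigning the old one.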
Note also that you have lots of cases where you have a sequence of “vectorized” operations that could actually be merged into a single loop, eliminating several temporary arrays. For example:
log_log_var::Array{Float64,1} = [1/(log(km[i])^2)*cumsum_delta[i] for i = 1:length(km)]
log_log_sqrt::Array{Float64,1} = sqrt(log_log_var)
c_low::Array{Float64,1} = log(-log(km))-1.96*log_log_sqrt
c_high::Array{Float64,1} = log(-log(km))+1.96*log_log_sqrt
high::Array{Float64,1} = exp(-exp(c_low))
low::Array{Float64,1} = exp(-exp(c_high))
could be:
low = Vector{Float64}(undef, length(km))
high = Vector{Float64}(undef, length(km))
for i = 1:length(km)
    log_log_sqrt = 1.96 * sqrt(cumsum_delta[i] / log(km[i])^2)
    log_log_km = log(-log(km[i]))
    # exp(-exp(x)) is decreasing in x, so the "+" branch gives the lower limit
    low[i] = exp(-exp(log_log_km + log_log_sqrt))
    high[i] = exp(-exp(log_log_km - log_log_sqrt))
end
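Here is a self-contained sketch of the fused loop for Julia 1.x, wrapped in a function so it can be checked. The function name loglog_limits, the keyword z, and the sample numbers are assumptions made for this example. Because exp(-exp(x)) is decreasing, the lower limit comes from the "+" branch:

```julia
# Sketch: log-log confidence limits in one pass, with no temporary arrays.
# Assumes every km[i] lies strictly between 0 and 1.
function loglog_limits(km, cumsum_delta; z = 1.96)
    low = Vector{Float64}(undef, length(km))
    high = Vector{Float64}(undef, length(km))
    for i in eachindex(km)
        half_width = z * sqrt(cumsum_delta[i] / log(km[i])^2)
        log_log_km = log(-log(km[i]))
        low[i] = exp(-exp(log_log_km + half_width))
        high[i] = exp(-exp(log_log_km - half_width))
    end
    return low, high
end

km = [0.9, 0.8, 0.6]                 # made-up survival estimates
cumsum_delta = [0.01, 0.02, 0.05]    # made-up cumulative variance terms
low, high = loglog_limits(km, cumsum_delta)

# Sanity check: each estimate lies inside its own interval
@assert all(low .<= km) && all(km .<= high)
```

The function boundary also means the loop is compiled for the concrete argument types, which is what makes it fast.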
In general, you have to unlearn the habit you might have picked up from Matlab or Python of assuming that “built-in/vector functions = fast, loops = slow”. In Julia, loops are compiled to native code, so a plain loop is often the fastest option.