I have a dataset with stock prices and I want to compute the returns.
I have done this the following way:
test = copy(dollar_portfolio)
returns = copy(dollar_portfolio[2:end,:])
@time for column in 2:length(dollar_portfolio[1,:])
    for row in 2:length(dollar_portfolio[:,1])
        test[row, column] = log(dollar_portfolio[row, column]) - log(dollar_portfolio[row-1,column])
    end
    returns[:, column] = test[2:end, column]
end
I feel like this is not really good coding, as I fill in the data frame in a loop.
Does someone have a better idea? The current benchmark is the following:
0.002446 seconds (37.77 k allocations: 692.391 KiB)
Thanks in advance.
Yes, this will be slow. The issue is that accessing a data frame in this way is type-unstable. Julia doesn't know the element types of a data frame's columns at compile time, so it can't generate fast code for a hot loop that indexes into the data frame directly.
The workaround is to use a function barrier. In general, for fast code with data frames, write a function that acts on vectors and then call that function on the columns you want.
Wait, also are you getting columns and rows confused? It looks like you are generating many columns, each a lag of the previous column. Usually this is done by rows…
EDIT: Sorry, I did not read your code carefully enough. Try something like this:
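julia> using DataFrames

julia> df = DataFrame(rand(1000, 100), :auto);

julia> function get_lag(x)
           out = similar(x)
           out[1] = 0  # no previous row, so the first entry is set to 0
           for row in 2:length(x)
               out[row] = log(x[row]) - log(x[row-1])
           end
           return out
       end;

julia> test = copy(df);

julia> for column in 1:ncol(df)
           # df[!, column] is a plain vector, so get_lag is compiled for its concrete type
           test[!, column] = get_lag(df[!, column])
       end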
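If you would rather drop the first row than fill it with 0 (as your `returns` frame does), the same calculation can also be written with `diff` and broadcasting from Base. This is only a sketch on plain vectors and matrices: the name `log_returns` and the prices are made up for illustration, not taken from your data.

```julia
# Log returns of one price series: r[t] = log(p[t+1]) - log(p[t]).
# The result has one element fewer than the input.
log_returns(prices::AbstractVector{<:Real}) = diff(log.(prices))

prices = [100.0, 110.0, 121.0]   # hypothetical prices, each step is +10%
r = log_returns(prices)          # two returns, both equal to log(1.1)

# Same idea for a matrix of prices (rows = dates, columns = assets).
price_matrix = [100.0 50.0; 110.0 55.0; 121.0 60.5]
R = diff(log.(price_matrix); dims=1)   # one row fewer than price_matrix
```

With a data frame you would call `log_returns` on each column, which is again a function barrier. I believe `mapcols(log_returns, df)` from DataFrames does this in one call, but check the docs for your version.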