Looping over previous row efficiency

So I have a dataset with all stock prices and I want to compute the log returns.

I have done this the following way:

test = copy(dollar_portfolio)
returns = copy(dollar_portfolio[2:end, :])
@time for column in 2:length(dollar_portfolio[1, :])
    for row in 2:length(dollar_portfolio[:, 1])
        # log return: log(price today) - log(price on the previous row)
        test[row, column] = log(dollar_portfolio[row, column]) - log(dollar_portfolio[row-1, column])
    end
    # the first row has no previous price, so drop it
    returns[:, column] = test[2:end, column]
end

I feel like this is not really good coding, as I fill in the data frame in a loop.
Does someone have a better idea? The current benchmark is the following:

  0.002446 seconds (37.77 k allocations: 692.391 KiB)

thanks in advance

Yes, this will be slow. The issue is that accessing a data frame element by element in this way is type-unstable: Julia doesn't know the types of the data frame's columns inside the hot loop, so it can't generate fast code.

The workaround is to use a function barrier. In general, for fast code with data frames, write a function that acts on vectors and then call that function on the columns you want.
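To make the function-barrier idea concrete, here is a minimal sketch using only Base Julia. The `Any[...]` vector is just a hypothetical stand-in for a container whose column types the compiler cannot see, and `sum_log` is a made-up kernel, not anything from your code.

```julia
# Hypothetical stand-in for a data frame: the element type is Any, so the
# compiler does not know what kind of vector each "column" is.
columns = Any[rand(1000), rand(1000)]

# Kernel (the function barrier): inside this function x has a concrete type,
# so the loop is compiled to specialized code.
function sum_log(x::AbstractVector)
    total = 0.0
    for v in x
        total += log(v)
    end
    return total
end

# The dynamic dispatch happens once per column, not once per element.
totals = [sum_log(col) for col in columns]
```

The cost of not knowing the type is paid once at each call to `sum_log`, and the loop over the elements itself runs on a concretely typed vector.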

Wait, also are you getting columns and rows confused? It looks like you are generating many columns, each a lag of the previous column. Usually this is done by rows…

EDIT: Sorry, I did not read your code carefully enough. Try something like this:

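julia> using DataFrames

julia> df = DataFrame(rand(1000, 100), :auto);

julia> function get_lag(x)
           out = similar(x)
           out[1] = 0  # no previous row for the first element
           for row in 2:length(x)
               out[row] = log(x[row]) - log(x[row-1])
           end
           return out
       end

julia> test = copy(df);

julia> for column in 1:ncol(df)
           # df[!, column] hands the column vector to get_lag without copying
           test[!, column] = get_lag(df[!, column])
       end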
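If you only need the returns from the second row onward (as in your `returns` table), the per-column function can probably be written without an explicit loop. A minimal sketch, assuming the prices are strictly positive; `log_returns` is just a name I made up here:

```julia
# Log returns of one price series: r[t] = log(p[t+1]) - log(p[t]).
# The result has one element fewer than the input.
log_returns(prices::AbstractVector{<:Real}) = diff(log.(prices))

prices = [100.0, 110.0, 121.0]
r = log_returns(prices)
```

Because the result is one row shorter than the input, it would go into a new data frame rather than being assigned back into a copy of the original, unlike `get_lag` above.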