Looping over previous row efficiency

korilium · February 3, 2022, 7:22pm

So I have a dataset with all stock prices and I want to compute the returns.

I have done this the following way:

test = copy(dollar_portfolio)
returns = copy(dollar_portfolio[2:end,:])
@time for column in 2:length(dollar_portfolio[1,:])
    for row in 2:length(dollar_portfolio[:,1])
        test[row, column] = log(dollar_portfolio[row, column]) - log(dollar_portfolio[row-1,column]) 
    end 
    returns[:, column] = test[2:end, column]
end

I feel like this is not really good coding, as I fill in the data frame in a loop.
Does someone have a better idea? The current benchmark is the following:

  0.002446 seconds (37.77 k allocations: 692.391 KiB)

Thanks in advance.

pdeffebach · February 3, 2022, 7:56pm

Yes, this will be slow. The issue is that accessing a data frame in this way is type-unstable: Julia doesn’t know the types of data frame columns inside hot loops and so can’t generate fast code.

The workaround is to use a function barrier. In general, for fast code with data frames, write a function which acts on vectors and then call that function on the columns you want.

Wait, also are you getting columns and rows confused? It looks like you are generating many columns, each a lag of the previous column. Usually this is done by rows…

EDIT: Sorry, I did not read your code carefully enough. Try something like this:

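julia> using DataFrames

julia> df = DataFrame(rand(1000, 100), :auto);

julia> function get_lag(x)
           out = similar(x)
           out[1] = 0  # no previous row, so the first return is set to 0
           for row in 2:length(x)
               out[row] = log(x[row]) - log(x[row-1])
           end
           return out
       end;

julia> test = copy(df);

julia> for column in 1:ncol(df)
           # df[!, column] is a concrete vector, so get_lag compiles to fast code
           test[!, column] = get_lag(df[!, column])
       end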
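
The same function-barrier idea can also be written without an explicit loop. The following is only a sketch: `log_returns` and the small `prices` table are made-up names for illustration, and it assumes every column is numeric (a date column would need to be selected out first). Unlike `get_lag`, it drops the first row instead of filling it with 0, which matches the `returns` frame in the original question.

```julia
using DataFrames

# Log return of one price vector: log(p[t]) - log(p[t-1]).
# The result has one element fewer than the input.
log_returns(prices::AbstractVector) = diff(log.(prices))

# Hypothetical example data: two price series, three observations each.
prices = DataFrame(a = [1.0, 2.0, 4.0], b = [10.0, 10.0, 5.0])

# mapcols calls log_returns on each column vector (the function barrier)
# and collects the results into a new DataFrame.
returns = mapcols(log_returns, prices)
```

Because `log_returns` receives a plain vector, Julia can specialize it on the element type, just as with `get_lag` above.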
