Looping over previous row efficiency

Yes, this will be slow. The issue is that accessing a data frame this way is type-unstable: a `DataFrame` does not carry its column types in its own type, so inside a hot loop Julia cannot infer the element type of a column and can't generate fast code.

The workaround is to use a function barrier. In general, for fast code with data frames, write a function that acts on vectors and then call that function on the columns you want.
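To make the function-barrier idea concrete, here is a minimal sketch. The function names (`sum_log_diffs`, `sum_log_diffs_slow`, `sum_log_diffs_barrier`) and the column name `:a` are made up for illustration and are not from your code. The kernel only ever sees a plain vector, so Julia can compile it for the concrete element type.

```julia
using DataFrames

# Kernel: works on a plain vector, so Julia compiles a specialized
# method for the concrete type it receives (e.g. Vector{Float64}).
function sum_log_diffs(x::AbstractVector)
    total = 0.0
    for i in 2:length(x)
        total += log(x[i]) - log(x[i-1])
    end
    return total
end

# Slow pattern: indexing the data frame inside the hot loop.
# The type of df[i, col] cannot be inferred at compile time.
function sum_log_diffs_slow(df::AbstractDataFrame, col)
    total = 0.0
    for i in 2:nrow(df)
        total += log(df[i, col]) - log(df[i-1, col])
    end
    return total
end

# Barrier: pull the column out once, then hand it to the kernel.
sum_log_diffs_barrier(df::AbstractDataFrame, col) = sum_log_diffs(df[!, col])

df = DataFrame(a = rand(1000) .+ 1.0)
slow = sum_log_diffs_slow(df, :a)
fast = sum_log_diffs_barrier(df, :a)
```

Both versions should return the same number; timing them (for example with `@time` after a warm-up call) is the easiest way to see the difference on your machine.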

Wait, also are you getting columns and rows confused? It looks like you are generating many columns, each a lag of the previous column. Usually this is done by rows…

EDIT: Sorry, I did not read your code carefully enough. Try something like this:

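julia> using DataFrames

julia> df = DataFrame(rand(1000, 100), :auto);

julia> function get_lag(x)
           # Log-difference of each element from the one in the previous row
           out = similar(x)
           out[1] = 0  # the first row has no previous row
           for row in 2:length(x)
               out[row] = log(x[row]) - log(x[row-1])
           end
           return out
       end;

julia> test = copy(df);

julia> for column in 1:ncol(df)
           # df[!, column] is a plain vector, so get_lag acts as the function barrier
           test[!, column] = get_lag(df[!, column])
       end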
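If you would rather not write the loop over columns yourself, one alternative (a sketch, not necessarily faster) is `mapcols` from DataFrames.jl, which calls a function once per column and so works as the barrier too. The helper name `log_diff` is made up here; it is meant to compute the same values as `get_lag`, using `diff` instead of an explicit loop.

```julia
using DataFrames

# Vectorized variant of the log-difference: zero for the first row,
# then log(x[i]) - log(x[i-1]) for every later row.
log_diff(x::AbstractVector) = [zero(eltype(x)); diff(log.(x))]

df = DataFrame(rand(1000, 100), :auto)

# mapcols applies log_diff to each column and returns a new data frame.
test = mapcols(log_diff, df)
```

This version allocates a temporary vector for `log.(x)`, so the explicit loop may still be the better choice for very large columns.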