Iterating over row in a DataFrame

tlorans · December 11, 2020, 12:30pm

Hello,

I would like to port one of my Python models to Julia, but I have been stuck for hours on basic iteration in Julia.

Basically, I just want to iterate over each row of my DataFrame

using DataFrames

# Step 1: declaration of endogenous variables
columnnames = ["A", "B"]
T = 100
columns = [Symbol(col) => zeros(T) for col in columnnames]
y = DataFrame(columns...)

# Launch the iteration
for t in 1:T
    if t == 1
        # Step 2: initial values are assigned
        y[1] = 1
    else
        # Step 3: equations
        y[t] = y[t-1] + 1
    end
end

No matter how hard I search through the different tutorials, I can’t find a way to do this simple thing in Julia.
I tried the following solution in particular:

Even the first step to replace the first line with a value doesn’t work…

y[:1,:] = 1.0

ERROR: MethodError: no method matching setindex!(::DataFrame, ::Float64, ::Int64, ::UnitRange{Int64})

Would you have any suggestions, please?

Best regards,

Thomas

oheil · December 11, 2020, 12:52pm

This seems to be very easy, but your problem description is overly complicated.

Even though I think the answer you are looking for is very simple (I would give it if I were sure which one it is), I think it is best if you first go through this

and ask again.

Or, just skip the python and Country Code stuff and just ask what you want to do as a first step in Julia. We can go step by step until you are on the road…

tlorans · December 11, 2020, 1:07pm

Hello,

Of course, sorry for this and thank you for the recommendations. I’ve tried to re-edit my question.

oheil · December 11, 2020, 1:17pm

Well done.
Here is a slightly more Julia-style version of the iteration. It only changes column “A”, because it is still unclear which columns you want to change:

using DataFrames
n = 100
y = DataFrame("A"=>ones(n),"B"=>zeros(n))
for t in 2:n
    y[t,1] = y[t-1,1] + 1
end

I am not aiming for high efficiency here, just a tutorial-style version that is easy to comprehend.
Note: by convention, capitalized names are used for types; variable names should start with a lowercase letter.

tlorans · December 11, 2020, 1:27pm

Thank you for your answer. Sorry for my unclear question.

Indeed, I would like to apply operations or assignments to all columns, such as:

using DataFrames
n = 100
y = DataFrame("A"=>zeros(n),"B"=>zeros(n))
for t in 1:n
    if t == 1
        y[t,1:end] = 1
    else
        y[t,1:end] = y[t-1,1:end] + 1
    end
end

However, trying this I’ve got the following error:

ERROR: MethodError: no method matching setindex!(::DataFrame, ::Float64, ::Int64, ::UnitRange{Int64})

Thank you for your help !

oheil · December 11, 2020, 2:54pm

The Julia style solution would be broadcasting, but unfortunately this is currently not implemented over DataFrameRow. I found this discussion about this: julia - Is there a way to subtract multiple dataframe columns at once? - Stack Overflow

It would look like:

using DataFrames
n = 100
y = DataFrame("A"=>zeros(n),"B"=>zeros(n))
for t in 1:n
    if t == 1
        y[t,1:end] .= 1
    else
        y[t,1:end] .= ( y[t-1,1:end] .+ 1 )
    end
end

Which gives the error:

ERROR: ArgumentError: broadcasting over `DataFrameRow`s is reserved

For broadcast in general see: Multi-dimensional Arrays · The Julia Language
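
As a minimal sketch of what dot-broadcasting does on plain vectors (the names `v` and `w` are only illustrative and not part of the code above):

```julia
v = zeros(3)     # a vector of three zeros
v .= 1           # assign 1 to every element, in place
w = v .+ 1       # add 1 to every element, allocating a new vector
```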

The workaround (from above discussion) is:

using DataFrames
n = 100
y = DataFrame("A"=>zeros(n),"B"=>zeros(n))
for t in 1:n
    if t == 1
        y[t,1:end] .= 1
    else
        y[t,1:end] .= ( Vector(y[t-1,1:end])  .+ 1 )
    end
end

But I am not happy with this code. Depending on your real goal, it is probably better to process each column separately, since the columns seem to be independent of each other (but as I said, a properly performant implementation requires knowing the complete problem).

This is better because Julia arrays are column-major, see
https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-column-major
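
A minimal sketch of that column-by-column idea, assuming every column follows the same recurrence as in your example (`eachcol` iterates over the column vectors without copying them, so writing to `col` updates `y`):

```julia
using DataFrames

n = 100
y = DataFrame("A" => zeros(n), "B" => zeros(n))

for col in eachcol(y)
    col[1] = 1                    # initial value
    for t in 2:n
        col[t] = col[t-1] + 1     # same equation as above, one column at a time
    end
end
```

If the columns end up with different equations, the inner loop would have to be written per column instead; this is only a sketch under the assumption above.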

pdeffebach · December 11, 2020, 5:50pm

This is indeed a sub-optimal scenario, but your code looks good.

The reason you have to convert to a vector is that a DataFrameRow tries to keep an API very similar to that of a NamedTuple. NamedTuples currently do not support this kind of broadcasting, and we want to match whatever behavior is eventually decided for them.
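
To illustrate the NamedTuple behavior being matched, here is a small sketch (the names `nt`, `threw` and `shifted` are made up for the example). On the Julia 1.x versions I am aware of, broadcasting over a NamedTuple throws an ArgumentError, while `map` works:

```julia
nt = (A = 1.0, B = 2.0)

# Broadcasting over a NamedTuple is reserved, so this is expected to throw.
threw = try
    nt .+ 1
    false
catch err
    err isa ArgumentError
end

# map applies the function to each field and returns a new NamedTuple.
shifted = map(x -> x + 1, nt)
```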

nilshg · December 11, 2020, 7:30pm

I would encourage you to post a more complete description of what you’re actually trying to achieve in order to avoid the danger of causing an XY problem.

In particular, it feels to me like a DataFrame isn’t necessarily the right data structure for your use case - just because something was done in pandas doesn’t mean it has to be a DataFrame in Julia! You might be better off with a simple Array{Float64, 2}, or maybe a NamedArray, or one of the many other low- or zero-cost abstractions the Julia language offers to organise your data & algorithm.
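
For instance, a minimal sketch with a plain matrix, assuming the same two-column recurrence as in the examples above (broadcasting over a matrix row works without any conversion):

```julia
n = 100
y = zeros(n, 2)                   # n rows, one column per variable
y[1, :] .= 1                      # initial values
for t in 2:n
    y[t, :] .= y[t-1, :] .+ 1     # each row is the previous row plus one
end
```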

tlorans · December 12, 2020, 3:19pm

Thank you all for your answers.

Indeed, NamedArrays.jl does the job I need:

using NamedArrays

columnsnames = ["A","B"]
c = length(columnsnames)
n = 100
years = zeros(n)
start_date = 2020
years[1] = start_date

for t in 2:n
    years[t] = years[t-1] + 1
end

y = NamedArray((zeros(n,c)), (years, columnsnames)) 

for t in 1:n
    if t == 1
        y[t,1:end] .= 1
    else
        y[t,1:end] .= y[t-1,1:end] .+ 1
    end
end

println(y)
