Iterating over rows in a DataFrame

Hello,

I would like to implement one of my Python models in Julia, but I have been stuck for hours on a basic iteration problem.

Basically, I just want to iterate over each row of my DataFrame:

using DataFrames

# Step 1: declaration of endogenous variables
columnnames = ["A","B"]
T = 100
columns = [Symbol(col) => zeros(T) for col in columnnames]
y = DataFrame(columns...)
# I am launching my iteration
for t in 1:T
    if t == 1
        # Step 2: initial values are assigned
        y[1] = 1
    else
        # Step 3: equations
        y[t] = y[t-1] + 1
    end
end

No matter how hard I search through the different tutorials, I can’t find a way to do this simple thing in Julia.

Even the first step, replacing the first row with a value, doesn’t work:

y[:1,:] = 1.0

ERROR: MethodError: no method matching setindex!(::DataFrame, ::Float64, ::Int64, ::UnitRange{Int64})

Would you have a suggestion, please?

Best regards,

Thomas

This seems to be very easy, but your problem description is overly complicated.

Even though I think the answer you are looking for is very simple (I would give it if I were sure which one it is), I think it is best if you first go through this

and ask again.

Or just skip the Python and Country Code stuff and ask what you want to do as a first step in Julia. We can go step by step until you are on the road…

Hello,

Of course, sorry for this and thank you for the recommendations. I have edited my question.

Well done.
Here is a slightly more Julia-style version of the iteration. It only changes column “A”, because it is still unclear which column you want to change:

using DataFrames
n = 100
y = DataFrame("A"=>ones(n),"B"=>zeros(n))
for t in 2:n
    y[t,1] = y[t-1,1] + 1
end

I am not aiming for high efficiency here, just a tutorial-style version that is easy to follow.
Note: by convention, type names start with an uppercase letter and variable names start with a lowercase letter.

Thank you for your answer. Sorry for my unclear question.

Indeed, I would like to apply operations or assignments to all columns, such as:

using DataFrames
n = 100
y = DataFrame("A"=>zeros(n),"B"=>zeros(n))
for t in 1:n
    if t == 1
        y[t,1:end] = 1
    else
        y[t,1:end] = y[t-1,1:end] + 1
    end
end

However, trying this I’ve got the following error:

ERROR: MethodError: no method matching setindex!(::DataFrame, ::Float64, ::Int64, ::UnitRange{Int64})

Thank you for your help!

The Julia-style solution would be broadcasting, but unfortunately broadcasting over a DataFrameRow is currently not implemented. I found this Stack Overflow discussion about it: "Is there a way to subtract multiple dataframe columns at once?"

It would look like:

using DataFrames
n = 100
y = DataFrame("A"=>zeros(n),"B"=>zeros(n))
for t in 1:n
    if t == 1
        y[t,1:end] .= 1
    else
        y[t,1:end] .= ( y[t-1,1:end] .+ 1 )
    end
end

Which gives the error:

ERROR: ArgumentError: broadcasting over `DataFrameRow`s is reserved

For broadcast in general see: Multi-dimensional Arrays · The Julia Language
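
For reference, here is a minimal illustration of the dot syntax on a plain Vector, using only Base Julia. The variable names are made up for this example:

```julia
v = [1.0, 2.0, 3.0]

# `.+` applies the addition to every element and returns a new array.
w = v .+ 1

# `.=` writes the result into the existing array instead of allocating a new one.
v .= w .* 2
```

This is the behavior the `y[t,1:end] .= ...` lines above are trying to get for a row of a DataFrame.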

The workaround (from the discussion above) is to convert the previous row to a Vector before broadcasting:

using DataFrames
n = 100
y = DataFrame("A"=>zeros(n),"B"=>zeros(n))
for t in 1:n
    if t == 1
        y[t,1:end] .= 1
    else
        y[t,1:end] .= ( Vector(y[t-1,1:end])  .+ 1 )
    end
end

But I am not happy with this code. Depending on your real goal, it is probably better to process each column separately, as the columns seem to be independent of each other (but as I said, a real performance-oriented implementation requires knowing the complete problem).

This is better because Julia arrays are column-major, see
https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-column-major
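
A rough sketch of the column-by-column idea, assuming the columns really are independent of each other (which has not been confirmed in this thread). It uses `eachcol` from DataFrames.jl, which iterates over the column vectors themselves, so writing into `col` updates `y`:

```julia
using DataFrames

n = 100
y = DataFrame("A" => zeros(n), "B" => zeros(n))

# Fill each column on its own, walking down its rows.
for col in eachcol(y)
    col[1] = 1.0                  # initial value
    for t in 2:n
        col[t] = col[t-1] + 1     # recurrence on the previous row
    end
end
```

The inner loop runs down one column at a time, which matches the column-major layout mentioned above.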

This is indeed a sub-optimal scenario, but your code looks good.

The reason you have to convert to vectors is that a DataFrameRow tries to have an API very similar to that of a NamedTuple. NamedTuples currently do not support this kind of broadcasting, and we want to match whatever behavior they eventually adopt for broadcasting.
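
To illustrate the NamedTuple side of this with Base Julia only, here is a small sketch; `nt`, `shifted` and `vals` are names invented for the example:

```julia
nt = (A = 1.0, B = 2.0)

# Broadcasting over a NamedTuple is reserved, so this throws an ArgumentError.
err = try
    nt .+ 1
    nothing
catch e
    e
end
println(typeof(err))

# `map` works on a NamedTuple and keeps the field names.
shifted = map(x -> x + 1, nt)

# Collecting the values into a plain Vector makes broadcasting available again,
# which is the same idea as the `Vector(...)` workaround for a DataFrameRow.
vals = collect(nt) .+ 1
```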

I would encourage you to post a more complete description of what you’re actually trying to achieve in order to avoid the danger of causing an XY problem.

In particular, it feels to me like a DataFrame isn’t necessarily the right data structure for your use case. Just because something was done in pandas doesn’t mean it has to be a DataFrame in Julia! You might be better off with a simple Array{Float64, 2}, or maybe a NamedArray, or one of the many other low- or zero-cost abstractions the Julia language offers to organise your data and algorithm.
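
A minimal sketch of the plain-matrix variant, assuming the data is purely numeric and the column names are only needed as labels (`colnames` is a made-up variable for illustration):

```julia
n = 100
colnames = ["A", "B"]          # labels kept separately from the data
y = zeros(n, length(colnames))

# Initial values for every column.
y[1, :] .= 1

# Each row is the previous row plus one, for all columns at once.
for t in 2:n
    y[t, :] .= y[t-1, :] .+ 1
end
```

Row slices of a Matrix are ordinary arrays, so the broadcasting that is reserved for a DataFrameRow works here without any conversion.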

Thank you all for your answers.

Indeed, NamedArrays.jl does the job I need:

using NamedArrays

columnsnames = ["A","B"]
c = length(columnsnames)
n = 100
years = zeros(n)
start_date = 2020
years[1] = start_date

for t in 2:n
    years[t] = years[t-1] + 1
end

y = NamedArray((zeros(n,c)), (years, columnsnames)) 

for t in 1:n
    if t == 1
        y[t,1:end] .= 1
    else
        y[t,1:end] .= y[t-1,1:end] .+ 1
    end
end

println(y)