Iterating through rows in a Julia DataFrame

ayodyas · July 19, 2022, 4:05pm

df=DataFrame(p1 = [1, 2, 3, 4, 5],p1 = [10,20,30,40,50])
df2=DataFrame(q1 = [1, 12, 13,14, 15],q2 = [100,200,300,400,500])
datas=Dict()

for data1 in eachrow(df2)
    for data in eachrow(df)
        print("H",data[1]," " ,data[2]," ")
        if data[1]==data1[1]
            datas[data[1]]=data[2]+data1[2]
            print("aaaaaaaaaa")
        else
            datas[data[1]]=5000
        end
    end
end

Even though the print is working, nothing seems to be appended to the dict.

Is there something I did wrong?

nilshg · July 19, 2022, 4:13pm

Your code doesn’t run because you have two columns named p1 in your first DataFrame. Fixing this and running this code:

using DataFrames
df=DataFrame(p1 = [1, 2, 3, 4, 5], p2 = [10,20,30,40,50])
df2=DataFrame(q1 = [1, 12, 13,14, 15], q2 = [100,200,300,400,500])
datas=Dict()

for data1 in eachrow(df2)
    for data in eachrow(df)
        println("H",data[1]," " ,data[2]," ")
        
        if data[1]==data1[1]
            datas[data[1]]=data[2]+data1[2]
            println("aaaaaaaaaa")
        else
            datas[data[1]]=5000
        end
    end
end

gives me:

julia> datas
Dict{Any, Any} with 5 entries:
  5 => 5000
  4 => 5000
  2 => 5000
  3 => 5000
  1 => 5000

So there are entries in the dictionary after executing the loop.

Please run any code examples you are posting here in a new Julia session before posting, to make sure they’re actually doing what you think they’re doing.

ayodyas · July 19, 2022, 5:09pm

It was a typo (it was typed correctly in the terminal).

for id =1:5
    for data1 in eachrow(df2)
        for data in eachrow(df)
            if id==data[1]==data1[1]
                datas[id]=data[2]+data1[2]
            else
                datas[id]=5000
            end
        end
    end
end

Even though id==data[1]==data1[1] holds in the first iteration (the print confirms it), every value in the Dict ends up as 5000, where one of them is supposed to be 110.

Please correct me if I am wrong.

Is there anything to be considered when appending to a blank Dict?

Jeff_Emanuel · July 19, 2022, 5:31pm

You are overwriting datas[1] with 5000 on subsequent iterations.

ayodyas · July 19, 2022, 5:40pm

How should I modify the code then?

Jeff_Emanuel · July 19, 2022, 6:05pm

  elseif !haskey(datas, id)
    datas[id]=5000
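
For context, here is a minimal sketch of how that branch could slot into the loop from the previous post. Plain vectors stand in for the DataFrame columns so that it runs without any packages (that substitution and the names p1, p2, q1, q2 are assumptions for illustration; with DataFrames you would iterate eachrow as before).

```julia
# Plain vectors stand in for the columns of df and df2 (illustrative assumption).
p1, p2 = [1, 2, 3, 4, 5], [10, 20, 30, 40, 50]
q1, q2 = [1, 12, 13, 14, 15], [100, 200, 300, 400, 500]

datas = Dict{Int, Int}()

for id in 1:5
    for (b1, b2) in zip(q1, q2)
        for (a1, a2) in zip(p1, p2)
            if id == a1 == b1
                # A match always wins, even over an earlier default.
                datas[id] = a2 + b2
            elseif !haskey(datas, id)
                # Write the default only if nothing is stored for this id yet.
                datas[id] = 5000
            end
        end
    end
end

# datas[1] is now 110; ids 2 to 5 keep the default 5000.
```

With the haskey guard, a later non-matching iteration can no longer replace the 110 stored for id 1.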

nilshg · July 19, 2022, 6:08pm

It is not clear from your code what you are actually trying to achieve, could you clarify what the expected output of your code is?

Your updated code does 5 x nrow(df) x nrow(df2) = 125 iterations, I have a nagging feeling that you are actually only trying to do 5 iterations, something like:

for (id, row1, row2) in zip(1:5, eachrow(df), eachrow(df2))
    ...
end

but it’s hard to tell from your question as it stands.

Jeff_Emanuel · July 19, 2022, 6:23pm

Maybe what you want is a list of the common elements in the first columns of the dataframes. You could construct Sets from the columns and get their intersection.
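
A rough sketch of that idea, using plain vectors in place of the first columns (with DataFrames these would be something like df.p1 and df2.q1; the variable names here are only for illustration):

```julia
# First-column values from the two tables (stand-ins for df.p1 and df2.q1).
p1 = [1, 2, 3, 4, 5]
q1 = [1, 12, 13, 14, 15]

# Values that occur in both columns.
common = intersect(Set(p1), Set(q1))   # a Set containing only 1
```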

Jeff_Emanuel · July 19, 2022, 6:28pm

You can avoid the nested loops leading to O(n^3) performance by constructing Dicts with the first column values as keys and the second column values as Dict values. That will lead to O(1) lookups and O(n) overall performance.
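
One possible shape for this, as a sketch rather than a drop-in solution: build a Dict from one table and walk the other table once. Plain vectors again stand in for the DataFrame columns, and the 5000 default is carried over from the earlier code.

```julia
# Stand-ins for the columns of df and df2 (illustrative assumption).
p1, p2 = [1, 2, 3, 4, 5], [10, 20, 30, 40, 50]
q1, q2 = [1, 12, 13, 14, 15], [100, 200, 300, 400, 500]

# First-column value => second-column value, built in a single pass.
lookup = Dict(zip(q1, q2))

datas = Dict{Int, Int}()
for (k, v) in zip(p1, p2)
    # O(1) lookup replaces the inner loops over the other table.
    datas[k] = haskey(lookup, k) ? v + lookup[k] : 5000
end

# datas[1] is 110; keys 2 to 5 get the default 5000.
```

This also works when the two tables have different numbers of rows, since nothing is iterated in lockstep.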

ayodyas · July 19, 2022, 6:38pm

Yes, my intention was to iterate over each row and append the values to the dict. However, the dataframes may have different numbers of rows.

ayodyas · July 19, 2022, 8:49pm

df1=DataFrame(p1 = [1, 2, 3],p2 = [1,2,3])

df2=DataFrame(q1 = [1, 2, 3 ],q2 = [3,7,9])

df3=DataFrame(r1 = [0,1],r2 = [1,1])

datas=Dict()

for data1 in eachrow(df1)
    for data2 in eachrow(df2)
        for data3 in eachrow(df3)

            if data1[1]==data2[1]==data3[1]
                datas[data1[1]]="data1[2]"*"data1[2]"*"data3[2]"

            elseif data1[1]==data2[1]!=data3[1]
                datas[data1[1]]="data1[2]"*"data2[2]"

            elseif  data1[1]==data3[1]!=data2[1]
                datas[data1[1]]="data1[2]"*"data2[2]"

            elseif data2[1]!=data3[1]!=data1[1]
                datas[data2[1]]="data2[2]"*"data3[2]"

            elseif data1[1]!=data3[1]!=data2[1]
                datas[data1[1]]="data1[2]"
                datas[data2[1]]="data2[2]"
                datas[data3[1]]="data3[2]"
        
            end 
        end           
    end 
end

I am not sure whether zip can be used for data frames of different lengths, as it only iterates until the shortest one ends.

Could someone suggest a better way to do this? There can be cases where df3 has zero rows.

I am not confident enough to append the values to the dictionary using !haskey.

It would be really helpful if someone could guide me to a solution.

nilshg · July 20, 2022, 9:53am

zip will iterate until the shortest iterator is exhausted, so indeed won’t be helpful if one of them is length zero:

julia> for (i, j) ∈ zip(1:3, 1:2)
           println(i, "|", j)
       end
1|1
2|2

As said above, please provide the actual expected output of your code, it is not clear what you are trying to achieve.

ayodyas · July 20, 2022, 10:50am

My intention is to append the value to the Dict based on the if/elseif conditions, with the key as the label.

If p2==q2==r2 at some iteration, it should append the respective values from the other columns to the value of p2/q2/r2.

The output should look like:
Dict{Any, Any} with 4 entries:
  1 => " 1 1 0"
  2 => " 2"
  3 => "3 3"
  7 => "2"
  9 => "3"

The basic issue is how to append values to a Dict based on if/else conditions inside three nested for loops.

Thanks for your understanding.

nilshg · July 20, 2022, 12:11pm

No need to thank me for my understanding, as I still haven’t understood what you’re after.

In particular, I don’t see how your output is related to the code you’ve provided: in your code the left-hand side of every assignment is either datas[data1[1]], datas[data2[1]] or datas[data3[1]], so the key always comes from the first column of one of your DataFrames. You therefore can’t end up with the keys 7 or 9 in your Dict, as these numbers only appear in the second column of df2.

In any case I now understand that you are trying to append something to your dictionary, whereas your current code overwrites the keys. Consider:

julia> d = Dict(1 => "a")
Dict{Int64, String} with 1 entry:
  1 => "a"

julia> d[1] = "b"
"b"

julia> d
Dict{Int64, String} with 1 entry:
  1 => "b"

If you want to append, you need to get the item first:

julia> d[1] = d[1] * "c"
"bc"

julia> d
Dict{Int64, String} with 1 entry:
  1 => "bc"

But what if the item doesn’t exist yet?

julia> d[2] = d[2] * "d"
ERROR: KeyError: key 2 not found

That’s what get is for, here you can supply a default value that will be returned if the key doesn’t exist. In your case it’s useful to return an empty string to append to:

julia> d[2] = get(d, 2, "") * "d"
"d"

julia> d
Dict{Int64, String} with 2 entries:
  2 => "d"
  1 => "bc"

So your code could look something like this:

using DataFrames
df1=DataFrame(p1 = [1, 2, 3], p2 = [1, 2, 3])
df2=DataFrame(q1 = [1, 2, 3], q2 = [3, 7, 9])
df3=DataFrame(r1 = [0, 1], r2 = [1, 1])
datas = Dict()

for data1 in eachrow(df1)
    for data2 in eachrow(df2)
        for data3 in eachrow(df3)

            v1, v2, v3 = data1[1], data2[1], data3[1]
            # Broadcast over a tuple so that each value becomes its own String
            s1, s2, s3 = string.((data1[2], data2[2], data3[2]))

            if v1 == v2 == v3
                datas[v1] = get(datas, v1, "") * s1 * s2 * s3
            elseif v1 == v2 != v3
                datas[v1] = get(datas, v1, "") * s1 * s2
            elseif  v1 == v3 != v2 
                datas[v1] = get(datas, v1, "") * s1 * s2
            elseif v2 != v3 != v1
                datas[v2] = get(datas, v2, "") * s2 * s3
            elseif v1 != v3 != v2 
                datas[v1] = get(datas, v1, "") * s1 
                datas[v2] = get(datas, v2, "") * s2
                datas[v3] = get(datas, v3, "") * s3
            end 
        end           
    end 
end

This gives a very different result from the one you’re looking for above, but this is mainly because I still don’t understand what you’re actually trying to compare against what.

ayodyas · July 20, 2022, 6:12pm

Thanks for the detailed explanations. This method seems to work, except that some strings are repeated:

julia> datas
Dict{Any, Any} with 3 entries:
  2 => "7127277171"
  3 => "9191913939"
  1 => "1313117193131"

Is there any way to prevent the repetitions in appending ?

nilshg · July 20, 2022, 10:08pm

You could either check whether a digit is already contained in the string, or call unique on the digits afterwards. The first option is probably more efficient than the second, but tbh the overall scheme seems pretty suboptimal so it might not matter much.
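
A small sketch of both options, under the assumption that each appended piece is a single-digit string as in the example data. The helper name append_unique! is made up for illustration and is not a library function.

```julia
# Option 1: only append a piece that is not already part of the stored string.
# `append_unique!` is a hypothetical helper, not part of Base or DataFrames.
function append_unique!(d::AbstractDict, key, piece::AbstractString)
    current = get(d, key, "")
    if !occursin(piece, current)
        d[key] = current * piece
    end
    return d
end

d = Dict{Int, String}()
append_unique!(d, 2, "7")
append_unique!(d, 2, "1")
append_unique!(d, 2, "7")   # "7" is already present, so this call changes nothing

# Option 2: build the string as before, then drop repeated characters afterwards.
deduped = join(unique("7127277171"))
```

Here d[2] ends up as "71" and deduped as "712". Both approaches work character by character, so they would need rethinking if the values can have more than one digit.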
