Iterating through rows in a Julia DataFrame

df=DataFrame(p1 = [1, 2, 3, 4, 5],p1 = [10,20,30,40,50])
df2=DataFrame(q1 = [1, 12, 13,14, 15],q2 = [100,200,300,400,500])
datas=Dict()

for data1 in eachrow(df2)
    for data in eachrow(df)
        print("H",data[1]," " ,data[2]," ")
        if data[1]==data1[1]
            datas[data[1]]=data[2]+data1[2]
            print("aaaaaaaaaa")
        else
            datas[data[1]]=5000
        end
    end
end

Even though the print is working, it's not appending to the dict.

Is there something I did wrong?

Your code doesn’t run because you have two columns named p1 in your first DataFrame. Fixing this and running this code:

using DataFrames
df=DataFrame(p1 = [1, 2, 3, 4, 5], p2 = [10,20,30,40,50])
df2=DataFrame(q1 = [1, 12, 13,14, 15], q2 = [100,200,300,400,500])
datas=Dict()

for data1 in eachrow(df2)
    for data in eachrow(df)
        println("H",data[1]," " ,data[2]," ")
        
        if data[1]==data1[1]
            datas[data[1]]=data[2]+data1[2]
            println("aaaaaaaaaa")
        else
            datas[data[1]]=5000
        end
    end
end

gives me:

julia> datas
Dict{Any, Any} with 5 entries:
  5 => 5000
  4 => 5000
  2 => 5000
  3 => 5000
  1 => 5000

So there are entries in the dictionary after executing the loop.

Please run any code examples you are posting here in a new Julia session before posting to make sure they’re actually doing what you think they’re doing.

It was a typo (typed correctly in the terminal)

for id = 1:5
    for data1 in eachrow(df2)
        for data in eachrow(df)
            if id==data[1]==data1[1]
                datas[id]=data[2]+data1[2]
            else
                datas[id]=5000
            end
        end
    end
end

Even though id==data[1]==data1[1] holds in the first iteration (the print is working), every value in the Dict is 5000 for me, where it is supposed to be 110.

Please correct me if I am wrong.

Is there anything to consider when appending to a blank Dict?

You are overwriting datas[1] with 5000 on subsequent iterations.

How do I modify the code then?

  elseif !haskey(datas, id)
    datas[id]=5000
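
For illustration, here is a minimal sketch of the whole loop with that elseif in place. It assumes plain vectors standing in for the columns of df and df2 (so it runs without any packages), and the names p1, p2, q1, q2, a1, a2, b1, b2 are only placeholders for this sketch:

```julia
# Plain vectors stand in for the columns of df and df2 (an assumption,
# made so the sketch runs without DataFrames).
p1, p2 = [1, 2, 3, 4, 5], [10, 20, 30, 40, 50]
q1, q2 = [1, 12, 13, 14, 15], [100, 200, 300, 400, 500]

datas = Dict{Int, Int}()

for id in 1:5
    for (b1, b2) in zip(q1, q2)          # rows of df2
        for (a1, a2) in zip(p1, p2)      # rows of df
            if id == a1 == b1
                datas[id] = a2 + b2      # a match always stores the sum
            elseif !haskey(datas, id)
                datas[id] = 5000         # default, written only if nothing is stored yet
            end
        end
    end
end

println(datas[1])
```

With this data, the entry for key 1 keeps the sum 10 + 100 and the other four keys hold the default of 5000, because the default is no longer written once a key exists.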

It is not clear from your code what you are actually trying to achieve, could you clarify what the expected output of your code is?

Your updated code does 5 x nrow(df) x nrow(df2) = 125 iterations, I have a nagging feeling that you are actually only trying to do 5 iterations, something like:

for (id, row1, row2) in zip(1:5, eachrow(df), eachrow(df2))
    ...
end

but it’s hard to tell from your question as it stands.

Maybe what you want is a list of the common elements in the first columns of the dataframes. You could construct Sets from the columns and get their intersection.
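
As a rough sketch of the Set idea, assuming plain vectors in place of the first columns of the two dataframes (with DataFrames loaded these would be df.p1 and df2.q1):

```julia
# First columns of the two dataframes, written as plain vectors (an assumption
# so the sketch needs no packages).
p1 = [1, 2, 3, 4, 5]
q1 = [1, 12, 13, 14, 15]

# Values that appear in both first columns.
common = intersect(Set(p1), Set(q1))

println(common)
```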

You can avoid the nested loops leading to O(n^3) performance by constructing Dicts with the first column values as keys and the second column values as Dict values. That will lead to O(1) lookups and O(n) overall performance.
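
One possible shape for the Dict approach, again with plain vectors standing in for the dataframe columns. The name lookup and the 5000 default (taken from the question's code) are assumptions about what is wanted:

```julia
# Columns of df and df2 as plain vectors (an assumption for a package-free sketch).
p1, p2 = [1, 2, 3, 4, 5], [10, 20, 30, 40, 50]
q1, q2 = [1, 12, 13, 14, 15], [100, 200, 300, 400, 500]

# Build the lookup once: first-column value => second-column value.
lookup = Dict(zip(q1, q2))

datas = Dict{Int, Int}()
for (k, v) in zip(p1, p2)
    # One hash lookup per row instead of an inner loop over df2.
    datas[k] = haskey(lookup, k) ? v + lookup[k] : 5000
end

println(datas[1])
```

Each row of the first dataframe is visited once, so the work grows linearly with the number of rows rather than with the product of the row counts.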

Yes, my intention was to iterate over each row and append the values to the dict. But the dataframes may have different numbers of rows.

df1=DataFrame(p1 = [1, 2, 3],p2 = [1,2,3])

df2=DataFrame(q1 = [1, 2, 3 ],q2 = [3,7,9])

df3=DataFrame(r1 = [0,1],r2 = [1,1])

datas=Dict()

for data1 in eachrow(df1)
    for data2 in eachrow(df2)
        for data3 in eachrow(df3)

            if data1[1]==data2[1]==data3[1]
                datas[data1[1]]="data1[2]"*"data1[2]"*"data3[2]"

            elseif data1[1]==data2[1]!=data3[1]
                datas[data1[1]]="data1[2]"*"data2[2]"

            elseif  data1[1]==data3[1]!=data2[1]
                datas[data1[1]]="data1[2]"*"data2[2]"

            elseif data2[1]!=data3[1]!=data1[1]
                datas[data2[1]]="data2[2]"*"data3[2]"

            elseif data1[1]!=data3[1]!=data2[1]
                datas[data1[1]]="data1[2]"
                datas[data2[1]]="data2[2]"
                datas[data3[1]]="data3[2]"
        
            end 
        end           
    end 
end

I am not sure whether zip can be used for data frames of different lengths (as it will only iterate until one of them ends).

Could someone suggest a better way to do this? There can be cases where df3 has zero rows.

I am not confident enough to append the values to the dictionary using !haskey.

It would be really helpful if someone could guide me to a solution.

zip will iterate until the shortest iterator is exhausted, so indeed won’t be helpful if one of them is length zero:

julia> for (i, j) ∈ zip(1:3, 1:2)
           println(i, "|", j)
       end
1|1
2|2

As said above, please provide the actual expected output of your code, it is not clear what you are trying to achieve.

My intention is to append the value to the Dict based on the if/elseif condition as a label.

If p2==q2==r2 at some iteration, it will append the respective other columns under the value of p2/q2/r2.

The output looks like:
Dict{Any, Any} with 4 entries:
  1 => " 1 1 0"
  2 => " 2"
  3 => "3 3"
  7 => "2"
  9 => "3"

The basic issue is how to append the values into a Dict based on if else conditions inside 3 for loops

Thanks for the understanding

No need to thank me for my understanding, as I still haven’t understood what you’re after :)

In particular, I don’t see how your output is related to the code you’ve provided - in your code the left hand side of all your assignments is either datas[data1[1]] or datas[data2[1]] or datas[data3[1]], so in any case the first column of your DataFrames. You therefore can’t end up with the keys 7 or 9 in your Dict, as these numbers only appear in the second column of df2.

In any case I now understand that you are trying to append something to your dictionary, whereas your current code overwrites the keys. Consider:

julia> d = Dict(1 => "a")
Dict{Int64, String} with 1 entry:
  1 => "a"

julia> d[1] = "b"
"b"

julia> d
Dict{Int64, String} with 1 entry:
  1 => "b"

If you want to append, you need to get the item first:

julia> d[1] = d[1] * "c"
"bc"

julia> d
Dict{Int64, String} with 1 entry:
  1 => "bc"

But what if the item doesn’t exist yet?

julia> d[2] = d[2] * "d"
ERROR: KeyError: key 2 not found

That’s what get is for: you can supply a default value that will be returned if the key doesn’t exist. In your case it’s useful to return an empty string to append to:

julia> d[2] = get(d, 2, "") * "d"
"d"

julia> d
Dict{Int64, String} with 2 entries:
  2 => "d"
  1 => "bc"

So your code could look something like this:

using DataFrames
df1=DataFrame(p1 = [1, 2, 3], p2 = [1, 2, 3])
df2=DataFrame(q1 = [1, 2, 3], q2 = [3, 7, 9])
df3=DataFrame(r1 = [0, 1], r2 = [1, 1])
datas = Dict()

for data1 in eachrow(df1)
    for data2 in eachrow(df2)
        for data3 in eachrow(df3)

            v1, v2, v3 = data1[1], data2[1], data3[1]
            # Broadcast over a tuple so that each value becomes its own string
            s1, s2, s3 = string.((data1[2], data2[2], data3[2]))

            if v1 == v2 == v3
                datas[v1] = get(datas, v1, "") * s1 * s2 * s3
            elseif v1 == v2 != v3
                datas[v1] = get(datas, v1, "") * s1 * s2
            elseif  v1 == v3 != v2 
                datas[v1] = get(datas, v1, "") * s1 * s2
            elseif v2 != v3 != v1
                datas[v2] = get(datas, v2, "") * s2 * s3
            elseif v1 != v3 != v2 
                datas[v1] = get(datas, v1, "") * s1 
                datas[v2] = get(datas, v2, "") * s2
                datas[v3] = get(datas, v3, "") * s3
            end 
        end           
    end 
end

This gives a very different result from the one you’re looking for above, but this is mainly because I still don’t understand what you’re actually trying to compare against what.

Thanks for the detailed explanations. This method seems to work, except for the fact that some strings are repeated:

julia> datas
Dict{Any, Any} with 3 entries:
  2 => "7127277171"
  3 => "9191913939"
  1 => "1313117193131"

Is there any way to prevent the repetitions when appending?

You could either check whether a digit is already contained in the string, or call unique on the digits afterwards. The first option is probably more efficient than the second, but tbh the overall scheme seems pretty suboptimal so it might not matter much.
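
A small sketch of both options, assuming single-digit values as in the example data. The helper name append_unique! is made up for this illustration and is not part of Base or DataFrames:

```julia
# Option 1: only append when the piece is not already contained in the stored string.
# append_unique! is a hypothetical helper, not a library function.
function append_unique!(d::AbstractDict, key, s::AbstractString)
    current = get(d, key, "")      # empty string if the key does not exist yet
    if !occursin(s, current)
        d[key] = current * s
    end
    return d
end

d = Dict{Int, String}()
append_unique!(d, 1, "1")
append_unique!(d, 1, "3")
append_unique!(d, 1, "1")          # already present, so nothing is appended

# Option 2: build the string as before and drop repeated characters afterwards.
# unique keeps the first occurrence of each character, in order.
deduped = join(unique("1313117193131"))

println(d[1], " ", deduped)
```

Note that occursin is a substring check, so with multi-digit values it would treat "1" as already present in "12". In that case storing a vector or Set of values per key is probably the safer choice.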
