# Indexing in dataframes

Hi, This should be straightforward dataframes question, but I have not been able to get my head around it.

I have two dataframes: 1. Describes the relationship between different quantities based on ID and 2. Contains some numerical quantities for all sets of IDs.

Using the first dataframe, I need to iterate through each row and find the corresponding index (row number) in the second dataframe (named Master in the code), to use for a mathematical operation. This is better explained in a MWE

``````using DataFrames

function generate_data()
Relationship = DataFrame(ID = ["A1","A2","A4"], LINK_1 = ["A5","A6","A10"],
ID_t =  collect(1:15)
ID = Vector{String}(undef,15)

for i in 1:15
ID[i]="A".*string(ID_t[i])
end

PRICE = rand(10:100,15)
Master = DataFrame(ID = ID,PRICE = PRICE)
return Relationship, Master
end

Relationship, Master = generate_data()
x = Master.PRICE
y = Vector{Any}(undef,length(Relationship.ID))
# Use a loop that iterates through each row of the Relationship dataframe,
# finds index (row number) of the corresponding ID(s) in columns (LINK_1, etc.) in the Master dataframe
# and performs some mathematical operation based on it and store the results in a vector y.
# Example of mathematical operation
# y = x+x+x+x
# y = x+x + x+x
# y = x + x+x
``````

1. How can I find the corresponding indices in the `Master` dataframe based on the criteria of relationships set out in the `Relationship` dataframe?
2. Ideally I would like to use a loop to iterate and then apply some mathematical operation such as that shown in comments in the code.

Why not use a `join`?

If I use `join(Relationship,Master, kind =:outer, on = intersect(names(Relationsip), names(Master)))` This creates a dataframe combining columns from both dataframes. I am struggling to understand how to find indices of the IDs listed in columns LINK_1 to LINK_3 in the `Master` dataframe. For example in the first row A1 is linked to A5, A3 and A9. Indices for them are 5, 3 and 9.
Basically I want to do the type of calculation written in comments in the code. Obviously not manually, but using a loop or something where I read the relevant indices and store the results of the calculation in an array `y`.

I don’t mind how to get to that step, whether using `join` or something else. My knowledge of dataframes is not great, hence the question!

A DataFrame probably isn’t the right object for `Master`. Better to use a Dict instead

``````julia> d = Dict([Master.ID[i] => Master.PRICE[i] for i in 1:nrow(Master)]);

sum(get.(Ref(d), row_strs, 0))
end;

julia> map(eachrow(Relationship)) do r
end
3-element Array{Int64,1}:
216
220
78
``````

Thank you!

Can we do the same operation but using a loop and without the functions `add_all` and `map`? The reason being it will give me flexibility on the type of operation I want to do, such as multiplication, subtraction, addition, etc, whereas the `add_all` function is specialised for summation.

For example, a loop like this

``````for i in eachrow(Relationship)
y[i] = sum(x[i] for i in ....)
end
``````

Note: The loop is not written correctly as I don’t know in this case how to use it.

It sounds like you could benefit from reading the manual a bit more. In `add_all`, you could just keep `get.(Ref(d), row_strs, 0)` and then work with that array in however way you see fit.

1 Like