Hi
I want to add dataframe A as the first column of dataframe B,
so that the first column of the new dataframe is IID from A.
How can I do that? Thank you in advance!
dataframe A
 Row │ IID
     │ Int64
─────┼───────────
   1 │ 408610274
   2 │ 409588054
   3 │ 409894206
   4 │ 410902281
   5 │ 411257293
dataframe B
 Row │ x1       x2       x3       x4       x5
     │ Float64  Float64  Float64  Float64  Float64
─────┼─────────────────────────────────────────────
   1 │     1.0      1.0      2.0      1.0      0.0
   2 │     2.0      2.0      2.0      2.0      1.0
   3 │     1.0      2.0      1.0      1.0      0.0
   4 │     2.0      0.0      2.0      2.0      1.0
   5 │     2.0      2.0      1.0      2.0      1.0
You could do hcat(A, B) if I am understanding you correctly.
using DataFrames
A = DataFrame([1:5], [:IID])
B = DataFrame(rand(5, 5) .* 10, :auto)  # :auto generates the column names x1, ..., x5
hcat(A, B)
5×6 DataFrame
 Row │ IID    x1       x2       x3       x4        x5
     │ Int64  Float64  Float64  Float64  Float64   Float64
─────┼───────────────────────────────────────────────────────
   1 │     1  9.28669  8.78076  1.19252  5.05751   1.59102
   2 │     2  3.0458   6.38638  2.4602   7.59882   2.78558
   3 │     3  4.7442   4.91209  9.38128  5.55845   0.0363324
   4 │     4  4.13458  4.19869  5.46127  2.87679   2.81345
   5 │     5  9.74654  8.61283  1.99002  0.444639  6.98284
Thank you.
I have another problem.
My dataframe looks like the one below.
The column names x1, x2, …, x8 are gene names,
and 0, 1, 2 stand for genotypes.
The column IID holds the individuals' names, and they are grouped under herd number 10004.
Now I want the average genotype for group 10004.
How can I do that?
 Row │ Herd.when.genotyped  IID        x1        x2        x3        x4        x5        x6        x7        x8
     │ Int64                Int64      Float64?  Float64?  Float64?  Float64?  Float64?  Float64?  Float64?  Float64?
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │               10004  410240170       2.0       1.0       2.0       2.0       1.0       2.0       1.0       1.0
   2 │               10004  412396339       2.0       0.0       0.0       2.0       2.0       1.0       1.0       0.0
   3 │               10004  412442014       2.0       2.0       1.0       2.0       1.0       1.0       2.0       0.0
   4 │               10004  412409256       2.0       1.0       0.0       2.0       2.0       0.0       2.0       0.0
   5 │               10004  412419664       2.0       2.0       0.0       2.0       1.0       1.0       2.0       1.0
   6 │               10004  412442177       1.0       2.0       2.0       1.0       1.0       2.0       2.0       2.0
   7 │               10004  412556442       2.0       1.0       2.0       2.0       2.0       2.0       2.0       1.0
   8 │               10004  412502732       2.0       1.0       0.0       2.0       2.0       1.0       2.0       0.0
   9 │               10004  412490450       2.0       1.0       1.0       2.0       2.0       2.0       2.0       2.0
  10 │               10004  412788155       2.0       1.0       2.0       2.0       1.0       2.0       2.0       1.0
Hi @Shazman, no problem. As your new question is not related to this topic, I suggest you open a new Discourse question for any additional questions you have.
I will answer your latest question here, however. I am not sure I understand you entirely, so you may need to rephrase your question, but I would do this (using the dataframes from my last comment):
using Statistics
C = hcat(A, B)
C.average = [mean(row[2:6]) for row in eachrow(C)]
 Row │ IID    x1       x2       x3       x4        x5         average
     │ Int64  Float64  Float64  Float64  Float64   Float64    Float64
─────┼────────────────────────────────────────────────────────────────
   1 │     1  9.28669  8.78076  1.19252  5.05751   1.59102    5.1817
   2 │     2  3.0458   6.38638  2.4602   7.59882   2.78558    4.45535
   3 │     3  4.7442   4.91209  9.38128  5.55845   0.0363324  4.92647
   4 │     4  4.13458  4.19869  5.46127  2.87679   2.81345    3.89696
   5 │     5  9.74654  8.61283  1.99002  0.444639  6.98284    5.55537
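As a side note, the same row-wise average can also be written with transform! instead of a comprehension. This is only a sketch on a small made-up dataframe (the values below are invented for illustration): AsTable hands the selected columns of each row to the function as a NamedTuple, and ByRow applies mean to it.

```julia
using DataFrames, Statistics

# Small made-up dataframe standing in for C = hcat(A, B)
C = DataFrame(IID = 1:3, x1 = [1.0, 2.0, 3.0], x2 = [3.0, 4.0, 5.0])

# Select the columns from x1 to x2, pass each row as a NamedTuple,
# take its mean, and store the result in a new column called average.
transform!(C, AsTable(Between(:x1, :x2)) => ByRow(mean) => :average)
```

Selecting by name with Between avoids hard-coding the positions 2:6, which would silently change meaning if a column were added in front.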
You can insert a new column into the first position with insertcols!.
insertcols!(B, 1, :a => A.IID)
You can also use a leftjoin (or other join types) if the orders don’t match exactly.
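A rough sketch of the join approach, assuming (unlike dataframe B in the original question) that both tables carry a shared key column. The names ids and geno and their values are made up for illustration.

```julia
using DataFrames

# Hypothetical data: ids holds the IID column, geno holds genotypes
# keyed by the same IID but listed in a different row order.
ids = DataFrame(IID = [408610274, 409588054, 409894206])
geno = DataFrame(IID = [409894206, 408610274, 409588054],
                 x1 = [1.0, 1.0, 2.0],
                 x2 = [2.0, 1.0, 2.0])

# leftjoin keeps every row of ids and matches rows of geno on the key.
joined = leftjoin(ids, geno, on = :IID)

# Do not rely on the row order after a join; sort explicitly.
sort!(joined, :IID)
```

Without a common key there is nothing to match on, so hcat or insertcols! (which pair rows purely by position) are the only options.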
For the collapsing operation (this needs using Chain, DataFrames, Statistics):
@chain df begin
    groupby("Herd.when.genotyped")
    combine(:x1 => mean)
end
To get the mean of all the gene columns, do
@chain df begin
    groupby("Herd.when.genotyped")
    combine(names(df, Between(:x1, :x8)) .=> mean)
end
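Here is a self-contained sketch of the same group-wise mean without Chain, on made-up data. The second herd number (10005) and all values are invented so that there is more than one group. Because the gene columns in the question have element type Float64?, a plain mean returns missing for any group that contains a missing genotype; composing it with skipmissing is one possible way around that, assuming missing genotypes should simply be ignored.

```julia
using DataFrames, Statistics

# Made-up data mimicking the layout in the question
df = DataFrame("Herd.when.genotyped" => [10004, 10004, 10004, 10005, 10005],
               "IID" => [1, 2, 3, 4, 5],
               "x1" => Union{Missing, Float64}[2.0, 2.0, 1.0, 0.0, missing],
               "x2" => Union{Missing, Float64}[1.0, 0.0, 2.0, 2.0, 2.0])

gdf = groupby(df, "Herd.when.genotyped")

# mean ∘ skipmissing ignores missing genotypes instead of returning missing.
# renamecols = false keeps the original names (x1, x2) in the result.
herd_means = combine(gdf, ["x1", "x2"] .=> (mean ∘ skipmissing); renamecols = false)
```

The result has one row per herd, with the herd number in the first column followed by one mean per gene column.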
Thank you!
It works perfectly.
If we aren’t concerned with the position at which the column is inserted, could we simply do this?
B.a = A.IID
Actually, I’m not clear on the difference between this and
B[:,:a] = A.IID
Are they effectively the same?
They are effectively the same in that both give B a column :a with the values of A.IID. The difference is in memory.
The first one does not copy anything: the columns :a and :IID are the exact same vector in memory. The second one creates a copy, so the two columns are stored in different places.
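A small sketch of what that means in practice, on made-up data (the column name :b is introduced here only so the two forms can be compared side by side):

```julia
using DataFrames

A = DataFrame(IID = [408610274, 409588054, 409894206])
B = DataFrame(x1 = [1.0, 2.0, 1.0])

B.a = A.IID          # stores the vector itself, no copy
B[:, :b] = A.IID     # allocates a new column and copies the values in

B.a === A.IID        # true: same object in memory
B.b === A.IID        # false: separate storage

A.IID[1] = 0         # mutate the source column in place
B.a[1]               # 0: the aliased column sees the change
B.b[1]               # 408610274: the copied column does not
```

So the non-copying form saves memory, but later in-place changes to one dataframe show up in the other; the copying form costs an allocation and keeps the two dataframes independent.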
Thanks for the info. What is the implication that one makes a copy and the other doesn’t?
I’ve noticed that in some cases I cannot use the first form and only the second form works, but I haven’t quite figured out when that is. So I’ve started to use only the second form, which seems to work, but I figure there must be a reason why both forms exist.
When should you use one vs. the other?