Hi
I want to add dataframe A as a first column of dataframe of B.
so the first column of new dataframe will be IID
from A
how can I do that ? thank you in advance!
dataframe A
Row │ IID
│ Int64
──────┼───────────
1 │ 408610274
2 │ 409588054
3 │ 409894206
4 │ 410902281
5 │ 411257293
dataframe B
Row │ x1 x2 x3 x4 x5
│ Float64 Float64 Float64 Float64 Float64
─────┼─────────────────────────────────────────────
1 │ 1.0 1.0 2.0 1.0 0.0
2 │ 2.0 2.0 2.0 2.0 1.0
3 │ 1.0 2.0 1.0 1.0 0.0
4 │ 2.0 0.0 2.0 2.0 1.0
5 │ 2.0 2.0 1.0 2.0 1.0
You could do hcat(A, B)
if I am understanding you correctly.
using DataFrames
A = DataFrame([1:5], [:IID])
B = rand(5,5) .* 10 |> DataFrame
hcat(A, B)
5×6 DataFrame
Row │ IID x1 x2 x3 x4 x5
│ Int64 Float64 Float64 Float64 Float64 Float64
─────┼───────────────────────────────────────────────────────
1 │ 1 9.28669 8.78076 1.19252 5.05751 1.59102
2 │ 2 3.0458 6.38638 2.4602 7.59882 2.78558
3 │ 3 4.7442 4.91209 9.38128 5.55845 0.0363324
4 │ 4 4.13458 4.19869 5.46127 2.87679 2.81345
5 │ 5 9.74654 8.61283 1.99002 0.444639 6.98284
2 Likes
thank you,
I have another problem,
my dataframe look like this,
the columns names x1
,x2
…x8
are gene’s names.
and 0
,1
,2
, stands for genotypes.
and the column IID
is for individuals names and they grouped in number 10004
,
now I want to have avarage genotype for group 10004
,
how can I do that?
Row │ Herd.when.genotyped IID x1 x2 x3 x4 x5 x6 x7 x8
│ Int64 Int64 Float64? Float64? Float64? Float64? Float64? Float64? Float64? Float64?
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 10004 410240170 2.0 1.0 2.0 2.0 1.0 2.0 1.0 1.0
2 │ 10004 412396339 2.0 0.0 0.0 2.0 2.0 1.0 1.0 0.0
3 │ 10004 412442014 2.0 2.0 1.0 2.0 1.0 1.0 2.0 0.0
4 │ 10004 412409256 2.0 1.0 0.0 2.0 2.0 0.0 2.0 0.0
5 │ 10004 412419664 2.0 2.0 0.0 2.0 1.0 1.0 2.0 1.0
6 │ 10004 412442177 1.0 2.0 2.0 1.0 1.0 2.0 2.0 2.0
7 │ 10004 412556442 2.0 1.0 2.0 2.0 2.0 2.0 2.0 1.0
8 │ 10004 412502732 2.0 1.0 0.0 2.0 2.0 1.0 2.0 0.0
9 │ 10004 412490450 2.0 1.0 1.0 2.0 2.0 2.0 2.0 2.0
10 │ 10004 412788155 2.0 1.0 2.0 2.0 1.0 2.0 2.0 1.0
Hi @Shazman no problem. As your new line of questioning is not related to your topic, I suggest you open up a new Discourse question with additional questions you have.
I will answer your latest question here however. I am not sure I understand you entirely so you may need to rephrase your question but I would do this (using the dataframes I posted from my last comment):
using Statistics
C = hcat(A, B)
C.average = [mean(row[2:6]) for row in eachrow(C)]
Row │ IID x1 x2 x3 x4 x5 average
│ Int64 Float64 Float64 Float64 Float64 Float64 Float64
─────┼────────────────────────────────────────────────────────────────
1 │ 1 9.28669 8.78076 1.19252 5.05751 1.59102 5.1817
2 │ 2 3.0458 6.38638 2.4602 7.59882 2.78558 4.45535
3 │ 3 4.7442 4.91209 9.38128 5.55845 0.0363324 4.92647
4 │ 4 4.13458 4.19869 5.46127 2.87679 2.81345 3.89696
5 │ 5 9.74654 8.61283 1.99002 0.444639 6.98284 5.55537
1 Like
You can insert a new column into the first position with insertcols
.
insertcols!(B, 1, :a => A.IID)
You can also use a leftjoin
(or other join
types) if the orders don’t match exactly.
For the collapse
-ing operation
@chain df begin
groupby("Herd.when.genotyped")
combine(:x1 => mean)
end
To get the mean of all the columns, do
@chain df begin
groupby("Herd.when.genotyped")
combine(names(df, Between(:x1, :x5)) .=> mean)
end
4 Likes
thank you !
it works perfectly.
If we aren’t concerned with the position, that the column is inserted, could we simply do this?
B.a = A.IID
Actually, I’m not clear on the difference between this and
B[:,:a] = A.IID
Are they effectively the same?
They are effectively the same. The columns :a
and :IID
will be the exact same in memory.
The first one does not copy anywhere. The second one creates a copy so the the two columns are stored in different places.
Thanks for the info. What is the implication that one makes a copy and the other doesn’t?
I’ve noticed that in some cases, I cannot use the first form, and only the second form works, but I haven’t quite figure out when that is, so I’ve started to use the second form only, which seems to work, but I figure there must be a reason why those two forms exist.
When you should use one vs the other?