How to add new column to Dataframe

Hi
I want to add dataframe A as a first column of dataframe of B.
so the first column of new dataframe will be IID from A
how can I do that ? thank you in advance!

dataframe A

  Row │ IID       
      │ Int64     
──────┼───────────
    1 │ 408610274
    2 │ 409588054
    3 │ 409894206
    4 │ 410902281
    5 │ 411257293

dataframe B

 Row │ x1       x2       x3       x4       x5      
     │ Float64  Float64  Float64  Float64  Float64 
─────┼─────────────────────────────────────────────
   1 │     1.0      1.0      2.0      1.0      0.0
   2 │     2.0      2.0      2.0      2.0      1.0
   3 │     1.0      2.0      1.0      1.0      0.0
   4 │     2.0      0.0      2.0      2.0      1.0
   5 │     2.0      2.0      1.0      2.0      1.0

You could do hcat(A, B) if I am understanding you correctly.

using DataFrames
A = DataFrame([1:5], [:IID])
B = rand(5,5) .* 10 |> DataFrame

hcat(A, B)

5×6 DataFrame
 Row │ IID    x1       x2       x3       x4        x5
     │ Int64  Float64  Float64  Float64  Float64   Float64
─────┼───────────────────────────────────────────────────────
   1 │     1  9.28669  8.78076  1.19252  5.05751   1.59102
   2 │     2  3.0458   6.38638  2.4602   7.59882   2.78558
   3 │     3  4.7442   4.91209  9.38128  5.55845   0.0363324
   4 │     4  4.13458  4.19869  5.46127  2.87679   2.81345
   5 │     5  9.74654  8.61283  1.99002  0.444639  6.98284
1 Like

thank you,

I have another problem,
my dataframe look like this,
the columns names x1,x2x8 are gene’s names.
and 0,1,2, stands for genotypes.
and the column IID is for individuals names and they grouped in number 10004,
now I want to have avarage genotype for group 10004,
how can I do that?

 Row │ Herd.when.genotyped  IID        x1        x2        x3        x4        x5        x6        x7        x8       
     │ Int64                Int64      Float64?  Float64?  Float64?  Float64?  Float64?  Float64?  Float64?  Float64? 
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │               10004  410240170       2.0       1.0       2.0       2.0       1.0       2.0       1.0       1.0
   2 │               10004  412396339       2.0       0.0       0.0       2.0       2.0       1.0       1.0       0.0
   3 │               10004  412442014       2.0       2.0       1.0       2.0       1.0       1.0       2.0       0.0
   4 │               10004  412409256       2.0       1.0       0.0       2.0       2.0       0.0       2.0       0.0
   5 │               10004  412419664       2.0       2.0       0.0       2.0       1.0       1.0       2.0       1.0
   6 │               10004  412442177       1.0       2.0       2.0       1.0       1.0       2.0       2.0       2.0
   7 │               10004  412556442       2.0       1.0       2.0       2.0       2.0       2.0       2.0       1.0
   8 │               10004  412502732       2.0       1.0       0.0       2.0       2.0       1.0       2.0       0.0
   9 │               10004  412490450       2.0       1.0       1.0       2.0       2.0       2.0       2.0       2.0
  10 │               10004  412788155       2.0       1.0       2.0       2.0       1.0       2.0       2.0       1.0

Hi @Shazman no problem. As your new line of questioning is not related to your topic, I suggest you open up a new Discourse question with additional questions you have.

I will answer your latest question here however. I am not sure I understand you entirely so you may need to rephrase your question but I would do this (using the dataframes I posted from my last comment):

using Statistics
C = hcat(A, B)
C.average = [mean(row[2:6]) for row in eachrow(C)]

 Row │ IID    x1       x2       x3       x4        x5         average
     │ Int64  Float64  Float64  Float64  Float64   Float64    Float64
─────┼────────────────────────────────────────────────────────────────
   1 │     1  9.28669  8.78076  1.19252  5.05751   1.59102    5.1817
   2 │     2  3.0458   6.38638  2.4602   7.59882   2.78558    4.45535
   3 │     3  4.7442   4.91209  9.38128  5.55845   0.0363324  4.92647
   4 │     4  4.13458  4.19869  5.46127  2.87679   2.81345    3.89696
   5 │     5  9.74654  8.61283  1.99002  0.444639  6.98284    5.55537
1 Like

You can insert a new column into the first position with insertcols.

insertcols!(B, 1, :a => A.IID)

You can also use a leftjoin (or other join types) if the orders don’t match exactly.

For the collapse-ing operation

@chain df begin 
    groupby("Herd.when.genotyped")
    combine(:x1 => mean)
end

To get the mean of all the columns, do

@chain df begin 
    groupby("Herd.when.genotyped")
    combine(names(df, Between(:x1, :x5)) .=> mean)
end
3 Likes

thank you !
it works perfectly. :star_struck:

If we aren’t concerned with the position, that the column is inserted, could we simply do this?

B.a = A.IID

Actually, I’m not clear on the difference between this and

B[:,:a] = A.IID

Are they effectively the same?

They are effectively the same. The columns :a and :IID will be the exact same in memory.

The first one does not copy anywhere. The second one creates a copy so the the two columns are stored in different places.

Thanks for the info. What is the implication that one makes a copy and the other doesn’t?

I’ve noticed that in some cases, I cannot use the first form, and only the second form works, but I haven’t quite figure out when that is, so I’ve started to use the second form only, which seems to work, but I figure there must be a reason why those two forms exist.

When you should use one vs the other?