How to add new column to Dataframe

Shazman · April 7, 2021, 10:27pm

Hi
I want to add dataframe A as a first column of dataframe of B.
so the first column of new dataframe will be IID from A
how can I do that ? thank you in advance!

dataframe A

  Row │ IID       
      │ Int64     
──────┼───────────
    1 │ 408610274
    2 │ 409588054
    3 │ 409894206
    4 │ 410902281
    5 │ 411257293

dataframe B

 Row │ x1       x2       x3       x4       x5      
     │ Float64  Float64  Float64  Float64  Float64 
─────┼─────────────────────────────────────────────
   1 │     1.0      1.0      2.0      1.0      0.0
   2 │     2.0      2.0      2.0      2.0      1.0
   3 │     1.0      2.0      1.0      1.0      0.0
   4 │     2.0      0.0      2.0      2.0      1.0
   5 │     2.0      2.0      1.0      2.0      1.0

TheCedarPrince · April 7, 2021, 11:02pm

You could do hcat(A, B) if I am understanding you correctly.

using DataFrames
A = DataFrame([1:5], [:IID])
B = rand(5,5) .* 10 |> DataFrame

hcat(A, B)

5×6 DataFrame
 Row │ IID    x1       x2       x3       x4        x5
     │ Int64  Float64  Float64  Float64  Float64   Float64
─────┼───────────────────────────────────────────────────────
   1 │     1  9.28669  8.78076  1.19252  5.05751   1.59102
   2 │     2  3.0458   6.38638  2.4602   7.59882   2.78558
   3 │     3  4.7442   4.91209  9.38128  5.55845   0.0363324
   4 │     4  4.13458  4.19869  5.46127  2.87679   2.81345
   5 │     5  9.74654  8.61283  1.99002  0.444639  6.98284

Shazman · April 7, 2021, 11:32pm

thank you,

I have another problem,
my dataframe look like this,
the columns names x1,x2…x8 are gene’s names.
and 0,1,2, stands for genotypes.
and the column IID is for individuals names and they grouped in number 10004,
now I want to have avarage genotype for group 10004,
how can I do that?

 Row │ Herd.when.genotyped  IID        x1        x2        x3        x4        x5        x6        x7        x8       
     │ Int64                Int64      Float64?  Float64?  Float64?  Float64?  Float64?  Float64?  Float64?  Float64? 
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │               10004  410240170       2.0       1.0       2.0       2.0       1.0       2.0       1.0       1.0
   2 │               10004  412396339       2.0       0.0       0.0       2.0       2.0       1.0       1.0       0.0
   3 │               10004  412442014       2.0       2.0       1.0       2.0       1.0       1.0       2.0       0.0
   4 │               10004  412409256       2.0       1.0       0.0       2.0       2.0       0.0       2.0       0.0
   5 │               10004  412419664       2.0       2.0       0.0       2.0       1.0       1.0       2.0       1.0
   6 │               10004  412442177       1.0       2.0       2.0       1.0       1.0       2.0       2.0       2.0
   7 │               10004  412556442       2.0       1.0       2.0       2.0       2.0       2.0       2.0       1.0
   8 │               10004  412502732       2.0       1.0       0.0       2.0       2.0       1.0       2.0       0.0
   9 │               10004  412490450       2.0       1.0       1.0       2.0       2.0       2.0       2.0       2.0
  10 │               10004  412788155       2.0       1.0       2.0       2.0       1.0       2.0       2.0       1.0

TheCedarPrince · April 7, 2021, 11:40pm

Hi @Shazman no problem. As your new line of questioning is not related to your topic, I suggest you open up a new Discourse question with additional questions you have.

I will answer your latest question here however. I am not sure I understand you entirely so you may need to rephrase your question but I would do this (using the dataframes I posted from my last comment):

using Statistics
C = hcat(A, B)
C.average = [mean(row[2:6]) for row in eachrow(C)]

 Row │ IID    x1       x2       x3       x4        x5         average
     │ Int64  Float64  Float64  Float64  Float64   Float64    Float64
─────┼────────────────────────────────────────────────────────────────
   1 │     1  9.28669  8.78076  1.19252  5.05751   1.59102    5.1817
   2 │     2  3.0458   6.38638  2.4602   7.59882   2.78558    4.45535
   3 │     3  4.7442   4.91209  9.38128  5.55845   0.0363324  4.92647
   4 │     4  4.13458  4.19869  5.46127  2.87679   2.81345    3.89696
   5 │     5  9.74654  8.61283  1.99002  0.444639  6.98284    5.55537

pdeffebach · April 8, 2021, 12:12am

You can insert a new column into the first position with insertcols.

insertcols!(B, 1, :a => A.IID)

You can also use a leftjoin (or other join types) if the orders don’t match exactly.

For the collapse-ing operation

@chain df begin 
    groupby("Herd.when.genotyped")
    combine(:x1 => mean)
end

To get the mean of all the columns, do

@chain df begin 
    groupby("Herd.when.genotyped")
    combine(names(df, Between(:x1, :x5)) .=> mean)
end

Shazman · April 8, 2021, 12:31am

thank you !
it works perfectly.

jonjilla · April 8, 2021, 3:59am

If we aren’t concerned with the position, that the column is inserted, could we simply do this?

B.a = A.IID

Actually, I’m not clear on the difference between this and

B[:,:a] = A.IID

Are they effectively the same?

pdeffebach · April 8, 2021, 2:12pm

They are effectively the same. The columns :a and :IID will be the exact same in memory.

The first one does not copy anywhere. The second one creates a copy so the the two columns are stored in different places.

jonjilla · April 8, 2021, 11:31pm

Thanks for the info. What is the implication that one makes a copy and the other doesn’t?

I’ve noticed that in some cases, I cannot use the first form, and only the second form works, but I haven’t quite figure out when that is, so I’ve started to use the second form only, which seems to work, but I figure there must be a reason why those two forms exist.

When you should use one vs the other?

Topic		Replies	Views
Add a column in dataframe based on another column Data question	4	377	March 1, 2021
Combine(Merge) Columns New to Julia dataframes	2	2553	February 28, 2021
DataFrames: hcat General Usage dataframes	3	2239	July 30, 2020
Creating new dataframe column! General Usage question , dataframes	3	295	April 21, 2021
How to properly add two new columns (in dataframe) from a function that returns two arrays? General Usage dataframes , function	7	820	July 3, 2021

How to add new column to Dataframe

Related topics