Merging two dataframes of different length

I am a beginner with Julia and am struggling to merge two dataframes I have of various length. This is one of the dataframes, then I have another that is 5120 long, in interest of space I have only included the first 20, but the second dataframe contains all the years the first dataframe has. I want to add G as a column to my second dataframe so that, every row with year=1965 has G=1.08906 and so on for each year. Any suggestions how I might do this - I apologize if there is an easy fix.

Row β”‚ year G
β”‚ Float64 Float64
─────┼──────────────────
1 β”‚ 1965.0 1.08906
2 β”‚ 1966.0 1.10632
3 β”‚ 1967.0 1.09697
4 β”‚ 1968.0 1.09504
5 β”‚ 1970.0 1.12902
6 β”‚ 1971.0 1.13436
7 β”‚ 1972.0 1.14328
8 β”‚ 1973.0 1.14644
9 β”‚ 1975.0 1.14837
10 β”‚ 1976.0 1.14905
11 β”‚ 1977.0 1.15022
12 β”‚ 1978.0 1.15693
13 β”‚ 1979.0 1.16652
14 β”‚ 1980.0 1.17443
15 β”‚ 1981.0 1.1808
16 β”‚ 1983.0 1.17581
17 β”‚ 1984.0 1.18052

5120Γ—3 DataFrame
Row β”‚ year id s
β”‚ Int64 Int64 Float64
──────┼────────────────────────────
1 β”‚ 1965 100751 0.000706819
2 β”‚ 1965 104151 0.000943925
3 β”‚ 1965 104179 0.000898235
4 β”‚ 1965 105330 0.000309796
5 β”‚ 1965 110097 6.78435e-5
6 β”‚ 1965 110404 4.6729e-5
7 β”‚ 1965 110635 0.00147144
8 β”‚ 1965 110644 0.000311526
9 β”‚ 1965 110653 3.46141e-6
10 β”‚ 1965 110662 0.00114019
11 β”‚ 1965 110671 0.000150917
12 β”‚ 1965 110699 5.57286e-5
13 β”‚ 1965 110705 0.000406715
14 β”‚ 1965 117636 4.77674e-5
15 β”‚ 1965 122612 0.000192108
16 β”‚ 1965 123961 0.00063586
17 β”‚ 1965 124885 7.85739e-5
18 β”‚ 1965 126775 6.47283e-5
19 β”‚ 1965 126818 0.000563171
20 β”‚ 1965 127060 0.000426791
21 β”‚ 1965 127741 0.000514019

Hi @Will_G, welcome! No need to apologize at all, I think it’s a great question that I would also like to know the answer to. Not sure if this is the most idiomatic way to go about this, but this sounds like a good job for leftjoin:

using DataFrames

julia> df1 = DataFrame(
           year = 1965:1970,
           G = rand(6),
       )
6Γ—2 DataFrame
 Row β”‚ year   G        
     β”‚ Int64  Float64  
─────┼─────────────────
   1 β”‚  1965  0.181731
   2 β”‚  1966  0.762532
   3 β”‚  1967  0.415284
   4 β”‚  1968  0.216933
   5 β”‚  1969  0.374068
   6 β”‚  1970  0.424626

julia> df2 = DataFrame(
           year = repeat(1965:1970, inner=2),
           id = rand(10_000:13_000, 12),
           s = rand(12),
       )
12Γ—3 DataFrame
 Row β”‚ year   id     s        
     β”‚ Int64  Int64  Float64  
─────┼────────────────────────
   1 β”‚  1965  11862  0.929915
   2 β”‚  1965  11758  0.466497
   3 β”‚  1966  11479  0.542719
   4 β”‚  1966  11924  0.978926
   5 β”‚  1967  12506  0.774413
   6 β”‚  1967  10473  0.153476
   7 β”‚  1968  12482  0.41511
   8 β”‚  1968  12222  0.303588
   9 β”‚  1969  12220  0.636519
  10 β”‚  1969  10899  0.408869
  11 β”‚  1970  11823  0.633692
  12 β”‚  1970  11411  0.92418

julia> leftjoin(df2, df1; on=:year)
12Γ—4 DataFrame
 Row β”‚ year   id     s         G        
     β”‚ Int64  Int64  Float64   Float64? 
─────┼──────────────────────────────────
   1 β”‚  1965  11862  0.929915  0.181731
   2 β”‚  1965  11758  0.466497  0.181731
   3 β”‚  1966  11479  0.542719  0.762532
   4 β”‚  1966  11924  0.978926  0.762532
   5 β”‚  1967  12506  0.774413  0.415284
   6 β”‚  1967  10473  0.153476  0.415284
   7 β”‚  1968  12482  0.41511   0.216933
   8 β”‚  1968  12222  0.303588  0.216933
   9 β”‚  1969  12220  0.636519  0.374068
  10 β”‚  1969  10899  0.408869  0.374068
  11 β”‚  1970  11823  0.633692  0.424626
  12 β”‚  1970  11411  0.92418   0.424626

And then if you wanted to edit df2 in place, you could just replace leftjoin with leftjoin!

2 Likes

@icweaver Thank you that worked perfectly!

1 Like