Adding information from one DataFrame to another

I have a DataFrame that contains the columns “Date” and “Ticker”.

Suppose I have another DataFrame too that contains the columns “Date” and “ClosingPrice”.

How do I add the information from the second DataFrame to the first such that I end up with three columns, “Date”, “Ticker” and “ClosingPrice”?

sounds like u need leftjoin

1 Like

Will that work without perfect correspondence between dates in the two DataFrames?

generally no. you may want to google rolling joins and fuzzy joins and see if anything helps. haven’t used those in the julia ecosystem.

I should probably illustrate specifically what I mean. For example:

DataFrameA:
20-07-2021 IBM
24-07-2021 IBM

DataFrameB:
20-07-2021 IBM 100
21-07-2021 IBM 101
22-07-2021 IBM 102
23-07-2021 IBM 103
24-07-2021 IBM 104

Would left join work here?

try it.

also u haven’t shown ur desired result.

1 Like

Hi @Nash

Is something like this what you’re looking for?

julia> A = DataFrame(date = [Date("2021-07-20"),Date("2021-07-24")],
                     symbol = ["IBM","IBM"])
2×2 DataFrame
 Row │ date        symbol 
     │ Date        String 
─────┼────────────────────
   1 │ 2021-07-20  IBM
   2 │ 2021-07-24  IBM

julia> B = DataFrame(date = [Date("2021-07-20"),Date("2021-07-21"),
                             Date("2021-07-22"),Date("2021-07-23"),
                             Date("2021-07-24")],
                     price = [100,101,102,103,104])
5×2 DataFrame
 Row │ date        price 
     │ Date        Int64 
─────┼───────────────────
   1 │ 2021-07-20    100
   2 │ 2021-07-21    101
   3 │ 2021-07-22    102
   4 │ 2021-07-23    103
   5 │ 2021-07-24    104

julia> leftjoin(B,A, on = :date)
5×3 DataFrame
 Row │ date        price  symbol  
     │ Date        Int64  String? 
─────┼────────────────────────────
   1 │ 2021-07-20    100  IBM
   2 │ 2021-07-24    104  IBM
   3 │ 2021-07-21    101  missing 
   4 │ 2021-07-22    102  missing 
   5 │ 2021-07-23    103  missing 

For further details the documentation about joins with DataFrames is here: Joins · DataFrames.jl

2 Likes

This solution is correct for the example I gave. But suppose A contained not only information about the ticker IBM but also the ticker MSFT. In that case, joining on date will add the price of IBM to MSFT too.

How can I place a condition to rule out that mistake?

leftjoin(B, A[:, Not(:MSFT)], on = :date) assuming that MSFT is a column in A.

Generally you can use leftjoin(B, A, on = [:ticker, :date]), but it wouldn’t make much sense in this case, since you already have symbol column in B, so it’s hard to understand how result should look like.

Ah sorry I hadn’t scrolled up to see what the data actually looks like - I think your suggestion is correct (B has only IBM in the ticker column, so if A has MSFT and IBM in ticker then joining on both will only take the IBM entries from A)

Hi @Nash

To make sure we are giving you the right answer and make it easier for us to help you perhaps is best if you build a DataFrame A and B as in the example above and then write manually the results you would like to have after joining.

You can start loading the packages DataFrames and Dates like this:

using DataFrames, Dates

and then

A = DataFrame(date = [Date("2021-07-20"), 
                                     Date("2021-07-24"),  
                                     Date("2021-07-24")],
                     symbol = ["IBM","IBM","MSFT"])
B = ... etc.

Thanks! :slight_smile:

3 Likes