DataFrame query

hi

I have a dataframe as below

   Row │ Date        Ticker  Close     frontContract  secondContract 
       │ Date        String  Float64   String         String         
───────┼─────────────────────────────────────────────────────────────
     **1 │ 2019-01-31  E1F19   97.5275   E1F19          E1G19**
     2 │ 2019-01-30  E1F19   97.53     E1F19          E1G19
     3 │ 2019-01-29  E1F19   97.53     E1F19          E1G19
     4 │ 2019-01-21  E1F19   97.525    E1F19          E1G19
     **5 │ 2019-01-31  E1G19  98   E1F19          E1G19**

it has rows by Ticker and Date. For each date, I would have a front and a second Contract. I have to be able to calculate the difference between the Close price for a date for Front and Second contract. for instance, E1F19 has a frontContract of E1F19 and E1G19. the difference for 2019-01-31 would be 97.5275 - 98.

I have tried, creating it as a Matrix and using ShuftedArrays to lag and then calculating the difference and then taking it back to a Dataframe. I am sure there is a better way of doing this.

Can someone provide me with some pointers please?

thanks
Roh

Why would it be the case? Why not 98 - 97.5275? Also what would it be if you had one observation for a given day, or more than 2 observations?

it is not clear what you are looking for. But could such a thing be useful to you?

julia> innerjoin(df,df[:,[:Ticker,:Close]], on=:secondCdContract=>:Ticker,makeunique=true)
5×6 DataFrame
 Row │ Date        Ticker   Close    frontContract  secondCdContract  Close_1 
     │ String15    String7  Float64  String7        String7           Float64 
─────┼────────────────────────────────────────────────────────────────────────
   1 │ 31.01.2019  E1F19    97.5275  E1F19          E1G19                98.0
   2 │ 30.01.2019  E1F19    97.53    E1F19          E1G19                98.0
   3 │ 29.01.2019  E1F19    97.53    E1F19          E1G19                98.0
   4 │ 21.01.2019  E1F19    97.525   E1F19          E1G19                98.0
   5 │ 31.01.2019  E1G19    98.0     E1F19          E1G19                98.0

I would only have one observation per day. It could be that I don’t have any observation at all, in that case I set the spread to NaN

Thanks

Yeah I have tried that as well. This works with a left join. I am wondering if there is another way to achieve this, am using it as a learning opportunity

julia> innerjoin(df,df[:,[:Date,:Ticker,:Close]], on=[:secondCdContract=>:Ticker, :Date],makeunique=true)
2×6 DataFrame
 Row │ Date        Ticker   Close    frontContract  secondCdContract  Close_1 
     │ String15    String7  Float64  String7        String7           Float64 
─────┼────────────────────────────────────────────────────────────────────────
   1 │ 31.01.2019  E1F19    97.5275  E1F19          E1G19                98.0
   2 │ 31.01.2019  E1G19    98.0     E1F19          E1G19                98.0

1 Like

two other proposals, since you have not explained in detail the desired result

julia> leftjoin(df,df[:,1:3], on=[:secondCdContract=>:Ticker, :Date],makeunique=true)  
5×6 DataFrame
 Row │ Date        Ticker   Close    frontContract  secondCdContract  Close_1   
     │ String15    String7  Float64  String7        String7           Float64?  
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ 31.01.2019  E1F19    97.5275  E1F19          E1G19                  98.0        
   2 │ 31.01.2019  E1G19    98.0     E1F19          E1G19                  98.0        
   3 │ 30.01.2019  E1F19    97.53    E1F19          E1G19             missing
   4 │ 29.01.2019  E1F19    97.53    E1F19          E1G19             missing
   5 │ 21.01.2019  E1F19    97.525   E1F19          E1G19             missing

julia> combine(groupby(df,:Date), :Close=>diff=>:spread)
1×2 DataFrame
 Row │ Date        spread  
     │ String15    Float64
─────┼─────────────────────
   1 │ 31.01.2019   0.4725
2 Likes

Yes - this is what I would would say is a good solution assuming that the source df is properly pre-sorted (I would assume it is).