hi
I have a dataframe as below
Row │ Date Ticker Close frontContract secondContract
│ Date String Float64 String String
───────┼─────────────────────────────────────────────────────────────
**1 │ 2019-01-31 E1F19 97.5275 E1F19 E1G19**
2 │ 2019-01-30 E1F19 97.53 E1F19 E1G19
3 │ 2019-01-29 E1F19 97.53 E1F19 E1G19
4 │ 2019-01-21 E1F19 97.525 E1F19 E1G19
**5 │ 2019-01-31 E1G19 98 E1F19 E1G19**
it has rows by Ticker and Date. For each date, I would have a front and a second Contract. I have to be able to calculate the difference between the Close price for a date for Front and Second contract. for instance, E1F19 has a frontContract of E1F19 and E1G19. the difference for 2019-01-31 would be 97.5275 - 98.
I have tried, creating it as a Matrix and using ShuftedArrays to lag and then calculating the difference and then taking it back to a Dataframe. I am sure there is a better way of doing this.
Can someone provide me with some pointers please?
thanks
Roh
Why would it be the case? Why not 98 - 97.5275
? Also what would it be if you had one observation for a given day, or more than 2 observations?
it is not clear what you are looking for. But could such a thing be useful to you?
julia> innerjoin(df,df[:,[:Ticker,:Close]], on=:secondCdContract=>:Ticker,makeunique=true)
5×6 DataFrame
Row │ Date Ticker Close frontContract secondCdContract Close_1
│ String15 String7 Float64 String7 String7 Float64
─────┼────────────────────────────────────────────────────────────────────────
1 │ 31.01.2019 E1F19 97.5275 E1F19 E1G19 98.0
2 │ 30.01.2019 E1F19 97.53 E1F19 E1G19 98.0
3 │ 29.01.2019 E1F19 97.53 E1F19 E1G19 98.0
4 │ 21.01.2019 E1F19 97.525 E1F19 E1G19 98.0
5 │ 31.01.2019 E1G19 98.0 E1F19 E1G19 98.0
I would only have one observation per day. It could be that I don’t have any observation at all, in that case I set the spread to NaN
Thanks
Yeah I have tried that as well. This works with a left join. I am wondering if there is another way to achieve this, am using it as a learning opportunity
julia> innerjoin(df,df[:,[:Date,:Ticker,:Close]], on=[:secondCdContract=>:Ticker, :Date],makeunique=true)
2×6 DataFrame
Row │ Date Ticker Close frontContract secondCdContract Close_1
│ String15 String7 Float64 String7 String7 Float64
─────┼────────────────────────────────────────────────────────────────────────
1 │ 31.01.2019 E1F19 97.5275 E1F19 E1G19 98.0
2 │ 31.01.2019 E1G19 98.0 E1F19 E1G19 98.0
1 Like
two other proposals, since you have not explained in detail the desired result
julia> leftjoin(df,df[:,1:3], on=[:secondCdContract=>:Ticker, :Date],makeunique=true)
5×6 DataFrame
Row │ Date Ticker Close frontContract secondCdContract Close_1
│ String15 String7 Float64 String7 String7 Float64?
─────┼──────────────────────────────────────────────────────────────────────────
1 │ 31.01.2019 E1F19 97.5275 E1F19 E1G19 98.0
2 │ 31.01.2019 E1G19 98.0 E1F19 E1G19 98.0
3 │ 30.01.2019 E1F19 97.53 E1F19 E1G19 missing
4 │ 29.01.2019 E1F19 97.53 E1F19 E1G19 missing
5 │ 21.01.2019 E1F19 97.525 E1F19 E1G19 missing
julia> combine(groupby(df,:Date), :Close=>diff=>:spread)
1×2 DataFrame
Row │ Date spread
│ String15 Float64
─────┼─────────────────────
1 │ 31.01.2019 0.4725
2 Likes
Yes - this is what I would would say is a good solution assuming that the source df
is properly pre-sorted (I would assume it is).