Yes, dropmissing should probably be generic in DataAPI. TSFrames should not implement skipmissing in the way it currently does.
- TSFrames.jl has DataFrames.jl as a dependency, so it can add a method to dropmissing without a problem. Preferably with the same signature, dropmissing(table::T, cols=:; view::Bool=false, disallowmissing::Bool=!view), but if e.g. some kwargs are hard to support then this does not have to be strictly followed.
- skipmissing has a different API, as it retains indices from the parent:
julia> x = [1, 2, missing, 4, 5, missing]
6-element Vector{Union{Missing, Int64}}:
1
2
missing
4
5
missing
julia> x |> skipmissing |> eachindex |> collect
4-element Vector{Int64}:
1
2
4
5
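To make the contrast concrete, here is a small sketch using only Base Julia. It shows that the skipmissing wrapper keeps the parent's indices, while collecting it produces a fresh, re-indexed vector. The variable names are illustrative only.

```julia
x = [1, 2, missing, 4, 5, missing]

# Lazy wrapper: no copy is made, and indexing still refers to the parent `x`.
sm = skipmissing(x)

# Indices of the non-missing entries, expressed in the parent's indexing.
kept_indices = collect(eachindex(sm))

# Materialized copy: missing entries are gone and positions are renumbered.
kept_values = collect(sm)

# Indexing the wrapper at a parent position that holds `missing` throws.
err = try
    sm[3]
catch e
    e
end

println(kept_indices)              # parent indices of non-missing entries
println(kept_values)               # plain Vector{Int64} without missing
println(err isa MissingException)
```

This is why a dropmissing-style function (returns a new, shorter table) and skipmissing (a view-like iterator over the original) are not interchangeable.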
@chiraganand - in general, my recommendation would be that if you add new features to TSFrames.jl (especially features that are not unique to the package), you consider making a small announcement in the #data channel on Slack so that people can respond before you make a release. This is the procedure we try to follow in DataFrames.jl, and it works reasonably well: people tend to watch the #data channel on Slack rather than all the GitHub repositories that might interest them, because on GitHub too much goes on in parallel and it is hard to track.
Actually, TSFrames does not have a specific implementation for skipmissing.
Yes, this should be easily possible to do in TSFrames.
Okay, sure.
@mdogan Have you tried using dropmissing with ts.coredata? The TSFrame.coredata property stores the underlying DataFrame, and this is how most TSFrames methods implement wrappers over DataFrames functionality. Until TSFrames gets its own dropmissing(), this could be a good way to achieve what you are trying to do.
@chiraganand, thanks very much for the follow-up. Yes, it works that way, but as I highlighted at the beginning, I will be using TSFrames within my package's functions as the main data object, and dropmissing does not suit the flow of my package.
using Dates
using TSFrames
using DataFrames
dates = Date(2021, 12, 31):Month(1):Date(2022, 07, 31);
TSLA = [352.26,312.24,missing,359.2,290.25,252.75,224.47,297.15];
NFLX = [602.44,427.14,394.52,missing,190.36,197.44,174.87,224.9];
MSFT = [336.32,310.98,298.79,308.31,277.52,271.87,missing,280.74];
# by the way, you should add the below line somewhere at the top of the TSFrames documentation :) It's easy to construct a TSFrame object, but it took quite some time to figure out this functionality.
prices_ts = TSFrame([TSLA NFLX MSFT], dates, colnames=[:TSLA, :NFLX, :MSFT])
julia> returns = pctchange(prices_ts)
8×3 TSFrame with Date Index
 Index       TSLA        NFLX        MSFT
 Date        Float64?    Float64?    Float64?
──────────────────────────────────────────────────
 2021-12-31  missing     missing     missing
 2022-01-31  -0.113609   -0.290983   -0.0753449
 2022-02-28  missing     -0.0763684  -0.0391987
 2022-03-31  missing     missing     0.0318618
 2022-04-30  -0.191954   missing     -0.099867
 2022-05-31  -0.129199   0.0371927   -0.0203589
 2022-06-30  -0.111889   -0.114313   missing
 2022-07-31  0.323785    0.286098    missing
julia> dropmissing(returns.coredata)
2×4 DataFrame
 Row │ Index       TSLA        NFLX        MSFT
     │ Date        Float64     Float64     Float64
─────┼───────────────────────────────────────────────
   1 │ 2022-01-31  -0.113609   -0.290983   -0.0753449
   2 │ 2022-05-31  -0.129199   0.0371927   -0.0203589
# I force my functions to use the TSFrame object, therefore:
TSFrame(dropmissing(returns.coredata))
2×3 TSFrame with Date Index
 Index       TSLA        NFLX        MSFT
 Date        Float64     Float64     Float64
───────────────────────────────────────────────
 2022-01-31  -0.113609   -0.290983   -0.0753449
 2022-05-31  -0.129199   0.0371927   -0.0203589
At this stage, I'd like to keep the dependencies at a minimum. In the documentation, I will specify that users can drop missing values using dropmissing in the way described above.
skipmissing is normally problematic and does not work with many functions, but if I need to skip missing values, I'll use the syntax below within my library's functions. This is an alternative way for me to do column-wise operations and skip Tables.jl as a dependency.
By the way, you gave me this idea in one of your replies; I just removed Tables.jl and used Matrix instead.
m = [ ]
for col in collect.(skipmissing(eachcol(Matrix(ts))))
println(col)
push!(m, mean(skipmissing(col)))
end
m
Indeed, it is better than dropping the rows with missing values in terms of the analysis conducted. However, it would be great to have a TSFrames.jl-specific dropmissing equivalent so that users can easily drop missing values if needed before passing the data to the functions.
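Until such a method exists, a thin wrapper along the following lines may be enough. This is only a sketch: dropmissing_ts is a hypothetical name, not part of TSFrames.jl, and it simply packages the TSFrame(dropmissing(ts.coredata)) pattern shown earlier in this thread. It assumes the first column of coredata is the index, as in the outputs above.

```julia
using Dates
using DataFrames
using TSFrames

# Hypothetical helper (not a TSFrames.jl function): drop every row that has a
# missing value in `cols`, then wrap the resulting DataFrame in a TSFrame again.
dropmissing_ts(ts::TSFrame, cols=:) = TSFrame(dropmissing(ts.coredata, cols))

dates = Date(2021, 12, 31):Month(1):Date(2022, 7, 31)
TSLA = [352.26, 312.24, missing, 359.2, 290.25, 252.75, 224.47, 297.15]
NFLX = [602.44, 427.14, 394.52, missing, 190.36, 197.44, 174.87, 224.9]
MSFT = [336.32, 310.98, 298.79, 308.31, 277.52, 271.87, missing, 280.74]
prices_ts = TSFrame([TSLA NFLX MSFT], dates, colnames=[:TSLA, :NFLX, :MSFT])

# Drop rows with a missing value in any column.
clean = dropmissing_ts(prices_ts)

# Drop rows only where TSLA is missing; other columns may still hold missing.
only_tsla = dropmissing_ts(prices_ts, :TSLA)
```

Because the helper goes through coredata, it inherits whatever dropmissing does in DataFrames.jl; keyword arguments such as view are deliberately left out of this sketch.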
This is still not doing what you think it is. You are missing a broadcast. Do skipmissing.(eachcol(...)) instead.
Also, you should be able to do eachcol on ts directly. Did you try that?
This does not work.
julia> eachcol(prices_ts)
ERROR: MethodError: no method matching eachcol(::TSFrame)
Closest candidates are:
eachcol(::AbstractVecOrMat) at abstractarraymath.jl:583
eachcol(::DataFrames.AbstractDataFrame) at C:\Users\ploot\.julia\packages\DataFrames\dgZn3\src\abstractdataframe\iteration.jl:175
Stacktrace:
[1] top-level scope
@ REPL[1]:1
It does not work if I don't use collect. It does not skip the missing values. Unfortunately, handling missing is the weakest point of Julia that I have observed so far.
julia> mean.(skipmissing(eachcol(Matrix(prices_ts))))
3-element Vector{Missing}:
missing
missing
missing
julia> mean.(collect.(skipmissing(eachcol(Matrix(prices_ts)))))
3-element Vector{Missing}:
missing
missing
missing
This is the only way it worked:
julia> m = []
Any[]
julia> for col in collect.(skipmissing(eachcol(Matrix(prices_ts))))
println(col)
push!(m, mean(skipmissing(col)))
end
Union{Missing, Float64}[352.26, 312.24, missing, 359.2, 290.25, 252.75, 224.47, 297.15]
Union{Missing, Float64}[602.44, 427.14, 394.52, missing, 190.36, 197.44, 174.87, 224.9]
Union{Missing, Float64}[336.32, 310.98, 298.79, 308.31, 277.52, 271.87, missing, 280.74]
julia> m
3-element Vector{Any}:
298.3314285714286
315.95285714285717
297.78999999999996
You misinterpreted what I wrote. You are doing skipmissing(eachcol(...)). You should do skipmissing.(eachcol(...)). Notice the . (the broadcast dot).
julia> X = [missing 2; 3 missing];
julia> for col in collect.(skipmissing.(eachcol(X)))
println(col)
push!(m, mean(col))
end
[3]
[2]
But also, mean "just works" with skipmissing.
julia> mean.(skipmissing.(eachcol(X)))
2-element Vector{Float64}:
3.0
2.0
I understand other functions do not, though, and we are actively working on making it easier to accomplish these tasks.
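For anyone following along, the difference between the two spellings can be checked with a small sketch that uses only Base and the Statistics standard library. The matrix X below is made-up illustration data, not the data from this thread.

```julia
using Statistics

X = [1.0 missing; 3.0 4.0; missing 6.0]

# Without the dot: skipmissing wraps the *iterator of columns*. No column is
# itself `missing`, so nothing is skipped and each column keeps its missings.
outer = collect(skipmissing(eachcol(X)))
outer_means = mean.(outer)

# With the dot: skipmissing is applied to *each column* separately.
col_means = mean.(skipmissing.(eachcol(X)))

println(length(outer))    # number of columns, unchanged
println(outer_means)      # every entry is missing
println(col_means)        # per-column means over the non-missing values
```

The result col_means is a concretely typed Vector{Float64}, so there is no need to push! into an untyped m = [] inside a loop.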
@pdeffebach, Thank you so much for your answer.
Sorry, I missed that. I tried it, and the final output is the same because, I think, in my version it was the mean(skipmissing(col)) call that skipped the missing values, not skipmissing(eachcol(Matrix(prices_ts))).
julia> m = []
Any[]
julia> for col in collect.(skipmissing.(eachcol(Matrix(prices_ts))))
println(col)
push!(m, mean(col))
end
[352.26, 312.24, 359.2, 290.25, 252.75, 224.47, 297.15]
[602.44, 427.14, 394.52, 190.36, 197.44, 174.87, 224.9]
[336.32, 310.98, 298.79, 308.31, 277.52, 271.87, 280.74]
julia> m
3-element Vector{Any}:
298.3314285714286
315.95285714285717
297.78999999999996
julia> m = []
Any[]
julia> for col in collect.(skipmissing(eachcol(Matrix(prices_ts))))
println(col)
push!(m, mean(skipmissing(col)))
end
Union{Missing, Float64}[352.26, 312.24, missing, 359.2, 290.25, 252.75, 224.47, 297.15]
Union{Missing, Float64}[602.44, 427.14, 394.52, missing, 190.36, 197.44, 174.87, 224.9]
Union{Missing, Float64}[336.32, 310.98, 298.79, 308.31, 277.52, 271.87, missing, 280.74]
julia> m
3-element Vector{Any}:
298.3314285714286
315.95285714285717
297.78999999999996
A big thanks for that. The problems are not because of the TSFrames package but because of how Julia works. Nevertheless, I'm looking forward to the next releases of TSFrames.
@chiraganand @pdeffebach, thanks very much for your answers so far. I wonder whether there is a TSFrame (or some other) function to obtain the data frequency of a TSFrame object.
I want to get something like daily if data is daily, monthly if data is monthly, and so on.
Thank you in advance
@chiraganand, I know that it's kinda late, but I wanted to have something meaningful before sharing it with you and others. I have already released my package, PortfolioAnalytics.jl, which aims to be a tool for quantitative portfolio analytics; indeed, I have already made a second release.
All functions accept only the TSFrame object as an input - I don't want to rely on unmaintained packages. I'd be very happy to collaborate, and I sincerely thank you and others in xKDR for the TSFrames package and all your answers so far.
The package is under heavy development, but it is already possible to compute returns, portfolio returns, value at risk, and expected shortfall, to run minimum-variance or maximum-Sharpe portfolio optimization, and a few other things. Hopefully, the package's functionality will improve with further releases, either minor or major. I plan to add new functionality every month until it becomes a mature library.
doganmehmet/PortfolioAnalytics.jl (github.com)
I am open to any feedback, feature request, and collaboration.
Thanks
No, there is no such functionality. There is, though, an isRegular() function which tells whether observations in a series are equally spaced for a given unit.
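In the meantime, a rough heuristic can be written with the Dates and Statistics standard libraries alone. The sketch below is an assumption-laden illustration, not a TSFrames.jl feature: guess_frequency is a hypothetical name, and the day thresholds are arbitrary choices that classify a series by the median gap between consecutive dates.

```julia
using Dates
using Statistics

# Hypothetical helper: guess the sampling frequency of a date index from the
# median gap (in days) between consecutive observations.
function guess_frequency(dates::AbstractVector{<:Date})
    length(dates) < 2 && return :unknown
    sorted = sort(collect(dates))
    gaps = Dates.value.(diff(sorted))   # gaps between neighbours, in days
    g = median(gaps)
    if g <= 1
        return :daily
    elseif g <= 7
        return :weekly
    elseif 28 <= g <= 31
        return :monthly
    elseif 89 <= g <= 92
        return :quarterly
    elseif 365 <= g <= 366
        return :yearly
    else
        return :irregular
    end
end

monthly_dates = Date(2021, 12, 31):Month(1):Date(2022, 7, 31)
daily_dates = Date(2022, 1, 1):Day(1):Date(2022, 1, 10)

println(guess_frequency(monthly_dates))
println(guess_frequency(daily_dates))
```

For a TSFrame, the dates could be taken from the Index column of coredata (for example ts.coredata[!, :Index], assuming a Date index as in the outputs earlier in this thread). Using the median makes the guess tolerant of a few gaps such as weekends or holidays, but it will not detect mixed frequencies.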