Announcing TSFrames.jl (formerly TSx) v0.1.0: A timeseries data manipulation package based on DataFrames

Yes, dropmissing should probably be generic in DataAPI. TSFrames should not implement skipmissing in the way it currently is.

  1. TSFrames.jl has DataFrames.jl as a dependency, so it can add a method to dropmissing without a problem, preferably with the same signature dropmissing(table::T, cols=:; view::Bool=false, disallowmissing::Bool=!view). If some keyword arguments are hard to support, this signature does not have to be followed strictly.
  2. skipmissing has a different API as it RETAINS indices from the parent:
julia> x = [1, 2, missing, 4, 5, missing]
6-element Vector{Union{Missing, Int64}}:
 1
 2
  missing
 4
 5
  missing

julia> x |> skipmissing |> eachindex |> collect
4-element Vector{Int64}:
 1
 2
 4
 5
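Point 2 is easier to see when the values differ from their positions. Below is a minimal sketch using only Base Julia. eachindex on a skipmissing wrapper returns positions in the parent vector, and indexing the wrapper also uses parent positions. By contrast, collect produces a compacted vector, which is closer to what a dropmissing-style operation returns.

```julia
x = [10, 20, missing, 40, 50, missing]
s = skipmissing(x)

# Positions of the non-missing entries, expressed as indices into the parent x.
parent_idx = collect(eachindex(s))

# Indexing the skipmissing wrapper uses parent indices too (x[4] here).
fourth = s[4]

# collect() compacts the values into a new Vector{Int64} indexed 1:4,
# which is closer to what a dropmissing-style operation produces.
compacted = collect(s)
```

After this, s[4] and compacted[3] refer to the same value, 40.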

@chiraganand, in general my recommendation would be this: if you add new features to TSFrames.jl (especially features that are not unique to the package), consider making a short announcement in the #data channel on Slack so that people can respond before you make a release. This is the procedure we try to follow in DataFrames.jl, and it works reasonably well. People tend to watch the #data channel on Slack rather than every GitHub repository that might interest them, because too much is going on in parallel on GitHub and it is hard to track.


Actually, TSFrames does not have a specific implementation for skipmissing.

Yes, this should be easily possible to do in TSFrames.

Okay, sure.


@mdogan Have you tried using dropmissing with ts.coredata? The TSFrame.coredata property stores the underlying DataFrame, and this is how most TSFrames methods implement wrappers over DataFrames functionality. Until TSFrames gets its own dropmissing(), this could be a good way to achieve what you are trying to do.

@chiraganand, thanks very much for the follow-up. Yes, it works that way, but as I mentioned at the beginning, I will be using TSFrames as the main data object within my package's functions, and calling dropmissing on the underlying DataFrame does not fit the workflow of my package.

using Dates
using TSFrames
using DataFrames

dates = Date(2021, 12, 31):Month(1):Date(2022, 07, 31);
TSLA = [352.26,312.24,missing,359.2,290.25,252.75,224.47,297.15];
NFLX = [602.44,427.14,394.52,missing,190.36,197.44,174.87,224.9];
MSFT = [336.32,310.98,298.79,308.31,277.52,271.87,missing,280.74];

# by the way, you should add the below line somewhere at the top of the TSFrames documentation :) It's easy to construct a TSFrame object, but it took quite some time to figure out this functionality.
prices_ts = TSFrame([TSLA NFLX MSFT], dates, colnames=[:TSLA, :NFLX, :MSFT])

julia> returns = pctchange(prices_ts)
8×3 TSFrame with Date Index
 Index       TSLA            NFLX             MSFT
 Date        Float64?        Float64?         Float64?        
──────────────────────────────────────────────────────────────
 2021-12-31  missing         missing          missing
 2022-01-31       -0.113609       -0.290983        -0.0753449
 2022-02-28  missing              -0.0763684       -0.0391987
 2022-03-31  missing         missing                0.0318618
 2022-04-30       -0.191954  missing               -0.099867
 2022-05-31       -0.129199        0.0371927       -0.0203589
 2022-06-30       -0.111889       -0.114313   missing
 2022-07-31        0.323785        0.286098   missing

julia> dropmissing(returns.coredata)
2×4 DataFrame
 Row │ Index       TSLA       NFLX        MSFT       
     │ Date        Float64    Float64     Float64    
─────┼───────────────────────────────────────────────
   1 │ 2022-01-31  -0.113609  -0.290983   -0.0753449
   2 │ 2022-05-31  -0.129199   0.0371927  -0.0203589

#  I force my functions to use the TSFrame object, therefore:
TSFrame(dropmissing(returns.coredata))
2×3 TSFrame with Date Index
 Index       TSLA       NFLX        MSFT       
 Date        Float64    Float64     Float64    
───────────────────────────────────────────────
 2022-01-31  -0.113609  -0.290983   -0.0753449
 2022-05-31  -0.129199   0.0371927  -0.0203589
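The effect of dropmissing on the returns table can be reproduced in plain Julia, which may help when documenting the behaviour. Below is a minimal sketch that uses only the Dates standard library. pct_change is a hypothetical helper standing in for TSFrames' pctchange. The row filter applies dropmissing's default rule: keep a row only when none of its columns is missing, then narrow the element type.

```julia
using Dates

# Same inputs as the post above, kept as plain vectors (no TSFrames/DataFrames).
dates = collect(Date(2021, 12, 31):Month(1):Date(2022, 7, 31))
TSLA = [352.26, 312.24, missing, 359.2, 290.25, 252.75, 224.47, 297.15]
NFLX = [602.44, 427.14, 394.52, missing, 190.36, 197.44, 174.87, 224.9]
MSFT = [336.32, 310.98, 298.79, 308.31, 277.52, 271.87, missing, 280.74]
prices = [TSLA NFLX MSFT]  # 8x3 Matrix{Union{Missing, Float64}}

# Hypothetical one-period percentage change: the first row has no predecessor,
# and any missing input propagates to missing, as in the pctchange output above.
function pct_change(p::AbstractMatrix)
    r = Matrix{Union{Missing, Float64}}(missing, size(p))
    for j in axes(p, 2), i in 2:size(p, 1)
        r[i, j] = p[i, j] / p[i - 1, j] - 1
    end
    return r
end

returns = pct_change(prices)

# dropmissing's default rule: keep a row only if no column in it is missing.
keep = [all(!ismissing, row) for row in eachrow(returns)]
kept_dates = dates[keep]

# Narrow the element type, as dropmissing does by default (disallowmissing=true).
kept_returns = convert(Matrix{Float64}, returns[keep, :])
```

It keeps the same two dates, 2022-01-31 and 2022-05-31, as the DataFrame output above.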


At this stage, I'd like to keep dependencies to a minimum. In the documentation, I will explain that users can drop missing values by applying dropmissing as described above.

skipmissing is often problematic and does not work with many functions, but if I need to skip missing values, I'll use the syntax below within my library's functions. This is an alternative way for me to perform column-wise operations while avoiding Tables.jl as a dependency.

By the way, you gave me this idea in one of your replies; I just removed Tables.jl and used Matrix instead.

m = [ ]
for col in collect.(skipmissing(eachcol(Matrix(ts))))
    println(col)
    push!(m, mean(skipmissing(col)))
end
m

Indeed, for the analysis I am conducting, this is better than dropping the rows with missing values. However, it would be great to have a TSFrames.jl-specific dropmissing equivalent so that users can easily drop missing values, if needed, before passing the data to the functions.

This is still not doing what you think it is. You are missing a broadcast. Use skipmissing.(eachcol(...)) instead.
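The difference between the two calls is easy to check on a small matrix. Below is a minimal Base Julia sketch, with mean taken from the Statistics standard library. Without the dot, skipmissing wraps the iterator of columns. No column is itself missing, so nothing is skipped. With the dot, skipmissing is applied to each column separately.

```julia
using Statistics

M = [1.0 missing; missing 4.0; 3.0 6.0]

# No dot: skipmissing filters the sequence of columns, not their entries,
# so every column comes back unchanged, missing values included.
outer = collect(skipmissing(eachcol(M)))

# With the dot: one skipmissing wrapper per column, so missing entries are dropped.
inner = collect.(skipmissing.(eachcol(M)))

# Column means ignoring missing values, with no explicit loop.
col_means = mean.(skipmissing.(eachcol(M)))
```

Here, mean.(outer) returns missing for both columns, the same symptom shown in the REPL output below.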

Also, you should be able to do eachcol on ts directly. Did you try that?

This does not work.

julia> eachcol(prices_ts)
ERROR: MethodError: no method matching eachcol(::TSFrame)
Closest candidates are:
  eachcol(::AbstractVecOrMat) at abstractarraymath.jl:583
  eachcol(::DataFrames.AbstractDataFrame) at C:\Users\ploot\.julia\packages\DataFrames\dgZn3\src\abstractdataframe\iteration.jl:175
Stacktrace:
 [1] top-level scope
   @ REPL[1]:1
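The MethodError only means that no eachcol method is defined for the TSFrame type. For a wrapper type, a one-line method that forwards to the wrapped data would close that gap. The sketch below uses a hypothetical MiniTS struct (an index vector plus a data matrix), not the real TSFrames type. For an actual TSFrame, a forwarding method would also have to decide how to handle the Index column that ts.coredata carries.

```julia
using Statistics

# Hypothetical wrapper standing in for a TSFrame-like object: an index vector
# plus a data matrix. This is NOT the real TSFrames.TSFrame layout.
struct MiniTS{I, T}
    index::Vector{I}
    data::Matrix{T}
end

# Without this method, eachcol(::MiniTS) throws a MethodError like the one above.
# Forwarding it to the wrapped matrix lets column-wise code work directly.
Base.eachcol(ts::MiniTS) = eachcol(ts.data)

ts = MiniTS([1, 2, 3], [1.0 missing; 2.0 5.0; missing 6.0])

# Per-column means that ignore missing values, called on the wrapper itself.
col_means = mean.(skipmissing.(eachcol(ts)))
```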

It does not work if I don't use collect: the missing values are not skipped. Unfortunately, in my experience, handling missing values is the weakest point of Julia so far.

julia> mean.(skipmissing(eachcol(Matrix(prices_ts))))
3-element Vector{Missing}:
 missing
 missing
 missing

julia> mean.(collect.(skipmissing(eachcol(Matrix(prices_ts)))))
3-element Vector{Missing}:
 missing
 missing
 missing

This is the only way it worked:

julia> m = []
Any[]

julia> for col in collect.(skipmissing(eachcol(Matrix(prices_ts))))
           println(col)
           push!(m, mean(skipmissing(col)))
       end
Union{Missing, Float64}[352.26, 312.24, missing, 359.2, 290.25, 252.75, 224.47, 297.15]
Union{Missing, Float64}[602.44, 427.14, 394.52, missing, 190.36, 197.44, 174.87, 224.9]
Union{Missing, Float64}[336.32, 310.98, 298.79, 308.31, 277.52, 271.87, missing, 280.74]

julia> m
3-element Vector{Any}:
 298.3314285714286
 315.95285714285717
 297.78999999999996

You misinterpreted what I wrote. You are doing skipmissing(eachcol(...)). You should do skipmissing.(eachcol(...)). Notice the dot (.).

julia> X = [missing 2; 3 missing];

julia> for col in collect.(skipmissing.(eachcol(X)))
           println(col)
           push!(m, mean(col))
       end
[3]
[2]

But also, mean “just works” with skipmissing.

julia> mean.(skipmissing.(eachcol(X)))
2-element Vector{Float64}:
 3.0
 2.0

I understand other functions do not, though, and we are actively working on making it easier to accomplish these tasks.

@pdeffebach, thank you so much for your answer.

Sorry, I missed that. I tried it, and the final output is the same. I think that in my version, the missing values were being skipped by the skipmissing inside the mean() call, not by the outer skipmissing(eachcol(Matrix(prices_ts))).

julia> m = []
Any[]

julia> for col in collect.(skipmissing.(eachcol(Matrix(prices_ts))))
           println(col)
           push!(m, mean(col))
       end
[352.26, 312.24, 359.2, 290.25, 252.75, 224.47, 297.15]
[602.44, 427.14, 394.52, 190.36, 197.44, 174.87, 224.9]
[336.32, 310.98, 298.79, 308.31, 277.52, 271.87, 280.74]

julia> m
3-element Vector{Any}:
 298.3314285714286
 315.95285714285717
 297.78999999999996

julia> m = []
Any[]

julia> for col in collect.(skipmissing(eachcol(Matrix(prices_ts))))
           println(col)
           push!(m, mean(skipmissing(col)))
       end
Union{Missing, Float64}[352.26, 312.24, missing, 359.2, 290.25, 252.75, 224.47, 297.15]
Union{Missing, Float64}[602.44, 427.14, 394.52, missing, 190.36, 197.44, 174.87, 224.9]
Union{Missing, Float64}[336.32, 310.98, 298.79, 308.31, 277.52, 271.87, missing, 280.74]

julia> m
3-element Vector{Any}:
 298.3314285714286
 315.95285714285717
 297.78999999999996

A big thanks for that. The problems are not because of the TSFrames package but because of how Julia works. Nevertheless, I’m looking forward to the next releases of TSFrames.

@chiraganand @pdeffebach, thanks very much for your answers so far. I wonder whether there is a TSFrame (or some other) function to obtain the data frequency of a TSFrame object.

I want to get something like daily if data is daily, monthly if data is monthly, and so on.

Thank you in advance.

@chiraganand, I know it's a bit late, but I wanted to have something meaningful to show before sharing it with you and others. I have already released my package, PortfolioAnalytics.jl, which aims to be a tool for quantitative portfolio analytics, and I have in fact made a second release.

All functions accept only the TSFrame object as an input, because I don't want to rely on unmaintained packages. I'd be very happy to collaborate, and I sincerely thank you and the others at xKDR for the TSFrames package and all your answers so far.

The package is under heavy development, but it is already possible to compute returns, portfolio returns, value at risk, expected shortfall, minimum-variance or maximum-Sharpe portfolio optimization, and a few other things. Hopefully, the package's functionality will grow with further releases, minor or major. I plan to add new functionality every month until it becomes a mature library.

Announcing PortfolioAnalytics.jl: Tool for Quantitative Portfolio Analytics (julialang.org)

doganmehmet/PortfolioAnalytics.jl (github.com)

I am open to any feedback, feature requests, and collaboration.

Thanks


No, there is no such function. However, there is an isRegular() function, which tells whether the observations in a series are equally spaced for a given unit.
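Until such a function exists, one rough way to label the frequency of a Date index is to take the median gap between consecutive dates and map it to a name. Below is a minimal sketch using only the Dates standard library. infer_frequency and its day thresholds are hypothetical choices, not a TSFrames API. Business-day data (gaps of 1 and 3 days) would come out as daily. Unevenly spaced data may need a check such as isRegular first.

```julia
using Dates

# Hypothetical helper (not part of TSFrames): guess a frequency label from the
# median gap, in days, between consecutive dates. Thresholds are rough assumptions.
function infer_frequency(dates::AbstractVector{Date})
    length(dates) < 2 && return :unknown
    gaps = sort!(Dates.value.(diff(dates)))  # gaps between observations, in days
    g = gaps[cld(length(gaps), 2)]           # (lower) median gap
    g == 1 && return :daily
    g == 7 && return :weekly
    28 <= g <= 31 && return :monthly
    89 <= g <= 92 && return :quarterly
    365 <= g <= 366 && return :yearly
    return :irregular
end

# Month-end dates like the index used earlier in this thread.
month_ends = collect(Date(2021, 12, 31):Month(1):Date(2022, 7, 31))
freq = infer_frequency(month_ends)
```

The median is used rather than the first gap so that a single missing observation does not change the label.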
