Building a Rolling Correlation Function

tnederlof · September 12, 2018, 2:49pm

As I often need to run a rolling correlation between two series, I wanted to implement my own function. I got most of the way there, but I am running into a few issues that I have been unable to solve.

EDIT: The function now works with the explicit return
EDIT2: The function now works with missing values

This function currently outputs either the correlation or, if there are missing values, a 0. How can I instead detect a missing value in df_sub and put a missing in the output vector rather than a 0?

using Statistics  # provides cor on Julia 0.7 and later
using DataFrames

function roll_cor(df::DataFrame; n::Int64=126)::Array{Union{Float64, Missing}}
  cor_vec = Vector{Union{Float64, Missing}}(undef, (size(df)[1]))
  for i in n:size(df)[1]
    df_sub = df[(i-(n-1)):i,1:2]
    if !any(colwise(x -> any(ismissing.(x)), df_sub))
      cor_vec[i] = cor(convert(Array, df_sub))[1,2]
    else
      cor_vec[i] = missing
    end
  end
  return cor_vec
end

test = DataFrame(x = rand(2000), y = rand(2000))
roll_cor(test, n=126)

tbeason · September 12, 2018, 3:20pm

Have you tried using RollingFunctions.jl? I’ve used it with pretty good success. Even if it doesn’t directly take care of things for you, it is a good place to build from.

tnederlof · September 12, 2018, 3:23pm

Yes, though I will look through the code there again. Since there wasn’t a correlation function in there as far as I could tell, I figured building my own would help me learn. It’s a good point that reviewing the structures used in that package could help.

tbeason · September 12, 2018, 3:35pm

You can also pass in your own function. So, you could do something like

rolling(x->cor(x[:,1],x[:,2]),x,n)

where x is a T×2 matrix whose columns are your two series.

Additionally, the Missings.jl package can help you propagate missing values, although I think cor already returns missing in this case on Julia 0.7 and above.
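
For readers who want to see the idea of passing your own function over a window without any package, here is a minimal sketch that uses only the Statistics standard library. The helper name rolling_apply is hypothetical and is not part of RollingFunctions.jl; it applies a two-argument function to each full trailing window of two equal-length vectors.

```julia
using Statistics

# Hypothetical helper (not from RollingFunctions.jl): apply `f` to every
# full window of length `n` taken from `x` and `y` at the same positions.
function rolling_apply(f, x::AbstractVector, y::AbstractVector, n::Integer)
    length(x) == length(y) || throw(DimensionMismatch("x and y must have equal length"))
    out = Vector{Float64}(undef, max(length(x) - n + 1, 0))
    for i in eachindex(out)
        w = i:(i + n - 1)                     # indices of the current window
        out[i] = f(view(x, w), view(y, w))    # views avoid copying the window
    end
    return out
end

x = collect(1.0:10.0)
y = 2 .* x .+ 1          # perfectly linear in x, so every window correlates fully
r = rolling_apply(cor, x, y, 4)
println(length(r))
```

Unlike roll_cor above, this sketch returns only the windows that are complete, so the output is shorter than the input by n - 1 elements.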

rdeits · September 12, 2018, 3:41pm

The function roll_cor has a return-type annotation ::Array{Float64}, but it doesn’t actually return anything. You need return cor_vec or just cor_vec as the last line of your function. I suspect that is the convert error you’re seeing (although it will be easier to help if you can post the full error message).
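
To illustrate the point, here is a small sketch with two hypothetical functions (not taken from the original post). A for loop evaluates to nothing, so when it is the last expression in a function with a return-type annotation, Julia tries to convert nothing to the annotated type and throws a conversion error.

```julia
# Hypothetical functions for illustration only.
function fill_no_return(n)::Vector{Float64}
    v = zeros(n)
    for i in 1:n
        v[i] = i
    end
    # The loop is the last expression and evaluates to `nothing`,
    # so the annotation forces a convert(Vector{Float64}, nothing).
end

function fill_with_return(n)::Vector{Float64}
    v = zeros(n)
    for i in 1:n
        v[i] = i
    end
    return v    # explicit return of the vector we actually want
end

# Capture the error instead of letting it abort the script.
err = try
    fill_no_return(3)
    nothing
catch e
    e
end

println(typeof(err))
println(fill_with_return(3))
```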

tnederlof · September 12, 2018, 4:05pm

Thank you all for the help. rdeits was right: I had to return the correlation array. Because I omitted return, the function was trying to convert something else as the output.

Now that the function works properly for numbers (it puts either a 0 or the correct correlation), I am stuck trying to put a missing instead of the 0. Currently I am using zeros to preallocate a vector of a certain length. Is there an easy way to create a missing/Float64 vector of a certain length and then just populate that?

rdeits · September 12, 2018, 4:31pm

Sure, if you want a vector of either Float64 or Missing with length N, you can just do:

Vector{Union{Float64, Missing}}(undef, N)

where the undef tells Julia not to try to set each element of the array yet (since you’re going to set them yourself).
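
As a short sketch of how that might be used in practice (the variable names here are just for illustration): allocate with undef, then set every element before reading any of them. Julia also accepts missing in place of undef as an initializer, which allocates and fills in one step.

```julia
N = 5

# Allocate without initializing, then fill every slot ourselves.
v = Vector{Union{Float64, Missing}}(undef, N)
fill!(v, missing)
v[3] = 0.25            # individual slots can later hold a Float64

# Alternative: allocate and initialize to `missing` in a single call.
w = Vector{Union{Float64, Missing}}(missing, N)

println(v)
println(w)
```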

tnederlof · September 12, 2018, 4:37pm

Thank you, that is exactly what I was searching for. So if I am understanding this right… undef tells Julia that each element may be a missing or a Float64, but don’t worry about that until one of the two is set?

I updated the code in my original post to show what I am using now.

One thing I noticed is that it is now roughly 4x slower (although still pretty fast in an absolute sense). Is this just the reality of using a Union{Float64, Missing} versus plain Float64, where the compiler knows each element can only be a float?

rdeits · September 12, 2018, 4:57pm

Almost, but not exactly. There are two steps involved in constructing a vector: (1) allocating space in memory and (2) actually setting that memory to some value. Vector{T}(undef, N) tells Julia to only do step (1), which saves some time. That means that until you actually set each element of the vector, the data it contains is totally arbitrary and could be zeros or garbage or anything in between. Note how I get different contents each time I run this, based on whatever leftover data happened to be in memory:

julia> Vector{Float64}(undef, 2)
2-element Array{Float64,1}:
 6.92131585940106e-310
 0.0                  

julia> Vector{Float64}(undef, 2)
2-element Array{Float64,1}:
 6.92131532920064e-310
 6.9213154712449e-310 

julia> Vector{Float64}(undef, 2)
2-element Array{Float64,1}:
 5.0e-324
 0.0

From this, you might be able to tell that there’s a bug in your code above. You are allocating an undef vector for cor_vec, but you’re only setting elements n:size(df)[1]. That means that the first 1:(n-1) elements will have garbage contents. To actually initialize those values to zero, you can do:

cor_vec = Vector{....
cor_vec .= 0  # fill with zeros

or

cor_vec = Vector{...
cor_vec .= missing # fill with missing

As a shortcut for the above, you can also just use zeros as you did before:

julia> zeros(Union{Float64, Missing}, 5)
5-element Array{Union{Missing, Float64},1}:
 0.0
 0.0
 0.0
 0.0
 0.0
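
Putting these pieces together, one possible variant of the rolling correlation could look like the sketch below. It is an assumption-laden illustration rather than the original poster's final code: the name roll_cor_vec is hypothetical, it takes two plain vectors instead of a DataFrame so that it depends only on the Statistics standard library, and it pre-fills the output with missing so that the first n - 1 positions are never left uninitialized.

```julia
using Statistics

# Hypothetical variant of roll_cor that works on two vectors.
function roll_cor_vec(x::AbstractVector, y::AbstractVector; n::Int=126)
    length(x) == length(y) || throw(DimensionMismatch("x and y must have equal length"))
    # Pre-fill with `missing` so positions 1:(n-1) are well defined.
    cor_vec = Vector{Union{Float64, Missing}}(missing, length(x))
    for i in n:length(x)
        xs = view(x, (i - n + 1):i)
        ys = view(y, (i - n + 1):i)
        # Only compute when the whole window is free of missing values;
        # otherwise the pre-filled `missing` is left in place.
        if !any(ismissing, xs) && !any(ismissing, ys)
            cor_vec[i] = cor(Float64.(xs), Float64.(ys))
        end
    end
    return cor_vec
end

x = Union{Float64, Missing}[1.0, 2.0, 3.0, missing, 5.0, 6.0, 7.0, 8.0]
y = Union{Float64, Missing}[2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
r = roll_cor_vec(x, y, n=3)
println(r)
```

In this example, positions 1 and 2 stay missing because no full window exists yet, and positions 4 through 6 stay missing because their windows contain the missing entry in x.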

As a further note, the undef argument is new in Julia 0.7, but the behavior is not. Previously, Vector{T}(N) did exactly what Vector{T}(undef, N) does now; it was just less obvious. The undef serves as a marker to be careful (and to remember to actually set each element of the vector after you’ve constructed it), and also as a slot for other kinds of initialization behavior. For example, we can now do:

julia> Matrix{Float64}(undef, 2, 2)
2×2 Array{Float64,2}:
 6.92132e-310  6.92132e-310
 6.92132e-310  0.0

to allocate a Matrix without setting its contents, or we can use I instead of undef to do:

julia> using LinearAlgebra

julia> Matrix{Float64}(I, 2, 2)
2×2 Array{Float64,2}:
 1.0  0.0
 0.0  1.0

and create a Matrix and set its contents to be an identity matrix.

tnederlof · September 12, 2018, 5:13pm

This was really helpful for me to learn. Thank you for spending the time creating the examples too. I read through the docs, but these points didn’t quite stick in my head.
