HighFrequencyCovariance.jl - Algorithms for efficiently estimating covariance matrices with high frequency financial data

HighFrequencyCovariance.jl is a Julia package designed to use high frequency financial data to estimate covariances between the prices of assets. High frequency data can be used to estimate covariance more accurately than lower frequency data. If returns are measured over a long interval such as an hour, there are only a few observations of price movements per trading day. On the other hand, if price movements can be measured in the space of a few seconds, then a daily covariance matrix can be estimated using thousands of observations.
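To put rough numbers on this, here is a minimal sketch using only the Julia standard library (it does not use the package). It assumes a 6.5 hour trading day, which is my assumption and not something the package requires, and it compares how much a plain sample correlation varies across repeated trials when it is computed from 6 hourly returns versus 4680 five-second returns. The function name `estimate_spread` is made up for this illustration.

```julia
using Random, Statistics

# Standard deviation of the sample correlation across repeated simulated days.
# n_obs is the number of synchronous, noise-free returns available per day.
function estimate_spread(n_obs; rho = 0.5, trials = 2000, seed = 42)
    rng = MersenneTwister(seed)
    estimates = map(1:trials) do _
        x = randn(rng, n_obs)
        # y has correlation rho with x by construction
        y = rho .* x .+ sqrt(1 - rho^2) .* randn(rng, n_obs)
        cor(x, y)
    end
    return std(estimates)
end

hourly   = estimate_spread(6)     # about six hourly returns per day (assumed 6.5h day)
five_sec = estimate_spread(4680)  # 6.5 * 3600 / 5 five-second returns per day
println("spread with hourly returns:      ", hourly)
println("spread with five-second returns: ", five_sec)
```

This idealised setup ignores asynchronous updates and microstructure noise, which are the complications discussed next.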

The problem with using high frequency data is that there are a number of statistical issues which mean that the basic covariance estimation method gives a highly biased covariance matrix. The first is that we observe updated prices (typically from realised trades or updated quotes) for different assets asynchronously and at different frequencies. If a short interval between returns is used, some assets might not have an updated price yet, leading to a downward bias in estimated covariance. A second difficulty is that there is typically "microstructure" noise, reflecting small fluctuations in recorded prices that result from how the order book works.
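The downward bias from asynchronous updates can be reproduced with a small simulation. The sketch below uses only the standard library and is not the package's code: it simulates two correlated random walks, lets each asset refresh its observed price at random ticks, carries the last observed price forward, and then computes the naive sample correlation of returns at a short and a long sampling interval. The names `epps_demo` and `previous_tick` and all parameter values are made up for this illustration.

```julia
using Random, Statistics

# Carry the last observed price forward until the next update arrives.
function previous_tick(x, updated)
    out = similar(x)
    current = x[1]
    for i in eachindex(x)
        if updated[i]
            current = x[i]
        end
        out[i] = current
    end
    return out
end

# Naive correlation of returns sampled every `step` ticks from stale prices.
function epps_demo(; n = 200_000, rho = 0.8, p_update = 0.1, step = 1, seed = 1)
    rng = MersenneTwister(seed)
    z1 = randn(rng, n)
    z2 = rho .* z1 .+ sqrt(1 - rho^2) .* randn(rng, n)  # true correlation is rho
    x1, x2 = cumsum(z1), cumsum(z2)                     # latent prices
    # Each asset's observed price updates independently with probability p_update
    obs1 = previous_tick(x1, rand(rng, n) .< p_update)
    obs2 = previous_tick(x2, rand(rng, n) .< p_update)
    r1 = diff(obs1[1:step:end])
    r2 = diff(obs2[1:step:end])
    return cor(r1, r2)
end

short_interval = epps_demo(step = 1)
long_interval  = epps_demo(step = 200)
println("naive correlation, 1-tick returns:   ", short_interval)
println("naive correlation, 200-tick returns: ", long_interval)
```

With the short interval the naive estimate falls far below the true correlation of 0.8, while the long interval recovers most of it at the cost of using far fewer observations. This sketch leaves out microstructure noise entirely.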

There is a growing statistics literature proposing algorithms that overcome these issues and estimate unbiased covariance matrices. Until now, however, there were few open source implementations of these algorithms (I only know of highfrequency in R).


The package should be easy to use. First we load our data. I don’t have financial data on my home PC so we can Monte Carlo some (using a generator built into the package):

using HighFrequencyCovariance
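# Placeholder asset names for illustration (one per simulated asset)
assets = [:A, :B, :C, :D]
# Simulate 40000 price updates across the 4 assets
ts, true_covar, micro_noise, update_rates = generate_random_path(4,
                      40000; assets = assets, vols = [0.02,0.03,0.04,0.05])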

The true covariance matrix, the microstructure noise (for each asset) and the rates of price updates are stored in true_covar, micro_noise and update_rates. The financial data is in ts, which is a SortedDataFrame containing price updates of the four assets. A SortedDataFrame is a wrapper around a DataFrame which stores the column names corresponding to price, time and asset name (it also presorts and indexes the data for speed).

If you have a DataFrame of actual price data (below called “data” with time, asset, price columns called “time”, “asset”, “price”) you can turn it into a SortedDataFrame using the code:

ts = SortedDataFrame(data, :time, :asset, :price)

Now that the data is loaded you can make a CovarianceMatrix using the preaveraging method, the BNHLS method or the spectral method with:

preav_estimate     = preaveraged_covariance(ts)
bnhls_estimate     = bnhls_covariance(ts)
spectral_estimate  = spectral_covariance(ts)

You can also take the weighted average of multiple CovarianceMatrix structs with:

combined = combine_covariance_matrices([preav_estimate, bnhls_estimate,
                                        spectral_estimate], [2,1,1])
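To show what weights such as [2,1,1] mean arithmetically, here is a minimal sketch of a normalised weighted average of plain matrices. This is an illustration only: I am not claiming this is how combine_covariance_matrices works internally, and the helper name `weighted_average` and the example matrices are made up.

```julia
# Weighted average of equally sized matrices; weights are normalised to sum to one,
# so [2, 1, 1] becomes [0.5, 0.25, 0.25].
function weighted_average(mats::Vector{<:AbstractMatrix}, weights::Vector{<:Real})
    length(mats) == length(weights) || throw(ArgumentError("need one weight per matrix"))
    w = weights ./ sum(weights)
    return sum(w[i] .* mats[i] for i in eachindex(mats))
end

# Three hypothetical correlation estimates for the same pair of assets
a = [1.0 0.2; 0.2 1.0]
b = [1.0 0.4; 0.4 1.0]
c = [1.0 0.6; 0.6 1.0]
avg = weighted_average([a, b, c], [2, 1, 1])
println(avg)
```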

A CovarianceMatrix struct contains the correlation matrix and the volatilities separately. If you want the actual covariance matrix (over some duration specified in the time units of your SortedDataFrame) you can get it with:

duration = 100
cov = covariance(combined, duration)
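For intuition, a covariance matrix can be rebuilt from a correlation matrix and a vector of volatilities with the standard identity Σ = D C D, where D is the diagonal matrix of volatilities. The sketch below assumes volatilities are quoted per unit of time and that variances scale linearly with duration (as for a Brownian motion). That scaling is my assumption for illustration and the helper `covariance_from_parts` is made up; it is not the package's covariance function.

```julia
using LinearAlgebra

# Rebuild a covariance matrix from a correlation matrix and per-unit-time volatilities.
# Assumption: variances and covariances scale linearly with the duration.
function covariance_from_parts(correlation::AbstractMatrix, vols::AbstractVector, duration::Real)
    D = Diagonal(vols)
    return Symmetric(D * correlation * D .* duration)
end

corr = [1.0 0.5; 0.5 1.0]
vols = [0.02, 0.03]
sigma = covariance_from_parts(corr, vols, 100)
println(sigma)
```

Storing the correlation matrix and the volatilities separately means either part can be inspected or adjusted on its own before the covariance matrix is assembled.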

There are also 2 volatility estimation methods, 5 covariance estimation methods and 4 matrix regularisation methods and a bunch of helper functions. They are described in the documentation.

Hope this is useful. Please let me know if anyone has any suggestions or spots any bugs.


This is great! I will definitely be making use of this package later this year.
