Problem understanding how to load data to estimate a covariance matrix


My aim is to estimate the covariance matrix of 10 Forex pairs using high-frequency data. I found the Julia package HighFrequencyCovariance (HighFrequencyCovariance.jl - Algorithms for efficiently estimating covariance matrices with high frequency financial data), which describes the different covariance estimators I might use. However, I do not understand how to structure my dataframe so that I can proceed with my analysis.
For example, I don’t understand the following code:
```julia
using HighFrequencyCovariance
using DataFrames
df = DataFrame(:stock => [:A,:B,:A,:A,:A,:B,:A,:B,:B], :time => [1,2,3,4,5,5,6,7,8],
:logprice => [1.01,2.0,1.011,1.02,1.011,2.2,1.0001,2.2,2.3])
ts = SortedDataFrame(df, :time, :stock, :logprice)
```

I will be using 1-minute returns. The data for, say, EUR/AUD has the headers Local time, Open, High, Low, Close and Volume.

I know how to import each dataset into Julia (I have 10 datasets, one per Forex pair). However, I don’t understand how to proceed, or what the time and log price columns should represent in this context.
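To make the example above concrete: it is in "long" format, with one row per observation. `:stock` names the asset, `:time` is when that asset's price was observed (any increasing number, here just integers), and `:logprice` is the log of the price at that moment. Different assets do not need to be observed at the same times. The sketch below uses only DataFrames.jl; it does not call HighFrequencyCovariance and is not how the package computes returns internally. It just shows that differencing each asset's consecutive log prices gives the log returns over the intervals between its observation times.

```julia
using DataFrames

# The example data from the question, in long format:
# one row per (asset, time) observation of a log price.
df = DataFrame(:stock => [:A, :B, :A, :A, :A, :B, :A, :B, :B],
               :time => [1, 2, 3, 4, 5, 5, 6, 7, 8],
               :logprice => [1.01, 2.0, 1.011, 1.02, 1.011, 2.2, 1.0001, 2.2, 2.3])

# Sort by asset, then time, so each asset's observations are consecutive.
sorted = sort(df, [:stock, :time])

# Within each asset, the difference of consecutive log prices is the log return
# over the interval between the two observation times.
rets = combine(groupby(sorted, :stock)) do g
    DataFrame(time = g.time[2:end],
              interval = diff(g.time),
              logreturn = diff(g.logprice))
end
```

For 1-minute bars, each row would typically be one bar's Close, with `:time` the bar's timestamp on a common clock and `:logprice` the log of that Close.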

Also, are there better covariance estimators, with code already available, that I can use for high-frequency data?

Can anyone please enlighten me on this issue?

Please format your code inside a code block. There is a post in the forum explaining how you can do it.

Also, check it out; it may suit your needs.


For basically any covariance estimation you will need to get all of the data (i.e. for each FX pair) loaded into a single dataframe. You will also need a common measure of time, which matters if all you have is exchange local time. So I would do something like:

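```julia
using CSV, DataFrames, HighFrequencyCovariance

# Load one FX pair's CSV file, tag every row with the pair's name and add the
# log of the close price (the package example works with log prices).
function load_and_wrangle_data(path, name)
    dd = DataFrame(CSV.File(path))
    dd[!, :name] .= name
    dd[!, :logprice] = log.(dd.Close)
    return dd
end

# path_to_EURAUD etc. are placeholders for the paths to your own files.
dd = load_and_wrangle_data(path_to_EURAUD, :EURAUD)
append!(dd, load_and_wrangle_data(path_to_EURUSD, :EURUSD))
# More lines loading the data

# Make a column for time. This might be the number of seconds since your first
# observation, or any reasonable time measurement given your data. Call this column :time.
# You say you have local time. You will need to convert all times to the same timezone.

ts = SortedDataFrame(dd, :time, :name, :logprice)
```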
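One possible way to build the `:time` column is sketched below, under some loud assumptions: the helper `to_long_format` is hypothetical, the column names `"Local time"` and `"Close"` are taken from the headers described in the question, and each file's local time is assumed to sit at a fixed offset from UTC. If your data crosses daylight-saving changes, a time-zone library such as TimeZones.jl would be needed instead of a fixed offset. Times are measured in seconds since a common UTC origin `t0`, so every pair ends up on the same clock.

```julia
using DataFrames, Dates

# Hypothetical helper: turn one pair's 1-minute OHLC bars into the long format
# (time, name, logprice). Assumes `utc_offset` is a fixed offset of the file's
# local time from UTC (no daylight-saving changes inside the file).
function to_long_format(bars::DataFrame, name::Symbol, utc_offset::Period, t0::DateTime)
    utc = bars[!, "Local time"] .- utc_offset           # local time -> UTC
    DataFrame(time = Dates.value.(utc .- t0) ./ 1000,   # ms since t0 -> seconds
              name = fill(name, nrow(bars)),
              logprice = log.(bars.Close))               # log of the close price
end

# Made-up bars: EURUSD recorded at UTC+1, EURAUD recorded in UTC.
eurusd = DataFrame("Local time" => [DateTime(2023, 1, 2, 9, 0), DateTime(2023, 1, 2, 9, 1)],
                   "Close" => [1.0700, 1.0702])
euraud = DataFrame("Local time" => [DateTime(2023, 1, 2, 8, 0), DateTime(2023, 1, 2, 8, 1)],
                   "Close" => [1.5700, 1.5705])

t0 = DateTime(2023, 1, 2, 8, 0)   # common origin, in UTC
long = vcat(to_long_format(eurusd, :EURUSD, Hour(1), t0),
            to_long_format(euraud, :EURAUD, Hour(0), t0))
# long could then be passed on as: SortedDataFrame(long, :time, :name, :logprice)
```

Both pairs' first bars land at time 0.0 even though their local timestamps differ by an hour, which is the point of converting to a common clock before stacking the data.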
