Problem in understanding how to load data to estimate covariance matrix


My aim is to estimate a covariance matrix from high-frequency data for 10 Forex pairs. I found the Julia package HighFrequencyCovariance.jl (algorithms for efficiently estimating covariance matrices with high frequency financial data), which shows the different covariance estimators I might use. However, I don’t understand how to set up my dataframe so that I can proceed with my analysis.
For example, I don’t understand the following code:
using HighFrequencyCovariance
using DataFrames
df = DataFrame(:stock => [:A,:B,:A,:A,:A,:B,:A,:B,:B], :time => [1,2,3,4,5,5,6,7,8],
:logprice => [1.01,2.0,1.011,1.02,1.011,2.2,1.0001,2.2,2.3])
ts = SortedDataFrame(df, :time, :stock, :logprice)

I will be using 1-minute returns. Each dataset, say EUR/AUD, has the columns local time, Open, High, Low, Close and Volume.

I know how to import each dataset in Julia (I have 10 datasets, one per Forex pair). However, I don’t understand how to proceed, or what the time and log prices represent in this context.

Also, are there better covariance estimators, with code already available, that I can use for high-frequency data?

Can anyone please enlighten me on this issue?

Please format your code inside a code block. There is a post in the forum explaining how you can do it.

Also, check https://github.com/joshday/OnlineStats.jl; it may suit your needs.


For basically any covariance estimation you will need to get all of the data (i.e. for each FX pair) loaded into one dataframe. You will also need a common measurement of time, which matters if all you have is exchange local time. So I would do something like:

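using CSV, DataFrames

# Read one pair's CSV file and tag every row with the pair's name.
function load_and_wrangle_data(path, name)
    dd = CSV.read(path, DataFrame)
    dd[!, :name] .= name
    return dd
end

# path_to_EURAUD and path_to_EURUSD stand for the file paths of your own datasets.
dd = load_and_wrangle_data(path_to_EURAUD, :EURAUD)
append!(dd, load_and_wrangle_data(path_to_EURUSD, :EURUSD))
# Repeat the append! line for each of the remaining pairs.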

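# Make a column for time. This might be the number of seconds since your first observation, or any other reasonable time measurement given your data. Call this column :time.
# You say you have local time, so you will need to convert all times to the same timezone.

# Pass the stacked dataframe dd built above. Adjust :close to match your closing-price column name.
# The example in the question passes log prices, so you may want the log of the close here instead.
ts = SortedDataFrame(dd, :time, :name, :close)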
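To make the :time and log price columns more concrete, here is a minimal sketch using only the Dates standard library. The timestamp layout, the fixed UTC offset, the helper names `to_utc` and `seconds_since`, and the three sample prices are all assumptions made up for illustration; your files may need a different format string, and a fixed offset ignores daylight saving time.

```julia
using Dates

# Hypothetical timestamp layout; adjust it to match your files.
const TIME_FORMAT = dateformat"yyyy-mm-dd HH:MM:SS"

# Parse local timestamps and shift them to UTC with a fixed offset in hours.
# (Assumption: a constant offset. This ignores daylight saving time.)
function to_utc(local_times::AbstractVector{<:AbstractString}, utc_offset_hours::Integer)
    return [DateTime(t, TIME_FORMAT) - Hour(utc_offset_hours) for t in local_times]
end

# Seconds elapsed between each timestamp and a common reference instant.
function seconds_since(times::AbstractVector{DateTime}, origin::DateTime)
    return [Dates.value(t - origin) / 1000 for t in times]
end

# Made-up sample values standing in for one pair's 1-minute bars.
local_times = ["2021-03-01 10:00:00", "2021-03-01 10:01:00", "2021-03-01 10:02:00"]
close_prices = [1.5601, 1.5604, 1.5599]

utc_times = to_utc(local_times, 2)                  # local time assumed to be UTC+2
time_col = seconds_since(utc_times, utc_times[1])   # seconds since the first observation
logprice_col = log.(close_prices)                   # natural log of the closing price
```

In this sketch, `time_col` plays the role of the :time column and `logprice_col` the role of :logprice from the example in the question. For several pairs, the same `origin` should be used for every pair so that the times are comparable across assets.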
