Querying WRDS Data using Julia

I have written a short tutorial about querying WRDS data using Julia. A user only needs to input her username and password to work with WRDS remotely in Julia. It is essentially a translation of the existing WRDS tutorial on querying WRDS data with R:

https://wrds-www.wharton.upenn.edu/pages/support/programming-wrds/programming-r/querying-wrds-data-r/

WRDS is the standard data source used by many finance and accounting departments. If you have a WRDS account, please go ahead and try my tutorial. I have asked WRDS to add this tutorial to their website, and they will probably add more detailed instructions. Basically, a user needs to understand some basic PostgreSQL commands to work with WRDS remotely. On the Julia side, a user needs to know how to work with DataFrames. In addition, I personally use FixedEffectModels.jl for running regressions, and I highly recommend this package (I cannot say enough good things about it).
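For readers who have not used FixedEffectModels.jl, here is a minimal sketch of what a regression looks like once your query result is in a DataFrame; the data and column names (`ret`, `mktrf`, `permno`, `yr`) are made up for illustration, not from WRDS:

```julia
using DataFrames, FixedEffectModels

# Hypothetical panel: returns, a market factor, firm id, year
df = DataFrame(ret    = randn(1000),
               mktrf  = randn(1000),
               permno = rand(1:50, 1000),
               yr     = rand(2000:2010, 1000))

# Regress returns on the market factor with firm and year fixed effects,
# clustering standard errors by firm
m = reg(df, @formula(ret ~ mktrf + fe(permno) + fe(yr)),
        Vcov.cluster(:permno))
```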

#Querying WRDS Data using Julia

#load required packages
################################################################
using LibPQ, Tables, DataFrames 
################################################################


#1. connect to WRDS (replace 'username' and 'password' with your WRDS credentials)
wrds = LibPQ.Connection("""
                        host = wrds-pgdata.wharton.upenn.edu
                        port = 9737
                        user = 'username'
                        password = 'password'
                        sslmode = 'require'
                        dbname = 'wrds'
                        """)


#2. determine the data libraries available at WRDS:
res = execute(wrds, "select distinct table_schema
                     from information_schema.tables
                     where table_type ='VIEW'
                     or table_type ='FOREIGN TABLE'
                     order by table_schema")

data = DataFrame(columntable(res))
res = 0   # drop the reference to the result so it can be garbage collected
data

#3. determine the datasets within a given library:
res = execute(wrds, "select distinct table_name
                    from information_schema.columns
                    where table_schema='crsp'
                    order by table_name")

data = DataFrame(columntable(res))
res = 0
data


#4. determine the variables (column names) within a given dataset:
res = execute(wrds, "select column_name
                    from information_schema.columns
                    where table_schema='crsp'
                    and table_name='dsf'
                    order by column_name")

data = DataFrame(columntable(res))
res = 0
data

#5. To query the crsp.dsf dataset:
res = execute(wrds, "select * from crsp.dsf
                limit 10")
data = DataFrame(columntable(res))
res = 0
data

#6. To query specific variables from the crsp.dsf dataset:
res = execute(wrds, "select cusip,permno,date,bidlo,askhi from crsp.dsf
                limit 10")
data = DataFrame(columntable(res))
res = 0
data
                
#7. To query variables that meet certain criteria:
res = execute(wrds, "select cusip,permno,date,bidlo,askhi
                   from crsp.dsf
                   where askhi > 2500
                   and bidlo < 2000
                   limit 10")
data = DataFrame(columntable(res))
res = 0
data
                    

#8. To query by date:
res = execute(wrds, "select cusip,permno,date,bidlo,askhi
                   from crsp.dsf
                   where date = '2013-01-04'")
data = DataFrame(columntable(res))
res = 0
data
                   

#9. To query by date range:
res = execute(wrds, "select cusip,permno,date,bidlo,askhi
                   from crsp.dsf
                   where date between '2013-01-07'
                   and '2013-01-08'")
data = DataFrame(columntable(res))
res = 0
data

#10. To query using multiple search criteria:
res = execute(wrds, "select cusip,permno,date,bidlo,askhi
                   from crsp.dsf
                   where date between '1960-01-01'
                   and '1980-01-01'
                   and askhi > 2500
                   and bidlo < 2000")
data = DataFrame(columntable(res))
res = 0
data


#11. To join and query two Compustat datasets:
res = execute(wrds, "select a.gvkey, a.datadate, a.tic,
                   a.conm, a.at, a.lt, b.prccm, b.cshoq
                   from comp.funda a join comp.secm b
                   on a.gvkey = b.gvkey
                   and a.datadate = b.datadate
                   where a.tic = 'IBM'
                   and a.datafmt = 'STD'
                   and a.consol = 'C'
                   and a.indfmt = 'INDL'")
data = DataFrame(columntable(res))
res = 0
data
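One more step worth adding at the end: LibPQ connections should be closed once you are done querying.

```julia
#12. close the connection when finished
close(wrds)
```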

Very cool! I have not tried this but will give it a shot sometime. I’ve mostly just resorted to downloading the entire SAS tables that I need from WRDS to my local machine and then using SASLib.jl to load them into DataFrames. It works for basically everything except the big stuff like the TAQ data.
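For reference, the SASLib.jl workflow mentioned above looks roughly like this; the file name is hypothetical, and this assumes a SASLib version whose result supports the Tables.jl interface:

```julia
using SASLib, DataFrames

rs = readsas("dsf.sas7bdat")   # read a SAS table downloaded from WRDS
df = DataFrame(rs)             # convert the result set to a DataFrame
```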

The problem with downloading via the web interface is that if you need to update your data or code, you may have to download your data again.

Also, the data you download is essentially merged data, which means it may take a lot of space.

In addition, the data structure in WRDS makes more sense, while the web interface can be really confusing sometimes.

For high-resolution financial data like TAQ, there is currently no cheap option. In academia, people use the WRDS Cloud, but it is very inefficient: you have to submit your code and wait more than a day, sometimes more than a week, to see the results. If something goes wrong, you have to start over.

In the finance industry, kdb+ is used to handle large time-series data, but it is very expensive, and I haven’t seen anyone in academia use it (I am not sure whether they offer an academic license).


Pretty slick. It’d be cool if we could get WRDS to put this on their site.

I haven’t seen anyone use kdb+ in academia, either. I don’t think they offer an academic license, but they do have a fairly restrictive free personal license that probably would not be sufficient for TAQ.

Just heard back from WRDS:

Thank you so much for this. Right now, we do not officially support Julia as we do not have much internal experience with it. We will look into the tutorial you provided as well as the packages that are available and work on developing instructions for our end users.

Hopefully they can get this done soon.