I have written a short tutorial about querying WRDS data using Julia. A user only needs to input her username and password to work with WRDS remotely in Julia. Basically, I translated the existing tutorial about querying WRDS data using R to Julia.
WRDS is the standard data source used by many finance and accounting departments. If you have a wrds account, please go ahead and try my tutorial. I have asked wrds to add this tutorial to their website. They will probably add more detailed instructions. Basically, a user needs to understand some basic PostgreSQL commands to work with WRDS remotely. On the Julia side, a user needs to know how to work with DataFrame. In addition, I personally use FixedEffectModels.jl for running regressions, and I highly recommend this package (cannot say enough good things about it).
#Querying WRDS Data using Julia
#load required packages
################################################################
using LibPQ, Tables, DataFrames
################################################################
#1. connect to wrds
wrds = LibPQ.Connection("""
host = wrds-pgdata.wharton.upenn.edu
port = 9737
user='username'
password='password'
sslmode = 'require' dbname = wrds
""")
#2. determine the data libraries available at WRDS:
res = execute(wrds, "select distinct table_schema
from information_schema.tables
where table_type ='VIEW'
or table_type ='FOREIGN TABLE'
order by table_schema")
data = DataFrame(columntable(res))
res = 0
data
#3. determine the datasets within a given library:
res = execute(wrds, "select distinct table_name
from information_schema.columns
where table_schema='crsp'
order by table_name")
data = DataFrame(columntable(res))
res = 0
data
#4. determine the variables (column names) within a given dataset:
res = execute(wrds, "select column_name
from information_schema.columns
where table_schema='crsp'
and table_name='dsf'
order by column_name")
data = DataFrame(columntable(res))
res = 0
data
#5. To query the crsp.dsf dataset:
res = execute(wrds, "select * from crsp.dsf
limit 10")
data = DataFrame(columntable(res))
res = 0
data
#6. To query specific variables from the crsp.dsf dataset:
res = execute(wrds, "select cusip,permno,date,bidlo,askhi from crsp.dsf
limit 10")
data = DataFrame(columntable(res))
res = 0
data
#7. To query variables that meet certain criteria:
res = execute(wrds, "select cusip,permno,date,bidlo,askhi
from crsp.dsf
where askhi > 2500
and bidlo < 2000
limit 10")
data = DataFrame(columntable(res))
res = 0
data
#8. To query by date:
res = execute(wrds, "select cusip,permno,date,bidlo,askhi
from crsp.dsf
where date = '2013-01-04'")
data = DataFrame(columntable(res))
res = 0
data
#9. To query by date range:
res = execute(wrds, "select cusip,permno,date,bidlo,askhi
from crsp.dsf
where date between '2013-01-07'
and '2013-01-08'")
data = DataFrame(columntable(res))
res = 0
data
#10. To query using multiple search criteria:
res = execute(wrds, "select cusip,permno,date,bidlo,askhi
from crsp.dsf
where date between '1960-01-01'
and '1980-01-01'
and askhi > 2500
and bidlo < 2000")
data = DataFrame(columntable(res))
res = 0
data
#11. To join and query two Compustat datasets:
res = execute(wrds, "select a.gvkey, a.datadate, a.tic,
a.conm, a.at, a.lt, b.prccm, b.cshoq
from comp.funda a join comp.secm b
on a.gvkey = b.gvkey
and a.datadate = b.datadate
where a.tic = 'IBM'
and a.datafmt = 'STD'
and a.consol = 'C'
and a.indfmt = 'INDL'")
data = DataFrame(columntable(res))
res = 0
data