[ANN] WRDSMerger.jl - Linking Financial Research Identifiers Across WRDS Datasets

WRDSMerger.jl

WRDS (Wharton Research Data Services) is one of the most widely used platforms for financial and economic research data, providing access to databases like CRSP (stock market data), Compustat (firm financials), IBES (analyst forecasts), and OptionMetrics. WRDSMerger.jl is a Julia package that simplifies common workflows with these datasets. Its most useful feature is a unified system for linking identifiers across them.

The Problem: Scattered Identifiers

Anyone who has worked with WRDS data knows the pain of linking datasets. CRSP uses Permnos, Compustat uses GVKeys, the SEC uses CIKs, and various datasets use different flavors of CUSIPs and Tickers. The mapping tables between these identifiers are scattered across different WRDS schemas, each with their own date ranges, priority rules, and quirks. Some links are well-documented (Compustat–CRSP), but others require chaining through intermediate identifiers. For example, linking IBES analyst data to Compustat requires going through CRSP as an intermediary, and the mapping tables don’t make that obvious.

The Solution: One Consistent API

WRDSMerger.jl provides a simple pattern for converting any identifier to any other:

TargetID(SourceID(value), date)

For example:

using WRDSMerger, Dates

GVKey(Permno(47896), Date(2020))        # CRSP stock -> Compustat firm
NCusip(CIK(19617), Date(2020))          # SEC firm -> historical CUSIP
CIK(Permno(47896), Date(2020))          # CRSP stock -> SEC firm (indirect!)
CIK(NCusip("46625H21"), Date(2020))     # historical CUSIP -> SEC firm (also indirect)

The last two examples are notable: there is no WRDS table that directly maps CIK to Permno or CIK to NCusip. The package automatically finds the shortest path through intermediate identifiers (Permno → GVKey → CIK, in this case) and chains the conversions. You don’t need to know or care which tables are involved.

This works naturally with broadcasting and packages like DataFramesMeta.jl:

# Convert a column of Permnos to GVKeys
@rtransform(df, :gvkey = GVKey(Permno(:permno), :date))

# Or with plain broadcasting
GVKey.(Permno.([47896, 44206, 46703]), Date(2020))

Smart Linking

The package leverages Julia’s type system to handle some tricky edge cases automatically:

  • Parent-firm fallback: Identifiers are categorized as either firm identifiers (e.g., GVKey, Permco) or security identifiers (e.g., Permno, NCusip). When converting a security identifier to a firm identifier, if the exact security isn’t found, the package will try through the parent firm. For example, if an 8-character CUSIP isn’t in the link tables, but the corresponding 6-character firm CUSIP is, the conversion to a firm identifier still succeeds. This is especially useful when CUSIP data quality is imperfect.

  • Inexact date matching: If an identifier mapping has only one possible target but the date falls outside the recorded validity range, the package returns the match anyway (since it’s unambiguous). This can be disabled with allow_inexact_date=false.

  • Path optimization: When multiple equally short paths exist between two identifiers, the package prefers routes through Permno, since it’s the best-connected hub in the WRDS identifier network.

Beyond Linking

The package also provides convenience functions for downloading data from CRSP and Compustat, Fama-French factors, database exploration tools (list_libraries, list_tables, describe_table), and a range_join function for inequality-based joins on date ranges. See the documentation for details.

What’s New in v0.7

  • CRSP V2 support: New generate_crsp_links_v2 function for the updated CRSP V2 tables
  • Parameterized CUSIP types: Cusip{:historical} and Cusip{:current} (with aliases NCusip and HdrCusip) replace the old unparameterized Cusip, making the historical vs. current distinction explicit
  • Faster loading: Added precompilation workloads for significant startup performance improvements (~50s to <20s in one project)
  • Documentation improvements: New pages on identifier types, default behavior, and DuckDB workflows

Getting Started

]add WRDSMerger

Note: This package requires a WRDS subscription to access data (most academic institutions provide one). It also supports working with locally saved data via DuckDB.

Disclaimer: This package has no affiliation with WRDS or Wharton. While it is tested, linking financial identifiers is inherently messy and results should be verified, especially for research that depends on accurate matches.

2 Likes