Enforcing Schema on Data Frame Passed as Function Argument

I had the same issue.

My solution was to define a Dict that maps the required column names to their element types (abstract supertypes where applicable), plus a function that checks that those columns are present in the DataFrame with matching element types. The DataFrame may contain additional columns, which is fine for me:


const INPUT_ELTYPES = Dict(
    :field_a => AbstractString,
    :field_b => ItemTypes,  # application-specific type, defined elsewhere
    :field_c => AbstractString,
    :field_x => Union{AbstractString, Nothing},
    :field_y => Real,
    :value   => Union{Real, Missing},
)

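function check_input_data(df::AbstractDataFrame)
    # Every required column must be present; extra columns are allowed.
    # propertynames(df) returns Symbols, matching the Dict keys
    # (names(df) returns Strings and would never match).
    @assert keys(INPUT_ELTYPES) ⊆ propertynames(df)
    for (col_name, col_type) in INPUT_ELTYPES
        # The column's element type must be a subtype of the declared type.
        @assert eltype(df[!, col_name]) <: col_type
    end
end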
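One caveat with the check above: the Julia manual notes that `@assert` may be disabled at some optimization levels, and it stops at the first failure. A possible variant, sketched here under my own assumptions, collects every violation and throws an `ArgumentError` instead. The names `EXAMPLE_SCHEMA`, `schema_violations` and `validate_schema`, and the example columns, are hypothetical and only for illustration:

```julia
using DataFrames

# Hypothetical schema for illustration; the column names are made up.
const EXAMPLE_SCHEMA = Dict{Symbol,Type}(
    :name  => AbstractString,
    :count => Real,
    :note  => Union{AbstractString, Missing},
)

# Return a list of human-readable schema violations (empty if df conforms).
function schema_violations(df::AbstractDataFrame, schema::AbstractDict)
    problems = String[]
    present = propertynames(df)  # column names as Symbols
    for (col, T) in schema
        if col ∉ present
            push!(problems, "missing column :$col")
        elseif !(eltype(df[!, col]) <: T)
            push!(problems, "column :$col has eltype $(eltype(df[!, col])), expected a subtype of $T")
        end
    end
    return problems
end

# Throw an ArgumentError listing every violation; otherwise return df unchanged.
function validate_schema(df::AbstractDataFrame, schema::AbstractDict)
    problems = schema_violations(df, schema)
    isempty(problems) || throw(ArgumentError(join(problems, "; ")))
    return df
end
```

Because `validate_schema` returns the data frame, it can be called on the first line of any function that takes a data frame argument, e.g. `df = validate_schema(df, EXAMPLE_SCHEMA)`.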

Maybe something like this could be added to DataFrames.jl, or a typed data frame could be offered as an alternative type. The latter could be just a thin wrapper around the standard DataFrame type (whose column types are not part of its type), so that it avoids recompilation and serves the sole purpose of defining schemas.
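To make the wrapper idea concrete, here is a rough sketch of what I have in mind. Nothing in it exists in DataFrames.jl: `AbstractSchema`, `OrderSchema`, `columns`, `TypedFrame` and `total_amount` are all hypothetical names, and the schema itself is a made-up example. The schema is carried as a type parameter and is checked once, at construction:

```julia
using DataFrames

# Hypothetical schema machinery; not part of DataFrames.jl.
abstract type AbstractSchema end

# One marker type per schema; columns() maps column names to allowed eltypes.
struct OrderSchema <: AbstractSchema end
columns(::Type{OrderSchema}) = (
    id       = Integer,
    customer = AbstractString,
    amount   = Union{Real, Missing},
)

# Thin wrapper: stores an ordinary DataFrame and validates it on construction.
struct TypedFrame{S<:AbstractSchema}
    df::DataFrame
    function TypedFrame{S}(df::DataFrame) where {S<:AbstractSchema}
        present = propertynames(df)
        for (col, T) in pairs(columns(S))
            col in present || throw(ArgumentError("missing column :$col"))
            eltype(df[!, col]) <: T ||
                throw(ArgumentError("column :$col has eltype $(eltype(df[!, col])), expected a subtype of $T"))
        end
        return new{S}(df)
    end
end

# A function can now require the schema in its signature.
total_amount(t::TypedFrame{OrderSchema}) = sum(skipmissing(t.df.amount))
```

Note that this only validates at construction time. The wrapped DataFrame is still mutable, so columns changed afterwards through `t.df` are not re-checked.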