I had the same issue.
My solution was to define a Dict that maps the required column names to their element types (abstract supertypes, where applicable), plus a function that checks that these columns are present in the DataFrame with compatible element types. The DataFrame may contain additional columns, which is fine for me:
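using DataFrames

# Required columns and the (super)type each column's eltype must satisfy.
const INPUT_ELTYPES = Dict(
    :field_a => AbstractString,
    :field_b => ItemTypes,  # ItemTypes is defined elsewhere (not shown)
    :field_c => AbstractString,
    :field_x => Union{AbstractString, Nothing},
    :field_y => Real,
    :value => Union{Real, Missing},
)

function check_input_data(df::AbstractDataFrame)
    # propertynames(df) returns Symbols like the Dict keys; names(df) returns Strings.
    @assert keys(INPUT_ELTYPES) ⊆ propertynames(df)
    for (col_name, col_type) in INPUT_ELTYPES
        @assert eltype(df[!, col_name]) <: col_type
    end
end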
Maybe something like this could be added to DataFrames.jl, or a typed data frame could be offered as an alternative type. The latter could be just a thin wrapper around the standard DataFrame type (whose column types are not encoded in its type) to avoid recompilation, with the sole purpose of defining schemas.
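To make the wrapper idea a bit more concrete, here is a minimal sketch of what I have in mind. The name SchemaDataFrame and its fields are hypothetical (nothing like this exists in DataFrames.jl as far as I know). The schema is stored as a plain field rather than as type parameters, so the wrapper type is the same for every schema. It throws an ArgumentError instead of using @assert, since the Julia docs note that assertions may be disabled at some optimization levels.

```julia
using DataFrames

# Hypothetical wrapper type: validates a DataFrame against a schema once,
# at construction time. The schema is a field, not a type parameter.
struct SchemaDataFrame
    df::DataFrame
    schema::Dict{Symbol,Type}

    function SchemaDataFrame(df::DataFrame, schema::AbstractDict)
        # Required columns that are absent from the DataFrame.
        missing_cols = [c for c in keys(schema) if c ∉ propertynames(df)]
        isempty(missing_cols) ||
            throw(ArgumentError("missing columns: $(missing_cols)"))
        # Each required column's eltype must be a subtype of the declared type.
        for (col, T) in schema
            E = eltype(df[!, col])
            E <: T ||
                throw(ArgumentError("column $col has eltype $E, expected a subtype of $T"))
        end
        return new(df, Dict{Symbol,Type}(schema))
    end
end

# Usage with made-up data; extra columns are allowed.
schema = Dict(:field_a => AbstractString, :value => Union{Real, Missing})
df = DataFrame(field_a = ["x", "y"], value = [1.0, missing], extra = [1, 2])
sdf = SchemaDataFrame(df, schema)
```

Note that this only checks the schema when the wrapper is constructed. The wrapped DataFrame is still mutable, so columns replaced later through sdf.df are not re-validated.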