Enforcing Schema on Data Frame Passed as Function Argument

lungben · October 26, 2020, 8:17am

I had the same issue.

My solution was to define a Dict that maps the required column names to their element types (abstract supertypes, where applicable), plus a function that checks that these columns are present in the DataFrame with matching element types. The DataFrame may contain additional columns, which is fine for me:


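using DataFrames

# Required columns and the (possibly abstract) element types they must satisfy.
const INPUT_ELTYPES = Dict(
    :field_a => AbstractString,
    :field_b => ItemTypes,  # user-defined type, not shown here
    :field_c => AbstractString,
    :field_x => Union{AbstractString, Nothing},
    :field_y => Real,
    :value => Union{Real, Missing},
)

function check_input_data(df::AbstractDataFrame)
    # All required columns must be present; extra columns are allowed.
    # propertynames returns Symbols, matching the Dict keys (names returns Strings).
    @assert keys(INPUT_ELTYPES) ⊆ propertynames(df)
    for (col_name, col_type) in INPUT_ELTYPES
        @assert eltype(df[!, col_name]) <: col_type
    end
end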

Maybe something like this could be added to DataFrames.jl, or a typed data frame could be offered as an alternative type. The latter could be just a thin wrapper around the standard DataFrame type (with untyped columns) to avoid recompilation, with the sole purpose of defining schemas.
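To make the wrapper idea more concrete, here is a minimal sketch of what it might look like. The name SchemaFrame and its fields are hypothetical and not part of DataFrames.jl. The schema is stored as a plain field rather than as a type parameter, so methods that accept a SchemaFrame are compiled once regardless of which columns the schema lists:

```julia
using DataFrames

# Hypothetical thin wrapper: validates the schema once, at construction time.
struct SchemaFrame
    df::DataFrame
    schema::Dict{Symbol, Type}

    function SchemaFrame(df::DataFrame, schema::AbstractDict{Symbol, <:Type})
        for (col, T) in schema
            # Required columns must exist; extra columns are allowed.
            col in propertynames(df) ||
                throw(ArgumentError("missing column $col"))
            # The column's element type must be a subtype of the declared type.
            eltype(df[!, col]) <: T ||
                throw(ArgumentError("column $col has eltype $(eltype(df[!, col])), expected a subtype of $T"))
        end
        new(df, Dict{Symbol, Type}(schema))
    end
end

# Usage: wrap a DataFrame together with the schema it must satisfy.
df = DataFrame(name = ["a", "b"], value = [1.0, missing])
schema = Dict(:name => AbstractString, :value => Union{Real, Missing})
sf = SchemaFrame(df, schema)
```

Note that this only checks the schema when the wrapper is created. The wrapped DataFrame is still mutable, so a complete design would also need to decide how to handle columns that are changed afterwards.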
