Hi all,
Here’s a package I’ve started as part of a larger project that I am beating into shape. It’s not yet registered, so you’ll need to clone it from here.
I thought I’d put it out early because there has been much discussion about missing data, non-existent values (nothing) and invalid values, and this package provides one way to think about invalid values. More precisely, it requires that you specify valid values.
From the README:
A Schema is a specification of a data set.
It exists independently of any particular data set, and therefore can be constructed and modified in the absence of a data set.
This package facilitates 3 use cases:
- Read/write a schema from/to a yaml file. Thus schemata are portable, and a change to a schema does not require recompilation.
- Compare a data set to a schema and list the non-compliance issues.
- Transform an existing data set in order to comply with a schema as much as possible (then run the compare function on the result).
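
To make the compare and transform use cases concrete, here is a rough sketch in Python. It is not this package's API: `ColumnSchema`, `compare` and `enforce` are hypothetical names invented for illustration, the data set is just a list of dicts, and the yaml read/write step is left out. The idea it tries to show is that the schema lists the valid values up front, and anything outside them is reported as an issue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnSchema:
    """Hypothetical column specification: a name, a type, and optionally the valid values."""
    name: str
    dtype: type
    valid_values: Optional[frozenset] = None  # None means any value of dtype is valid


def compare(rows, schema):
    """Return a list of human-readable non-compliance issues (empty if compliant)."""
    issues = []
    for i, row in enumerate(rows):
        for col in schema:
            if col.name not in row:
                issues.append(f"row {i}: missing column '{col.name}'")
                continue
            value = row[col.name]
            if not isinstance(value, col.dtype):
                issues.append(
                    f"row {i}: '{col.name}' has type {type(value).__name__}, "
                    f"expected {col.dtype.__name__}"
                )
            elif col.valid_values is not None and value not in col.valid_values:
                issues.append(f"row {i}: '{col.name}' has invalid value {value!r}")
    return issues


def enforce(rows, schema):
    """Return a copy of rows with values converted to the schema types where possible."""
    out = []
    for row in rows:
        new_row = dict(row)
        for col in schema:
            value = new_row.get(col.name)
            if value is not None and not isinstance(value, col.dtype):
                try:
                    new_row[col.name] = col.dtype(value)
                except (TypeError, ValueError):
                    pass  # cannot convert; leave it for compare() to report
        out.append(new_row)
    return out


# The schema is built without reference to any data set.
schema = [
    ColumnSchema("age", int),
    ColumnSchema("sex", str, frozenset({"F", "M"})),
]

rows = [
    {"age": "42", "sex": "F"},       # age is a string that can be converted
    {"age": 30, "sex": "X"},         # sex is outside the valid values
    {"age": "unknown", "sex": "M"},  # age cannot be converted
]

before = compare(rows, schema)
after = compare(enforce(rows, schema), schema)
for issue in after:
    print(issue)
```

In this sketch the transform step only fixes what it safely can (the convertible age), and the second compare reports whatever remains, which mirrors the "transform, then run the compare function on the result" workflow described above.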
Feedback most welcome. Enjoy!
Jock