Engarde - Python defensive data analysis


I went to a meetup on data engineering at https://www.quantumblack.com/
Engarde was mentioned:

Looks very useful. Is there anything similar in Julia?



I am puzzled why this was packaged up. It basically amounts to

@assert MyProject.is_valid_data(data)

where the real content is in is_valid_data, which one would define anyway on a per-dataset basis.



The example given was developing a data science pipeline. Things might change, and this acts as a test every time the pipeline is run. It just looked like a useful feature to me.



I am not questioning the usefulness of validating data, just pointing out that

  1. the actual validation criteria are usually dependent on the dataset, and thus hard to generalize in a package,
  2. but it can be wrapped in a routine for a particular project, and called in a single line.

I don’t see what a package would add or what it could look like. Most of the building blocks are already defined in Julia, e.g. issorted, or can be trivially implemented, e.g. !any(isnan, array).
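
To make that concrete, here is a minimal sketch of such a per-project routine. The column names (timestamp, value) and the three rules are hypothetical, made up only for illustration; a real project would substitute its own criteria.

```julia
# Hypothetical per-project validator. The field names and the rules
# below are illustrative assumptions, not taken from any real dataset.
function is_valid_data(data)
    issorted(data.timestamp) &&      # timestamps are in non-decreasing order
        !any(isnan, data.value) &&   # no NaN measurements
        all(>=(0), data.value)       # every measurement is non-negative
end

# A NamedTuple of columns standing in for a real table.
data = (timestamp = [1, 2, 3], value = [0.5, 1.2, 3.4])

# The single-line check at the start of a pipeline run.
@assert is_valid_data(data)
```

One caveat: the Julia documentation notes that @assert may be disabled at some optimization levels, so for a check that must always run, something like is_valid_data(data) || error("invalid data") is probably the safer form.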