Engarde - Python defensive data analysis

johnh · March 1, 2019, 7:59am

I went to a meetup on data engineering at https://www.quantumblack.com/, where Engarde was mentioned:
Engarde! — engarde 0.4.0+9.ge7ea040 documentation

Looks very useful. Is there anything similar in Julia?

Tamas_Papp · March 1, 2019, 8:08am

I am puzzled why this was packaged up. It basically amounts to

@assert MyProject.is_valid_data(data)

where the real content is in is_valid_data, which one would define anyway on a per-dataset basis.

johnh · March 1, 2019, 10:09am

The example given was for developing a data science pipeline. Things might change, and this acts as a test every time the pipeline is run. It just looked like a useful feature to me.

Tamas_Papp · March 1, 2019, 10:42am

I am not questioning the usefulness of validating data, just pointing out that

the actual validation criteria are usually dependent on the dataset, and thus hard to generalize in a package,
but it can be wrapped in a routine for a particular project, and called in a single line.

I don’t see what a package would add, or what it could look like. Most of the building blocks are already defined in Julia, e.g. issorted, or can be trivially implemented, e.g. !any(isnan, array).
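A minimal sketch of the "wrap it in a routine, call it in one line" approach, using only Base functions. The name MyProject comes from the earlier post; the data layout (a NamedTuple with a time vector `t` and a measurement vector `x`) and the particular checks are hypothetical, chosen only for illustration:

```julia
# Sketch only: `MyProject` and the data layout (a NamedTuple with fields
# `t` and `x`) are hypothetical assumptions for illustration.
module MyProject

"""
    is_valid_data(data) -> Bool

Project-specific checks built from Base functions: one measurement per
time stamp, time stamps in non-decreasing order, and no NaN measurements.
"""
is_valid_data(data) =
    length(data.t) == length(data.x) &&  # columns line up
    issorted(data.t) &&                  # time stamps sorted
    !any(isnan, data.x)                  # no NaN measurements

end # module

data = (t = [1.0, 2.0, 3.0], x = [0.5, 0.7, 0.9])

# The single line run at each pipeline stage:
@assert MyProject.is_valid_data(data)
```

Since `@assert` may be disabled at some optimization levels, a check that must always run in a production pipeline can be written as `MyProject.is_valid_data(data) || error("invalid data")` instead.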
