[ANN] DataWrangler: Data transformation tools for analytics

DataWrangler.jl was created in the process of decoupling functionality from Forecast.jl. The package contains a collection of basic tools to prepare data for analytics, especially for time series analysis and regression.

The following data-wrangling tools are available:

  • Box-Cox and inverse Box-Cox transformation and estimation: boxcox, iboxcox
  • Data imputation (loess inter/extrapolation, random local density): impute, impute!
  • Data normalization (z-score, min-max, softmax, sigmoid): normalize, normalize!
  • Finite lagged difference and partial difference and its inverse: d, p
  • Outlier detection and removal: outlie, outlie!
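To make the Box-Cox item concrete, here is a minimal Python sketch of the transform, its inverse, and an estimate of λ obtained by grid search over the usual profile log-likelihood (normality assumed for the transformed data). It is not DataWrangler.jl's `boxcox`/`iboxcox` API: the function names, the search range [-2, 2] and the grid size are illustrative assumptions.

```python
import math
from statistics import pvariance

# Illustrative sketch only; not the DataWrangler.jl API.

def boxcox(xs, lam):
    """Box-Cox transform of strictly positive data for a fixed lambda."""
    if any(x <= 0 for x in xs):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(x) for x in xs]
    return [(x ** lam - 1) / lam for x in xs]

def iboxcox(ys, lam):
    """Inverse Box-Cox: map transformed values back to the original scale."""
    if lam == 0:
        return [math.exp(y) for y in ys]
    return [(lam * y + 1) ** (1 / lam) for y in ys]

def boxcox_loglik(xs, lam):
    """Profile log-likelihood of lambda, assuming the transformed data are normal."""
    ys = boxcox(xs, lam)
    n = len(xs)
    return -n / 2 * math.log(pvariance(ys)) + (lam - 1) * sum(math.log(x) for x in xs)

def estimate_lambda(xs, lo=-2.0, hi=2.0, steps=401):
    """Choose lambda by grid search over [lo, hi] (hypothetical defaults)."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda lam: boxcox_loglik(xs, lam))

if __name__ == "__main__":
    data = [1.2, 2.5, 3.1, 7.8, 15.0, 22.4, 40.3]
    lam = estimate_lambda(data)
    restored = iboxcox(boxcox(data, lam), lam)
    print(lam, restored)
```

A grid search keeps the sketch dependency-free; a real implementation would more likely use a one-dimensional optimizer.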


Note: Although a couple of packages still need to be finished before the decoupling is complete, this will most likely be the last announcement before all the decoupled packages are fully integrated into Forecast.jl v0.2.0. This integration might take a while, but I believe the packages decoupled so far are useful on their own, hence the announcements.


Hi @Storopoli, I went through your course Ciência de Dados e Computação Científica com Julia (Data Science and Scientific Computing with Julia), and it looks like it already has the basic chapters for an introduction to time series with Julia. What I would suggest is adding some material on how to prepare data before the chapter “Maneiras de Modelar Séries Temporais” (Ways to Model Time Series).

Although Auto-ARIMA packages will difference and integrate the series automatically, some preparation often needs to be done before that (removing outliers, dealing with heteroscedasticity, imputing missing values, etc.), and I think students will find it useful.
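For readers wondering what "difference and integrate" means in practice, here is a minimal Python sketch of lag-k differencing and its inverse. It is only an illustration of the idea, not DataWrangler.jl's `d`/`p` functions; the names `diff` and `undiff` and their parameters are hypothetical.

```python
# Illustrative sketch only; not DataWrangler.jl's d/p API.

def diff(xs, lag=1, order=1):
    """Lagged difference x[t] - x[t - lag], applied `order` times."""
    for _ in range(order):
        xs = [xs[t] - xs[t - lag] for t in range(lag, len(xs))]
    return xs

def undiff(dxs, initial, lag=1):
    """Undo one lagged difference, given the first `lag` values of the undifferenced series."""
    if len(initial) != lag:
        raise ValueError("need exactly `lag` initial values")
    xs = list(initial)
    for i, dx in enumerate(dxs):
        xs.append(xs[i] + dx)  # x[t] = x[t - lag] + dx[t - lag]
    return xs

if __name__ == "__main__":
    x = [3, 5, 4, 8, 10, 9, 15]
    d1 = diff(x)           # [2, -1, 4, 2, -1, 6]
    d2 = diff(x, order=2)  # [-3, 5, -2, -3, 7]
    # Inverting second-order differences needs one stored initial value per level.
    assert undiff(undiff(d2, [d1[0]]), [x[0]]) == x
    print(d1, d2)
```

The inverse only works if the initial values dropped at each differencing level are kept, which is why forecasting code usually stores them alongside the differenced series.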

Great work with those Notebooks!


Thanks! I will revise the lecture and notebook this week, since the virtual meeting and recording will be this Friday.

I already starred the package; I found it awesome, especially the several normalize options. I was already using something of my own (Min-Max and Range Scalers · GitHub), but having a package (and one with such a small dependency weight) is truly great.
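The four normalizations listed in the announcement (z-score, min-max, softmax, sigmoid) are simple enough to sketch in a few lines. The Python below is a minimal illustration with hypothetical function names; it assumes the population standard deviation for the z-score and the usual exp(x)/Σexp(x) definition of softmax, and DataWrangler.jl's `normalize` may make different choices.

```python
import math
from statistics import mean, pstdev

# Illustrative sketch only; DataWrangler.jl's `normalize` may define these differently.

def zscore(xs):
    """Center on the mean and scale by the population standard deviation."""
    mu, sd = mean(xs), pstdev(xs)
    if sd == 0:
        raise ValueError("z-score needs at least two distinct values")
    return [(x - mu) / sd for x in xs]

def minmax(xs, lo=0.0, hi=1.0):
    """Rescale linearly so that min(xs) maps to lo and max(xs) maps to hi."""
    mn, mx = min(xs), max(xs)
    if mx == mn:
        raise ValueError("min-max scaling needs at least two distinct values")
    return [lo + (x - mn) * (hi - lo) / (mx - mn) for x in xs]

def softmax(xs):
    """exp(x) / sum(exp(x)); subtracting the max first avoids overflow."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(xs):
    """Logistic function 1 / (1 + exp(-x)), written to avoid overflow for large |x|."""
    out = []
    for x in xs:
        if x >= 0:
            out.append(1 / (1 + math.exp(-x)))
        else:
            e = math.exp(x)
            out.append(e / (1 + e))
    return out
```

Min-max and z-score are linear, so they preserve the shape of the data; softmax and sigmoid are nonlinear and squash values into (0, 1).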