[ANN] DataWrangler: Data transformation tools for analytics

DataWrangler.jl was created in the process of decoupling functionality from Forecast.jl. The package contains a collection of basic tools to prepare data for analytics, especially for time series analysis and regression.

The available tools to wrangle data are:

  • Box-Cox and inverse Box-Cox transformation and estimation: boxcox, iboxcox
  • Data imputation (loess interpolation/extrapolation, random local density): impute, impute!
  • Data normalization (z-score, min-max, softmax, sigmoid): normalize, normalize!
  • Finite lagged difference and partial difference and its inverse: d, p
  • Outlier detection and removal: outlie, outlie!
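
For readers unfamiliar with some of the transformations listed above, here is a minimal sketch of what a few of them compute, written in plain Julia with only the Statistics standard library. It does not call DataWrangler.jl: the names boxcox_sketch, iboxcox_sketch, zscore_sketch and minmax_sketch are hypothetical, and the package's own functions may take different arguments and handle more cases (for example, estimating λ).

```julia
using Statistics

# Box-Cox transform for strictly positive data; λ = 0 is the log limit.
# Hypothetical helper, not the DataWrangler.jl function.
boxcox_sketch(x::AbstractVector{<:Real}, λ::Real) =
    λ == 0 ? log.(x) : (x .^ λ .- 1) ./ λ

# Inverse of the transform above.
iboxcox_sketch(y::AbstractVector{<:Real}, λ::Real) =
    λ == 0 ? exp.(y) : (λ .* y .+ 1) .^ (1 / λ)

# Two of the normalizations mentioned in the list.
zscore_sketch(x) = (x .- mean(x)) ./ std(x)
minmax_sketch(x) = (x .- minimum(x)) ./ (maximum(x) - minimum(x))

x = [1.0, 2.0, 4.0, 8.0]

y = boxcox_sketch(x, 0.5)        # transform with λ = 0.5
x_back = iboxcox_sketch(y, 0.5)  # recovers x up to rounding

z = zscore_sketch(x)             # mean 0, standard deviation 1
m = minmax_sketch(x)             # rescaled to the interval [0, 1]

# First difference and its inverse, using Base functions only.
dx = diff(x)                     # x[t] - x[t-1]
x_int = cumsum(vcat(x[1], dx))   # integrate back from the first value
```

The round trips (x_back and x_int) are the useful part for forecasting: you fit a model on the transformed series and then map the predictions back to the original scale.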


Note: Although there are still a couple of packages to be finished before the decoupling is completed, this announcement will most likely be the last one before all the decoupled packages are fully integrated in Forecast.jl v.0.2.0. This integration might take a while, but I believe the packages decoupled so far are useful on their own, hence the announcements.

12 Likes

Hi @Storopoli, I went through your course at https://storopoli.io/Computacao-Cientifica/5_TimeSeries/
and it looks like it already has the basic chapters for an introduction to time series using Julia. What I might suggest is adding some information about how to prepare data before the chapter “Maneiras de Modelar Séries Temporais” (“Ways of Modeling Time Series”).

Although Auto-ARIMA packages will difference and integrate automatically, some preparation often needs to be done before that (removing outliers, dealing with heteroscedasticity, imputing missing values, etc.), and I think students will find it useful.
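
To make the kind of preparation meant here concrete, below is a rough sketch in plain Julia of two of those steps: flagging outliers and filling the resulting gaps. It deliberately uses simple stand-ins (the 1.5 × IQR rule and linear interpolation), not the methods in DataWrangler.jl, which the announcement describes as loess-based. The function names iqr_outliers and fill_linear and the sample series are made up for illustration.

```julia
using Statistics

# Flag points outside the 1.5 × IQR fences (a common rule of thumb).
# Hypothetical helper, not part of DataWrangler.jl.
function iqr_outliers(x::AbstractVector{<:Real})
    q1, q3 = quantile(x, [0.25, 0.75])
    fence = 1.5 * (q3 - q1)
    return (x .< q1 - fence) .| (x .> q3 + fence)
end

# Fill interior gaps by linear interpolation between the nearest
# observed neighbours; leading/trailing gaps are left as missing.
function fill_linear(x::AbstractVector{Union{Missing,Float64}})
    y = copy(x)
    known = findall(!ismissing, y)
    for (a, b) in zip(known[1:end-1], known[2:end])
        for i in a+1:b-1
            y[i] = y[a] + (y[b] - y[a]) * (i - a) / (b - a)
        end
    end
    return y
end

# Made-up series with one obvious spike.
series = [10.0, 11.0, 12.0, 95.0, 14.0, 15.0, 16.0]

mask = iqr_outliers(series)                        # true where a point is flagged
cleaned = Vector{Union{Missing,Float64}}(series)
cleaned[mask] .= missing                           # drop flagged points
filled = fill_linear(cleaned)                      # impute the gaps
```

Only after steps like these (plus a variance-stabilizing transform such as Box-Cox when the variance grows with the level) would the series go into the automatic model selection.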

Great work with those Notebooks!

4 Likes

Thanks, I will revise the lecture and notebook this week, since the virtual meeting and recording will be this Friday.

I already starred the package. I found it awesome, especially the several normalize options. I was already using something of my own (https://gist.github.com/storopoli/31ef4abdc542dc3a0e1cae2965ed8740), but having a package (and one with such a small dependency weight) is truly great.

3 Likes