[ANN] DataWrangler: Data transformation tools for analytics

viraltux · October 9, 2021, 10:30pm

DataWrangler.jl was created in the process of decoupling functionality from Forecast.jl. The package contains a collection of basic tools to prepare data for analytics, especially for time series analysis and regression.

The following tools are available to wrangle data:

Box-Cox and inverse Box-Cox transformation and estimation: boxcox, iboxcox
Data imputation (loess inter/extra-polation, random local density): impute, impute!
Data normalization (z-score, min-max, softmax, sigmoid): normalize, normalize!
Finite lagged difference and partial difference and its inverse: d, p
Outlier detection and removal: outlie, outlie!
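
To make a few of these transformations concrete, here is a minimal sketch in plain Julia using only the Statistics standard library. The helper names (`bc`, `ibc`, `zscore`, `lagdiff`) are hypothetical and written just for this illustration; they are not DataWrangler.jl's functions, and the package's actual signatures and options may differ.

```julia
using Statistics  # standard library: mean, std

# Hypothetical helpers for illustration only; NOT DataWrangler.jl's API.

# Box-Cox transform of strictly positive data for a fixed λ.
bc(x, λ) = λ == 0 ? log.(x) : (x .^ λ .- 1) ./ λ

# Inverse Box-Cox transform for the same λ.
ibc(y, λ) = λ == 0 ? exp.(y) : (λ .* y .+ 1) .^ (1 / λ)

# z-score normalization: zero mean and unit sample standard deviation.
zscore(x) = (x .- mean(x)) ./ std(x)

# Lagged difference: x[t] - x[t - lag].
lagdiff(x, lag = 1) = x[(lag + 1):end] .- x[1:(end - lag)]

x = [1.0, 2.0, 4.0, 8.0, 16.0]

y = bc(x, 0.5)         # transformed series
x_back = ibc(y, 0.5)   # recovers x up to floating-point error
z = zscore(x)          # normalized series
dx = lagdiff(x)        # [1.0, 2.0, 4.0, 8.0]
```

Here λ is fixed by hand; estimating it from the data is a separate step that this sketch does not attempt.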

Note: Although there are still a couple of packages to be finished before the decoupling is completed, this announcement will most likely be the last one before all the decoupled packages are fully integrated in Forecast.jl v.0.2.0. This integration might take a while, but I believe the packages decoupled so far are useful on their own, hence the announcements.

viraltux · October 10, 2021, 10:18am

Hi @Storopoli, I went through your course at https://storopoli.io/Computacao-Cientifica/5_TimeSeries/ and it looks like it already has the basic chapters for an introduction to time series using Julia. One thing I might suggest is to add some information about how to prepare data before the chapter “Maneiras de Modelar Séries Temporais” (“Ways of Modeling Time Series”).

Although Auto-ARIMA packages will difference and integrate automatically, some preparation often needs to be done before that (removing outliers, dealing with heteroscedasticity, imputing missing values, etc.), and I think students will find it useful.
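
As a rough idea of what such a preparation step could look like, here is a minimal sketch in plain Julia. It swaps in two deliberately simple techniques, the 1.5 × IQR rule for outliers and linear interpolation for gaps, rather than the loess-based methods mentioned above. The function names `flag_outliers` and `interpolate_missing` are hypothetical and not part of any package.

```julia
using Statistics  # standard library: quantile

# Hypothetical helper: replace values outside the k × IQR fences with `missing`.
function flag_outliers(x; k = 1.5)
    vals = collect(skipmissing(x))
    q1, q3 = quantile(vals, [0.25, 0.75])
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [ismissing(v) || v < lo || v > hi ? missing : v for v in x]
end

# Hypothetical helper: fill interior gaps by linear interpolation between
# the nearest known neighbours. Leading or trailing gaps are left as missing.
function interpolate_missing(x)
    y = Vector{Union{Missing, Float64}}(x)
    known = findall(!ismissing, y)
    for (a, b) in zip(known[1:end-1], known[2:end])
        for i in (a + 1):(b - 1)
            y[i] = y[a] + (y[b] - y[a]) * (i - a) / (b - a)
        end
    end
    return y
end

series = [10.0, 11.0, 12.0, 500.0, 14.0, missing, 16.0]

flagged = flag_outliers(series)         # 500.0 becomes missing
clean = interpolate_missing(flagged)    # gaps at positions 4 and 6 are filled
```

The order matters here: outliers are turned into gaps first, so a single imputation pass handles both the original missing value and the removed outlier.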

Great work with those Notebooks!

Storopoli · October 10, 2021, 1:14pm

Thanks, I will revise the lecture and notebook this week, since the virtual meeting and recording will be this Friday.

I already starred the package. I found it awesome, especially the several normalize options. I was already using something of my own (https://gist.github.com/storopoli/31ef4abdc542dc3a0e1cae2965ed8740), but having a package (and one with such a small dependency footprint) is truly great.
