I am planning to build a package for ETL processes as my Bachelor thesis.
My plan is for the package to be a simple way to build data pipelines following the ETL pattern: extract data from different sources, transform it, and then load it back into storage or hand it off elsewhere.
I am still trying to figure out what I should include in the package. So far I am thinking about the following (a rough sketch follows the list):
- connectivity to multiple data sources
- data transformation with custom pipelines
- might add support for data cleaning
- might add aggregations, filters, joins, splits
- multi-threading / parallel computing
- logging and monitoring
- multiple sources and targets
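
To make this more concrete, here is a minimal sketch of the core abstraction I have in mind. Everything here is hypothetical, nothing is implemented yet, and the real version would work on tables rather than toy vectors:

```julia
# A pipeline is just an extract function, a chain of transforms, and a load function.
struct Pipeline
    extract::Function              # () -> data
    transforms::Vector{Function}   # data -> data
    load::Function                 # data -> nothing
end

function run_pipeline(p::Pipeline)
    data = p.extract()
    for t in p.transforms
        data = t(data)             # each step takes data and returns transformed data
    end
    p.load(data)
end

# Toy usage with a vector standing in for a real source/target:
p = Pipeline(
    () -> [1, 2, 3, missing, 5],
    [xs -> collect(skipmissing(xs)), xs -> xs .* 2],
    xs -> println("would write ", xs, " to storage"),
)
run_pipeline(p)
```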
It would be helpful to get some insights from you guys.
Please tell me if I am missing anything important.
I tried something similar a while ago when I needed an automation solution for work. I started building very basic custom pipelines: reading data from databases and Parquet files (DuckDB.jl), cleaning and aggregating the data (DataFrames.jl), and saving the results back to Parquet files.
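
For reference, a minimal sketch of that kind of workflow (the file and column names are made up for the example):

```julia
using DuckDB, DataFrames

# In-memory DuckDB connection (a file path would give a persistent database)
con = DBInterface.connect(DuckDB.DB, ":memory:")

# Extract: read a Parquet file through DuckDB into a DataFrame
df = DataFrame(DBInterface.execute(con,
    "SELECT * FROM read_parquet('sales.parquet')"))

# Transform: clean and aggregate with DataFrames.jl
clean = dropmissing(df, [:region, :amount])
agg   = combine(groupby(clean, :region), :amount => sum => :total_amount)

# Load: register the result with DuckDB and write it back to Parquet
DuckDB.register_data_frame(con, agg, "agg")
DBInterface.execute(con,
    "COPY (SELECT * FROM agg) TO 'sales_by_region.parquet' (FORMAT PARQUET)")

DBInterface.close!(con)
```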
I can’t find the threads now, but I remember there were some early attempts to build a framework like this. However, I think they ended up abandoned.
While I found Julia quite suitable for the task, in my project I found myself basically reinventing dlt and dbt, so I just ended up using those existing solutions instead.
The advantage of Julia would be to use the REPL to write custom code in Julia and then register it as a UDF that runs inside DuckDB. This would leverage C-level function performance to run models or optimizations within the stream processing.
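
If I remember correctly, DuckDB.jl exposes scalar UDF registration roughly along these lines; treat the exact macro and function names as my assumption and check the current DuckDB.jl docs before relying on them:

```julia
using DuckDB, DataFrames

con = DBInterface.connect(DuckDB.DB, ":memory:")

# Plain Julia function developed interactively in the REPL
score(x) = 2x + 1

# Register it as a scalar UDF so it can be called from SQL.
# NOTE: assumed API; the exact macro/registration names and argument
# types may differ between DuckDB.jl versions.
fun = DuckDB.@create_scalar_function score(x::Int64)::Int64
DuckDB.register_scalar_function(con, fun)

DBInterface.execute(con, "CREATE TABLE t AS SELECT range AS x FROM range(5)")
df = DataFrame(DBInterface.execute(con, "SELECT x, score(x) AS s FROM t"))
```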