Is there a way to stream a table with 1 billion rows from SQL Server to Parquet efficiently in Julia?

A couple of days ago I described how to read data from PostgreSQL using Spark.jl. If you manage to do the same thing with SQL Server, saving the result to Parquet is a matter of a single call to write_parquet(). At the moment the documentation of Spark.jl is very limited, so don’t hesitate to ask questions.
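For SQL Server that would look roughly like this (an untested sketch: the URL, credentials and table name are placeholders, the options follow Spark’s standard JDBC data source as in the PostgreSQL post, and you’d need Microsoft’s mssql-jdbc driver JAR on the JVM classpath):

```julia
using Spark
Spark.init()

sess = SparkSession(master="local[*]")

# Placeholder URL/credentials; the Microsoft JDBC driver JAR (mssql-jdbc)
# must be visible to the JVM for this to work.
df = read_df(sess; format="jdbc",
             options=Dict("url"      => "jdbc:sqlserver://myserver:1433;databaseName=mydb",
                          "dbtable"  => "mytable",
                          "user"     => "user",
                          "password" => "secret",
                          "driver"   => "com.microsoft.sqlserver.jdbc.SQLServerDriver"))

# Spark writes the result out partition by partition, so the full table
# never has to fit in memory at once.
write_parquet(df, "file:///data/mytable.parquet")
```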

Alternatively, you can query SQL Server with whatever tool you like (e.g. JDBC.jl or ODBC.jl) and write the result using Parquet.jl, although I’m not sure about their current state and suitability.
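If you go that route, keep in mind that a billion rows won’t fit in memory, so you’d have to chunk the query yourself and write one Parquet file per chunk. A rough, untested sketch with ODBC.jl + Parquet.jl (the connection string, table name and `id` key column are placeholders):

```julia
using ODBC, DBInterface, DataFrames, Parquet

# Placeholder connection string -- adjust driver/server/credentials to your setup.
conn = DBInterface.connect(ODBC.Connection,
    "Driver={ODBC Driver 17 for SQL Server};Server=myserver;Database=mydb;UID=user;PWD=secret")

chunk = 5_000_000                     # rows per Parquet file; tune to your RAM

let offset = 0, part = 0
    while true
        # OFFSET/FETCH needs a deterministic ORDER BY (here a hypothetical `id` key)
        sql = """SELECT * FROM mytable ORDER BY id
                 OFFSET $offset ROWS FETCH NEXT $chunk ROWS ONLY"""
        df = DBInterface.execute(conn, sql) |> DataFrame
        isempty(df) && break          # no rows left
        write_parquet("mytable_part$(lpad(part, 4, '0')).parquet", df)
        offset += chunk
        part += 1
    end
end

DBInterface.close!(conn)
```

Note that OFFSET-based paging gets slow at deep offsets on a table that size; keyset pagination (`WHERE id > last_id ORDER BY id`) would scale better if you have an indexed key.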
