Is there a way to stream a table with 1 billion rows from SQL Server to Parquet efficiently in Julia?

A couple of days ago I described how to read data from PostgreSQL using Spark.jl. If you manage to do the same thing with SQL Server, saving the result to Parquet is a matter of a single call to write_parquet(). At the moment the documentation of Spark.jl is very limited, so don’t hesitate to ask questions.
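For SQL Server that would look roughly like this (an untested sketch: the URL, credentials and table name are placeholders, the options follow Spark’s standard JDBC data source as in the PostgreSQL post, and you’d need Microsoft’s mssql-jdbc driver JAR on the JVM classpath):

```julia
using Spark
Spark.init()

sess = SparkSession(master="local[*]")

# Placeholder URL/credentials; the Microsoft JDBC driver JAR (mssql-jdbc)
# must be visible to the JVM for this to work.
df = read_df(sess; format="jdbc",
             options=Dict("url"      => "jdbc:sqlserver://myserver:1433;databaseName=mydb",
                          "dbtable"  => "mytable",
                          "user"     => "user",
                          "password" => "secret",
                          "driver"   => "com.microsoft.sqlserver.jdbc.SQLServerDriver"))

# Spark writes the result out partition by partition, so the full table
# never has to fit in memory at once.
write_parquet(df, "file:///data/mytable.parquet")
```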

Alternatively, you can query SQL Server with whatever tool you like (e.g. JDBC.jl or ODBC.jl) and write the result using Parquet.jl, although I’m not sure about their current state and suitability.
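If you go that route, keep in mind that a billion rows won’t fit in memory, so you’d have to chunk the query yourself and write one Parquet file per chunk. A rough, untested sketch with ODBC.jl + Parquet.jl (the connection string, table name and `id` key column are placeholders):

```julia
using ODBC, DBInterface, DataFrames, Parquet

# Placeholder connection string -- adjust driver/server/credentials to your setup.
conn = DBInterface.connect(ODBC.Connection,
    "Driver={ODBC Driver 17 for SQL Server};Server=myserver;Database=mydb;UID=user;PWD=secret")

chunk = 5_000_000                     # rows per Parquet file; tune to your RAM

let offset = 0, part = 0
    while true
        # OFFSET/FETCH needs a deterministic ORDER BY (here a hypothetical `id` key)
        sql = """SELECT * FROM mytable ORDER BY id
                 OFFSET $offset ROWS FETCH NEXT $chunk ROWS ONLY"""
        df = DBInterface.execute(conn, sql) |> DataFrame
        isempty(df) && break          # no rows left
        write_parquet("mytable_part$(lpad(part, 4, '0')).parquet", df)
        offset += chunk
        part += 1
    end
end

DBInterface.close!(conn)
```

Note that OFFSET-based paging gets slow at deep offsets on a table that size; keyset pagination (`WHERE id > last_id ORDER BY id`) would scale better if you have an indexed key.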
