Data Import Types and Compression

ldsands · December 22, 2017, 1:14am

Hello all,

Sorry if this is a bad question. I use data sets from ICPSR (a social science repository). As an example, a dataset I recently used is about 200 MB as an uncompressed CSV, but I can get it down to about 30 MB in RDS format in R, and Stata's dta format gives a similar size. When I try to save it as JLD using this code: save("dta.jld", "dta", dta, compress=true), the file is still around 200 MB. I suspect that converting many of the variables that Julia imports as floats to integers might help, since most variables contain only a dozen or fewer unique values, or perhaps something else would help, but I have no idea how to do that on a larger scale (this example dataset has a little over 10,000 variables).

What methods should I look into for this problem?
What is the most user friendly way of doing this?
What is the current state of data file formats in Julia, and future plans?

Although I’m familiar with R and Stata, I’m no computer programmer, but I want to learn Julia for the speed benefits while data wrangling. Getting more manageable file sizes is one of those quality-of-life things I want to figure out.

Thank you!

Yifan_Liu · December 22, 2017, 4:36am

Try the fst format for R and Julia:

https://github.com/xiaodaigh/fstformat.jl
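
On the float-to-int idea from the question, here is a minimal sketch in plain Julia (no packages) of how a column of whole-number floats could be narrowed to the smallest signed integer type that fits. The helper names narrowest_int_type and narrow_column are made up for illustration and do not come from any package, and the sketch assumes the column has no missing values, which real ICPSR data may well have.

```julia
# Hypothetical helper (not from any package): return the narrowest signed
# integer type that can hold every value in `col`, or `nothing` if any value
# is fractional, NaN, or infinite.
function narrowest_int_type(col::AbstractVector{<:AbstractFloat})
    all(x -> isfinite(x) && isinteger(x), col) || return nothing
    isempty(col) && return Int8
    lo, hi = extrema(col)
    for T in (Int8, Int16, Int32, Int64)
        if typemin(T) <= lo && hi <= typemax(T)
            return T
        end
    end
    return nothing  # values too large even for Int64
end

# Hypothetical helper: convert the column when it is safe to do so,
# otherwise return it unchanged.
function narrow_column(col::AbstractVector{<:AbstractFloat})
    T = narrowest_int_type(col)
    return T === nothing ? col : T.(col)
end

codes = [1.0, 2.0, 3.0, 2.0, 1.0]   # e.g. a survey item with a few categories
small = narrow_column(codes)

println(eltype(small))   # Int8
println(sizeof(codes))   # 40 bytes as Float64
println(sizeof(small))   # 5 bytes as Int8
```

Applied to each column in turn, something like this should shrink the in-memory table before saving. How much it reduces the file on disk depends on the format and its compression, so treat it as something to try rather than a guaranteed fix.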

