Sorry if this is a basic question. I use datasets from ICPSR (a social science repository). As an example, a dataset I used recently is about 200 MB as an uncompressed CSV, but I can get it down to about 30 MB in RDS format in R, and Stata's dta format gives a similar size. When I try to save it as JLD using this code

save("dta.jld", "dta", dta, compress=true)

it is still around 200 MB. I suspect that converting many of the variables that Julia imports as floats to ints might help, since most variables contain only a dozen or fewer unique values, or perhaps something else would help, but I have no idea how to do that on a larger scale (this example dataset has a little over 10,000 variables).
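To make the float-to-int idea concrete, here is a rough sketch of what I mean for a single column, in plain Julia with no packages. The helper name `narrow` is something I made up for illustration, not a function from any library. I am assuming that columns with missing values (which are not plain float vectors) should simply be left alone. I don't know whether this is the idiomatic approach, or how best to apply it across a whole table.

```julia
# Hypothetical helper (my own name, not from any package): convert a vector of
# floats to the narrowest signed integer type that can hold every value.
# Columns with fractional or non-finite values (NaN, Inf) are returned unchanged.
function narrow(col::AbstractVector{<:AbstractFloat})
    isempty(col) && return col
    all(x -> isfinite(x) && isinteger(x), col) || return col
    lo, hi = extrema(col)
    for T in (Int8, Int16, Int32, Int64)
        if typemin(T) <= lo && hi <= typemax(T)
            return T.(col)   # element-wise conversion, exact for whole numbers
        end
    end
    return col               # values too large for Int64: leave as floats
end

# Anything that is not a plain float vector passes through untouched
# (assumption: this includes columns containing missing values).
narrow(col::AbstractVector) = col

x = [1.0, 2.0, 3.0, 2.0, 1.0]   # a variable with only a few unique codes
y = narrow(x)                   # stored as Int8: 1 byte per value instead of 8
z = narrow([0.5, 1.0])          # has a fractional value, so it stays Float64
```

With around 10,000 variables I would presumably need to loop this over every column, which is the part I am unsure about.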
- What methods should I look into for this problem?
- What is the most user friendly way of doing this?
- What is the current state of data file formats in Julia, and future plans?
Although I’m familiar with R and Stata, I’m no computer programmer, but I want to learn Julia for the speed benefits while data wrangling. Getting more manageable file sizes is one of those quality-of-life things I want to figure out.