Efficient Disk Usage and JLD

gideonsimpson · April 18, 2018, 12:38am

Suppose I generate the following two files using JLD:

using JLD
X1 = randn(5, 10^4);
X2 = [randn(5) for i = 1:10^4];
@save "X1.jld" X1
@save "X2.jld" X2

Both data structures hold the same amount of information, but X1.jld takes up 404 KB, while X2.jld takes up 3.6 MB. Is there a way to get data formatted like X2 to be stored more efficiently, or do I just need to rewrite my code?

y4lu · April 18, 2018, 1:32am

It looks like each array probably has a few hundred bytes of storage overhead in JLD, which is a bit high for 5x8 bytes of data per array.
X3 = [randn(10^4) for i = 1:5] should be pretty close to the size of X1.jld.
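The "few hundred bytes" estimate can be checked against the file sizes quoted above. This is only back-of-the-envelope arithmetic, assuming 1 KB = 1000 bytes and treating all of the size difference as per-array overhead; the variable names are made up for illustration.

```julia
# Rough per-array overhead implied by the file sizes quoted in this thread.
n_arrays = 10^4
payload  = 5 * 8 * n_arrays   # 5 Float64 values per array, 8 bytes each
size_X1  = 404 * 10^3         # X1.jld, about 404 KB (assuming 1 KB = 1000 bytes)
size_X2  = 3.6 * 10^6         # X2.jld, about 3.6 MB

# Assumption: the whole difference is overhead attached to the inner arrays.
overhead_per_array = (size_X2 - size_X1) / n_arrays

println("raw payload: ", payload, " bytes")
println("extra bytes per inner array: ", overhead_per_array)
```

The raw payload is 400,000 bytes, so X1.jld is almost all data, while the difference works out to roughly 320 bytes of extra storage per 40-byte inner array.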

gideonsimpson · April 18, 2018, 1:44am

Yes, I agree, that’s much better, but it does not conform to my problem design. I am imagining problems where I have $(\mathbf{x}_n)_{n=1}^N$, with each $\mathbf{x}_n \in \mathbb{R}^d$, and, typically, $d \ll N$, but $d$ could still be relatively large, say 100. The problem structure is that of a time series of vector-valued quantities. I could rewrite the whole thing in terms of a 2D array, but I was really hoping to avoid that, because the vector-of-vectors form lets me leave $\mathbf{x}_n$ a bit more abstract.
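One possible middle ground, sketched here under the assumption that each state is a plain numeric vector, is to store the series as a single d×N matrix and hand out column views, so the rest of the code still sees one vector per time step. The helper `state` is hypothetical and not part of JLD or of the code in this thread.

```julia
d, N = 5, 10^4
X = randn(d, N)               # one column per time step; saved as a single array

# Hypothetical helper: x_n as a view into column n (no copy is made).
state(X, n) = view(X, :, n)

# Code written against "a sequence of vectors" can walk the columns.
total = zeros(d)
for n in 1:N
    total .+= state(X, n)
end

# Converting between the two layouts when needed:
X2 = [X[:, n] for n in 1:N]   # matrix -> vector of vectors (copies each column)
Xback = reduce(hcat, X2)      # vector of vectors -> d×N matrix
```

Whether this keeps the states abstract enough depends on how much of the surrounding code relies on each state being its own `Vector`.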

gideonsimpson · April 18, 2018, 1:58am

I’ll add that, while the file is still larger than I’d like, JLD2 does appear to be more efficient than JLD (by about a factor of 3).

y4lu · April 18, 2018, 3:24am

It might be worth switching the format on load/save for similar ‘many short, fixed-length arrays’ cases:

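# Turn n vectors of length m into m vectors of length n (applying it twice undoes it).
function v2transpose(v)
    n = length(v)       # number of inner vectors
    m = length(v[1])    # length of each inner vector (assumed equal for all)
    v2 = [zeros(eltype(v[1]), n) for i = 1:m]
    for j = 1:n, i = 1:m
        v2[i][j] = v[j][i]
    end
    return v2
end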

kristoffer.carlsson · April 18, 2018, 6:06am

You can give JLD a custom serializer and deserializer for your types. Pack your data more efficiently in the serializer and unpack it again in the deserializer.
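A minimal sketch of the packing logic such a serializer/deserializer pair might use is given below. The names `PackedVectors`, `pack`, and `unpack` are hypothetical, and the sketch deliberately does not call JLD. The JLD README describes `JLD.writeas` and `JLD.readas` as the hooks for custom serialization, so check the installed version before wiring these functions into them.

```julia
# Hypothetical wrapper type: N vectors of length d stored as one d×N matrix.
struct PackedVectors{T}
    data::Matrix{T}
end

# "Serializer" side: concatenate the inner vectors as columns of one matrix.
# Assumes a non-empty collection of equal-length vectors.
pack(v::Vector{Vector{T}}) where {T} = PackedVectors{T}(reduce(hcat, v))

# "Deserializer" side: split the columns back into separate vectors.
unpack(p::PackedVectors) = [p.data[:, n] for n in 1:size(p.data, 2)]

X2 = [randn(5) for i = 1:10^4]
packed = pack(X2)           # this single-matrix object is what would go to disk
restored = unpack(packed)   # applied after loading
```

Since the packed form is a single dense array, its on-disk size should be close to that of X1.jld, while the rest of the code keeps working with a vector of vectors.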
