Both data structures hold the same amount of information, but X1.jld takes up 404 KB, while X2.jld takes up 3.6 MB. Is there a way to get data formatted like X2 to be stored more efficiently, or do I just need to rewrite my code?
It looks like each array probably has a few hundred bytes of storage overhead in JLD, which is a bit high for 5x8 bytes of data per array. X3 = [randn(10^4) for i = 1:5] should be pretty close to the size of X1.jld.
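As a rough check on the "few hundred bytes" estimate, here is a back-of-the-envelope calculation. It assumes, which the thread does not state explicitly, that X2 holds 10^4 arrays of 5 Float64 values each, and it treats KB and MB as decimal units.

```julia
# Assumption (not stated in the thread): X2 is 10^4 arrays of 5 Float64 each.
N = 10^4
raw_bytes = N * 5 * sizeof(Float64)   # payload: 5 x 8 bytes per array

x1_bytes = 404_000     # reported size of X1.jld (404 KB)
x2_bytes = 3_600_000   # reported size of X2.jld (3.6 MB)

# Extra bytes in X2.jld relative to X1.jld, spread over the N small arrays.
overhead_per_array = (x2_bytes - x1_bytes) / N
println("payload: ", raw_bytes, " bytes; overhead per array: ", overhead_per_array, " bytes")
```

Under those assumptions the payload is about 400 KB, which matches X1.jld, and the remaining difference works out to roughly 320 bytes per stored array.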
Yes, I agree, that's much better, but it does not conform to my problem design. I am imagining problems where I have $(\mathbf{x}_n)_{n=1}^N$, with each $\mathbf{x}_n \in \mathbb{R}^d$ and, typically, $d \ll N$, but $d$ could still be relatively large, say 100. The problem structure is that of a time series of vector-valued quantities. I could rewrite the whole thing in terms of a 2D array, but I was really hoping to avoid that, because keeping a vector of vectors lets me leave $\mathbf{x}_n$ a bit more abstract.
You can give JLD custom serializers and deserializers for your types. Pack your data more compactly in the serializer and unpack it again in the deserializer.
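A minimal sketch of that idea, assuming the data is a vector of equal-length Float64 vectors. The types TimeSeries and PackedTimeSeries and the helpers pack and unpack are hypothetical names made up for illustration. The JLD hooks are left as comments so the snippet runs without the package; JLD.jl's README describes JLD.writeas and JLD.readas for this purpose, but check the version you have installed.

```julia
# Hypothetical wrapper around the "vector of vectors" layout used in the code.
struct TimeSeries
    xs::Vector{Vector{Float64}}
end

# Hypothetical on-disk form: one contiguous d x N matrix instead of N small arrays.
struct PackedTimeSeries
    data::Matrix{Float64}
end

# Pack: concatenate the vectors as columns of a single matrix.
pack(ts::TimeSeries) = PackedTimeSeries(reduce(hcat, ts.xs))

# Unpack: split the matrix back into one vector per column.
unpack(p::PackedTimeSeries) = TimeSeries([p.data[:, n] for n in 1:size(p.data, 2)])

# With JLD loaded, the hooks would look roughly like this (untested assumption):
# JLD.writeas(ts::TimeSeries) = pack(ts)
# JLD.readas(p::PackedTimeSeries) = unpack(p)

ts = TimeSeries([randn(5) for n in 1:10^4])
packed = pack(ts)
restored = unpack(packed)
```

This way the rest of the code keeps working with a vector of vectors, and only the file stores a single array, so the per-array overhead is paid once rather than N times.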