Why are MAT files created by Julia significantly larger than those created by Matlab?

This is how I created my MAT file

using MAT
F1 = matopen("test.mat", "w");
write(F1, "x",  B[:,1]);
write(F1, "y",  B[:,2]);
write(F1, "f",  B[:,3]);
close(F1)

It is 898MB. If I open it within Matlab and then
save('test2.mat', 'x', 'y', 'f')
My new file storing the same variables is only 127MB.

I tried to regenerate the file from scratch inside Matlab, and again the file size is only 127MB.

Why?

What did I do wrong?

Thanks!

MATLAB has some kind of default compression?

The MAT.jl repo is the best place to ask this question, but I think they now use some form of HDF5, so compression is likely the reason.

Yep: MAT-File Versions - MATLAB & Simulink

There are some details about the compression used here:

I think MAT.jl doesn’t use compression by default. There’s a kwarg you need to add: compress = true
I looked into the test folder: https://github.com/JuliaIO/MAT.jl/blob/3ed629c05f7261e86c0dde0869d265e99a265efb/test/write.jl#L22
Example: matwrite(joinpath(new_dir, fname), data; compress=true)
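A minimal sketch of the difference the keyword makes, assuming MAT.jl is installed. The matrix `B` and the file names here are hypothetical stand-ins, and the data is deliberately repetitive so that compression has something to gain. The `compress` keyword on `matopen` is, as far as I can tell, accepted the same way as on `matwrite`, but treat that as an assumption to check against your MAT.jl version.

```julia
using MAT

# Hypothetical stand-in for the original poster's matrix B
# (100_000 rows, 3 columns, highly repetitive values).
B = repeat([1.0 2.0 3.0], 100_000, 1)

vars = Dict("x" => B[:, 1], "y" => B[:, 2], "f" => B[:, 3])

# Uncompressed (the default) versus compressed output.
matwrite("test_plain.mat", vars)
matwrite("test_compressed.mat", vars; compress = true)

# The same keyword with the write-one-variable-at-a-time style.
file = matopen("test_handle.mat", "w"; compress = true)
write(file, "x", B[:, 1])
close(file)

println(filesize("test_plain.mat"), " bytes uncompressed")
println(filesize("test_compressed.mat"), " bytes compressed")
```

How much is saved depends entirely on the data; random floating-point values will shrink far less than the repetitive columns used here.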

Amusing point: the savefast code the author links is mine back from my pre-Julia days. It now feels slightly weird seeing code from those days.

Thank you all for the very helpful discussion!

Tim,
Interesting to hear about that.

Concerning the best way to write the code, I assume you meant the below?
matwrite(F1, Dict(
    "x" => B[:,1],
    "y" => B[:,2],
    "f" => B[:,3]
); compress = true);

My question is why they use a semicolon, instead of a comma after the dictionary variable? Both seem to work though.

A semicolon can be used to separate positional arguments from keyword arguments. It’s not necessary here, but both BlueStyle and YASGuide recommend it. And there is this feature, where it does matter:

a = true
f(x; a) == f(x, a=true) != f(x, a)
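The shorthand above can be made concrete with a small sketch. The function `f` below is hypothetical and exists only to show which calls are equivalent; the `f(x; a)` form needs Julia 1.5 or later.

```julia
# Hypothetical function: one positional argument and one keyword argument.
f(x; a = false) = a ? 2x : x

a = true

r1 = f(3; a)          # shorthand for f(3; a = a), so the keyword a is true
r2 = f(3, a = true)   # a comma also works when the keyword is written out
r3 = f(3; a = true)   # the style-guide-recommended spelling

# f(3, a) passes `a` as a second positional argument instead; since f only
# has a one-argument method, that call throws a MethodError.
positional_call_fails = try
    f(3, a)
    false
catch err
    err isa MethodError
end
```

So the semicolon is optional when the keyword is spelled out as `a = true`, but required for the bare `f(x; a)` shorthand.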

You made MATLAB fast in your pre-Julia days, then you made the same in Julia, just slower… :dizzy_face: That’s unusual for you.

[I’m really not complaining. Your MAT.jl really helped me port MATLAB code to Julia at my job. It was my first assignment, and I had used MATLAB (or Octave, actually) a bit before, but never professionally, and I had never used Julia.]

It’s probably easy to enable compression for MAT.jl? [Not pushing for it; I don’t need .mat files. They just helped me temporarily while I needed to run both Julia and MATLAB during porting, until I had completely gotten rid of all the MATLAB code.]

It’s already there.

IMO, compress=true should be the default. I really do not see the benefit of an uncompressed MAT file. It just takes so much of my disk space.

Kudos to Tim for creating the MAT.jl package! Many thanks :+1:

I didn’t really write MAT.jl, because by the time I helped get HDF5 and JLD off the ground I was pretty solidly in the Julia camp. So why would I need to save *.mat files anymore? Making MAT better is now up to people who need it.

By now, I’m pretty sure the reason MAT files created by Julia are so large is because MAT.jl stores info in the format of a Dictionary, which takes up an order of magnitude more disk space than Arrays or Tuples.

I wonder if they could upgrade the package so that it uses named tuples to store the variables instead. Named tuples take up as little space as regular tuples, and are the most efficient way of storing variables.

I think HDF5 files (which is what a v7.3 .mat file really is) are compressed by default in Matlab, but not in Julia. Enable compression when writing the file.

Thanks! I’ve been compressing them, but they are still slightly larger than the files created by Matlab.