Why are MAT files created by Julia significantly larger than those created by Matlab?

leon · October 15, 2021, 1:14am

This is how I created my MAT file

using MAT
F1 = matopen("test.mat", "w");
write(F1, "x",  B[:,1]);
write(F1, "y",  B[:,2]);
write(F1, "f",  B[:,3]);
close(F1)

It is 898MB. If I open it within Matlab and then
save('test2.mat', 'x', 'y', 'f')
My new file storing the same variables is only 127MB.

I tried to regenerate the file from scratch inside Matlab, and again the file size is only 127MB.

Why?

What did I do wrong?

Thanks!

jling · October 15, 2021, 1:17am

MATLAB has some kind of default compression?

Zach_Christensen · October 15, 2021, 1:45am

The MAT.jl repo is the best place to ask this question, but I think they now use some form of HDF5, so compression is likely the reason.

stevengj · October 15, 2021, 2:30am

Yep: MAT-File Versions - MATLAB & Simulink

mkitti · October 15, 2021, 2:53am

There are some details about the compression used here:

Iulian.Cioarca · October 15, 2021, 6:44am

I think MAT.jl doesn’t use compression by default. There’s a kwarg you need to add: compress = true
I looked into the test folder: https://github.com/JuliaIO/MAT.jl/blob/3ed629c05f7261e86c0dde0869d265e99a265efb/test/write.jl#L22
Example: matwrite(joinpath(new_dir, fname), data; compress=true)
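A minimal sketch of what that looks like in practice, assuming MAT.jl is installed. The matrix `B` and the file names here are made-up stand-ins for the original poster's data; `matwrite`, `matopen` and the `compress` keyword are the ones shown in the MAT.jl source linked in this thread.

```julia
using MAT

# Hypothetical stand-in for the poster's matrix B (three columns: x, y, f).
# Highly repetitive on purpose, so that compression has something to work with.
B = repeat([1.0 2.0 3.0], 100_000, 1)

vars = Dict(
    "x" => B[:, 1],
    "y" => B[:, 2],
    "f" => B[:, 3],
)

# One-shot write with compression switched on.
matwrite("test_compressed.mat", vars; compress = true)

# Same data with the default (compress = false), for comparison.
matwrite("test_plain.mat", vars)

# The handle-based API from the first post takes the same keyword.
F1 = matopen("test_handle.mat", "w"; compress = true)
write(F1, "x", B[:, 1])
close(F1)

println("compressed:   ", filesize("test_compressed.mat"), " bytes")
println("uncompressed: ", filesize("test_plain.mat"), " bytes")
```

How much smaller the compressed file ends up depends entirely on the data; real measurements usually compress far less than this artificial example.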

tim.holy · October 15, 2021, 7:44am

Amusing point: the savefast code the author links is mine back from my pre-Julia days. It now feels slightly weird seeing code from those days.

leon · October 15, 2021, 9:08am

Thank you all for the very helpful discussion!

Tim,
Interesting to hear about that.

Concerning the best way to write the code, I assume you meant the below?
matwrite(F1, Dict(
    "x" => B[:,1],
    "y" => B[:,2],
    "f" => B[:,3]
); compress = true);

My question is why they use a semicolon, instead of a comma after the dictionary variable? Both seem to work though.

gustaphe · October 15, 2021, 9:19am

A semicolon can be used to separate positional arguments from keyword arguments. It’s not necessary here, but both BlueStyle and YAS recommend it. And there is this feature, where it does matter:

a = true
f(x; a) == f(x, a=true) != f(x, a)
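To make the difference concrete, here is a small self-contained sketch. The function `f` is hypothetical and is given an optional second positional argument only so that all three calls run; with a single positional argument, `f(x, a)` would simply be a `MethodError`. The `f(x; a)` shorthand needs Julia 1.5 or later.

```julia
# Hypothetical function: positional x, optional positional y, keyword a.
f(x, y = 0; a = false) = (x, y, a)

a = true

r1 = f(1; a)         # after the semicolon, `a` is shorthand for `a = a`
r2 = f(1, a = true)  # keyword argument passed after a comma
r3 = f(1, a)         # no semicolon: `a` becomes the second positional argument

println(r1)
println(r2)
println(r3)
```

Here `r1` and `r2` are the same call, while in `r3` the value of `a` lands in `y` and the keyword keeps its default.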

Palli · October 15, 2021, 11:25am

You made MATLAB fast in your pre-Julia days, then you made the same in Julia, just slower… That’s unusual for you.

[I’m really not complaining. Your MAT.jl really helped me port MATLAB code to Julia at my job. It was my first assignment, and I had used MATLAB (or Octave actually) a bit before, but never professionally, and never Julia.]

It’s probably easy to enable compression for MAT.jl? [Not pushing for it, I don’t need .mat files; they just helped me temporarily while I needed to run both Julia and MATLAB while porting, until I completely got rid of all MATLAB code.]

jling · October 15, 2021, 11:56am

https://github.com/JuliaIO/MAT.jl/blob/3ed629c05f7261e86c0dde0869d265e99a265efb/src/MAT.jl#L142

              try
                  vars = read(file)
              finally
                  close(file)
              end
              vars
          end
          
          # Write a dict to a MATLAB file
          """
              matwrite(filename, d::Dict; compress::Bool = false)
          
          Write a dictionary containing variable names as keys and values as values
          to a Matlab file, opening and closing it automatically.
          """
          function matwrite(filename::AbstractString, dict::AbstractDict{S, T}; compress::Bool = false) where {S, T}
              file = matopen(filename, "w"; compress = compress)
              try
                  for (k, v) in dict
                      local kstring
                      try

It’s already there: compress is a keyword argument of matwrite, and it defaults to false.

leon · October 15, 2021, 12:10pm

IMO, compress=true should be the default. I really do not see the benefit of an uncompressed MAT file. It just takes so much of my disk space.

Kudos to Tim for creating the MAT.jl package! Many thanks

tim.holy · October 15, 2021, 1:09pm

I didn’t really write MAT.jl, because by the time I helped get HDF5 and JLD off the ground I was pretty solidly in the Julia camp. So why would I need to save *.mat files anymore? Making MAT better is now up to people who need it.

leon · December 12, 2021, 2:31pm

By now, I’m pretty sure the reason MAT files created by Julia are so large is because MAT.jl stores info in the format of a Dictionary, which takes up an order of magnitude more disk space than Arrays or Tuples.

I wonder if they could upgrade the package so that it uses named tuples to store the variables instead. Named tuples take up as little space as regular tuples, and are the most efficient way of storing variables.

PetrKryslUCSD · December 12, 2021, 4:28pm

I think HDF5 files (which is what a v7.3 .mat file really is) are compressed by default in Matlab, but not in Julia. Enable compression when writing the file.

leon · December 13, 2021, 3:03pm

Thanks! I’ve been compressing them, but it is still slightly larger than files created by Matlab.
