If you want the smallest file size: HDF5 (which the JLD format is based on) supports compression, but historically not SPDP, so I’m not sure JLD[2] does yet. SPDP is now available as an HDF5 filter and gives about 30% better compression than several common compressors (though something 10x faster is also possible, see further down):

https://userweb.cs.txstate.edu/~burtscher/research/SPDP/

SPDP is a fast, lossless, unified compression/decompression filter for HDF5 that has been designed for both 32-bit single-precision (float) and 64-bit double-precision (double) floating-point data. It also works on other data.

The paper on it is really interesting: https://userweb.cs.txstate.edu/~mb92/papers/dcc18.pdf

Abstract: Scientific computing produces, transfers, and stores massive amounts of single- and double-precision floating-point data, making this a domain that can greatly benefit from data compression. To gain insight into what makes an effective lossless compression algorithm for such data, we generated over nine million algorithms and selected the one that yields the highest compression ratio on 26 datasets.

[…]

We named the resulting algorithm SPDP, which is an abbreviation for “Single Precision Double Precision”. It is brand new […] Only Zstd performs better. On average, SPDP outperforms Blosc, bzip2, FastLZ, LZ4, LZO, and Snappy by at least 30% in terms of compression ratio. However, it tends to be slower.

We should consider supporting compression using:

https://juliahub.com/ui/Packages/TurboPFor_jll/4zXB1/0.0.1+0

i.e. this package: https://github.com/powturbo/TurboPFor-Integer-Compression

and: https://github.com/powturbo/Turbo-Transpose [The first part is important, as e.g. the current fastest supercomputer is ARM-based, and the coming Macs are too.]

**ALL** TurboTranspose functions now available under **64 bits ARMv8** including **NEON** SIMD. […]

- Dynamic CPU detection and **JIT scalar/sse/avx2** switching
- 100% C (C++ headers), usage as simple as memcpy […]
- more efficient, up to **10 times!** faster than Bitshuffle
- better compression (w/ lz77) and **10 times!** faster than one of the best floating-point compressors SPDP
- can compress/decompress (w/ lz77) better and faster than other domain specific floating point compressors
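
To make the transpose/shuffle idea behind Bitshuffle and TurboTranspose a bit more concrete, here is a minimal sketch in pure Python. It only does a byte-level transpose (not a bit-level one), and zlib stands in for the lz77 stage. The names `byte_transpose` and `byte_untranspose` are made up for illustration and are not the TurboTranspose API.

```python
import struct
import zlib


def byte_transpose(buf: bytes, width: int) -> bytes:
    """Gather byte 0 of every element, then byte 1, and so on."""
    if len(buf) % width:
        raise ValueError("buffer length must be a multiple of the element width")
    return b"".join(buf[k::width] for k in range(width))


def byte_untranspose(buf: bytes, width: int) -> bytes:
    """Inverse of byte_transpose: scatter each byte plane back into place."""
    if len(buf) % width:
        raise ValueError("buffer length must be a multiple of the element width")
    n = len(buf) // width
    out = bytearray(len(buf))
    for k in range(width):
        out[k::width] = buf[k * n:(k + 1) * n]
    return bytes(out)


# Hypothetical test data: a smooth series of 64-bit floats.
values = [20.0 + 0.01 * i for i in range(4096)]
raw = struct.pack(f"<{len(values)}d", *values)

plain = zlib.compress(raw)                        # compress the bytes as stored
shuffled = zlib.compress(byte_transpose(raw, 8))  # transpose first, then compress

# Sizes in bytes: original, compressed as-is, compressed after transposing.
print(len(raw), len(plain), len(shuffled))
```

On smooth data like this the transposed buffer usually compresses better, because the sign and exponent bytes of neighbouring values end up next to each other. I have not put numbers here since they depend heavily on the data.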

[…]

eTp4Lzt = lossy compression with allowed error = 0.0001
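
For intuition about what an "allowed error" means, here is a minimal sketch of error-bounded quantization in Python. This shows the general technique only; I am not claiming it is how eTp4Lzt works internally, and `quantize`/`dequantize` are hypothetical helper names.

```python
import math


def quantize(values, max_abs_error):
    """Map each value to the index of the nearest multiple of 2 * max_abs_error."""
    step = 2.0 * max_abs_error
    return [round(v / step) for v in values]


def dequantize(codes, max_abs_error):
    """Reconstruct approximate values from the integer codes."""
    step = 2.0 * max_abs_error
    return [q * step for q in codes]


eps = 1e-4  # same allowed error as in the benchmark line above
data = [math.sin(0.001 * i) for i in range(1000)]  # made-up sample data

codes = quantize(data, eps)
restored = dequantize(codes, eps)

# Largest reconstruction error; it stays within eps up to float rounding.
worst = max(abs(a - b) for a, b in zip(data, restored))
print(worst)
```

The integer codes are what you would then hand to a lossless compressor. They are much more repetitive than the original floats, which is where the extra compression comes from.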

See also: https://arxiv.org/pdf/1503.00638.pdf

[…] a nearly lossless rounding step which compares the precision of the data to a generalized and calibration-independent form of the radiometer equation. This allows the precision of the data to be reduced in a way that has an insignificant impact on the data. The newly developed Bitshuffle lossless compression algorithm is subsequently applied. […]

Also interesting: https://github.com/Ed-von-Schleck/shoco

for very small strings, it will always be better than standard compressors.
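
The reason general-purpose compressors struggle with tiny inputs is fixed framing overhead. A quick check with Python's zlib, used here only as a stand-in for a "standard compressor" (shoco itself is not called):

```python
import zlib

short = b"hello"
packed = zlib.compress(short)

# Compare the input size with the compressed size, both in bytes.
print(len(short), len(packed))
```

A zlib stream carries a 2-byte header and a 4-byte checksum on top of the payload, so a 5-byte input cannot come out smaller than it went in.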