How do you like to save your data and how much customizability do you like/need?

AwesomeQuest · June 6, 2025, 4:56pm

I’m making an application that controls a spectrometer (among other things). The program is likely to save dozens if not hundreds of spectra, each about 0.5 MB.
There’s also some positional metadata that comes with each measurement.

So I ask: How would you as a researcher prefer to receive this data? With a prefix/suffix or customizable counter in the file name? How should the metadata be included?

Is there a standard file format I should be using like NetCDF?

How would you like the file-saving settings to look and function?

Each measurement is also related to other measurements in a run, so should I actually be saving the individual spectra into one big file to make it easier to display and read the bulk data of a run?

I’m guessing I should also include the option to save the data as .csv afterwards but for speed reasons it’s not really practical to save the data as csv in real time.
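As a rough illustration of the "export to CSV afterwards" idea, here is a minimal sketch in Python (standard library only). It assumes the run is already in memory as a shared wavelength axis plus one intensity list per spectrum. The function name `export_run_to_csv`, the column name `wavelength_nm`, and the spectrum labels are all made up for the example, not part of any existing program.

```python
import csv
import io


def export_run_to_csv(wavelengths, spectra, out):
    """Write one run as a wide CSV: one row per wavelength, one column per spectrum.

    `spectra` maps a (hypothetical) spectrum label to a list of intensities
    that has the same length as `wavelengths`.
    """
    names = list(spectra)
    writer = csv.writer(out)
    writer.writerow(["wavelength_nm"] + names)  # header row
    for i, wl in enumerate(wavelengths):
        writer.writerow([wl] + [spectra[name][i] for name in names])


# Tiny made-up run, written to an in-memory buffer instead of a real file.
wavelengths = [400.0, 400.5, 401.0]
spectra = {"spec_0001": [10, 12, 11], "spec_0002": [9, 13, 10]}
buf = io.StringIO()
export_run_to_csv(wavelengths, spectra, buf)
csv_text = buf.getvalue()
print(csv_text)
```

Doing the export as a separate step after acquisition keeps the slow text formatting out of the measurement loop, which matches the speed concern above.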

cshen · June 6, 2025, 6:08pm

I generally like to have data and metadata separated into different files, but not all metadata is the same. Metadata that’s needed to correctly interpret the data should probably live together with the data in the same file. Exactly which metadata that is depends a bit on the situation and use case, though.

Eben60 · June 6, 2025, 6:20pm

The post cited below was written a few years ago, and in the meantime other options have appeared, but the ones listed are presumably still valid. Among the listed formats, HDF5 has the advantage of being supported by many specialized software packages (e.g. I know it is supported by Origin and Igor Pro).

AwesomeQuest · June 7, 2025, 10:51am

Great! I think I’ve settled on HDF5 as the medium since it seems to be supported by most things (even Excel to some extent) and supports basically arbitrary metadata.

Now onto the file names. How much customisation do y’all like to have when choosing auto-generated file names? How do other programs handle that type of thing?
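One way such a setting could work is a user-editable template with a few named placeholders. The sketch below is in Python (standard library only) and is just an assumption about how it might look: the function `make_file_name`, the placeholder names (`prefix`, `suffix`, `counter`, `date`, `ext`) and the default template are invented for illustration and are not taken from any particular program.

```python
from datetime import datetime
from pathlib import Path

# Hypothetical default; a user could edit this string in a settings dialog.
TEMPLATE = "{prefix}_{date:%Y%m%d}_{counter:04d}{suffix}.{ext}"


def make_file_name(template, directory, *, prefix="", suffix="",
                   ext="h5", now=None, start=1):
    """Fill in the template and bump the counter until the name is unused."""
    if "{counter" not in template:
        # Without a counter the loop below could never find a free name.
        raise ValueError("template must contain a {counter} placeholder")
    now = now or datetime.now()
    counter = start
    while True:
        name = template.format(prefix=prefix, suffix=suffix,
                               counter=counter, date=now, ext=ext)
        path = Path(directory) / name
        if not path.exists():
            return path
        counter += 1


# Example: first free name in the current directory for a made-up prefix.
print(make_file_name(TEMPLATE, ".", prefix="sampleA"))
```

The counter is derived from what already exists on disk, so restarting the program does not overwrite earlier spectra. Whether to expose the raw template or only separate prefix/suffix fields is a UI decision this sketch leaves open.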

Jake · June 8, 2025, 2:35am

I am by no means an expert, but I would choose between Arrow and HDF5. Probably worth trying both. Each can store metadata. The one place where I liked Arrow is that if you are collecting an unknown amount of data, the Arrow file can keep on growing; you don’t have to set the file size up front. The downside is that the metadata for each channel gets clobbered, though the metadata for the file stays intact. So the trick is to include the channel metadata with the file metadata. I have less experience with HDF5.

PetrKryslUCSD · June 8, 2025, 2:39pm

I like to use clean_file_name() from PetrKryslUCSD/DataDrop.jl ("Numbers and matrices and strings stored to disk and retrieved again"). Windows is quite touchy about file names, and I like to store a lot of numerical & other info in the file name.
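For readers who want to see what this kind of clean-up can involve, here is a minimal Python sketch (standard library only). It is not the DataDrop.jl implementation and does not try to reproduce its behaviour; the function name `sanitize_file_name` and the choice of `_` as the replacement are assumptions made for the example. It covers the commonly documented Windows restrictions: forbidden characters, trailing dots or spaces, and reserved device names.

```python
import re

# Device names Windows reserves regardless of extension (e.g. "CON.txt").
_RESERVED = {"CON", "PRN", "AUX", "NUL",
             *(f"COM{i}" for i in range(1, 10)),
             *(f"LPT{i}" for i in range(1, 10))}


def sanitize_file_name(name, replacement="_"):
    """Return a version of `name` that avoids common Windows file-name problems."""
    # Replace < > : " / \ | ? * and ASCII control characters.
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', replacement, name)
    # Windows does not allow names ending in a dot or a space.
    cleaned = cleaned.rstrip(" .")
    # Prefix reserved device names so they become ordinary file names.
    stem = cleaned.split(".", 1)[0]
    if stem.upper() in _RESERVED:
        cleaned = replacement + cleaned
    # Never return an empty name.
    return cleaned or replacement


print(sanitize_file_name("run: x=1.5/y=2?.h5"))
```

Decimal points in numbers survive, so parameters such as `x=1.5` can still be encoded in the name.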
