How do you like to save your data and how much customizability do you like/need?

I’m making an application that controls a spectrometer (among other things). The program is likely to save dozens, if not hundreds, of spectra, each about 0.5 MB.
There’s also some positional metadata that comes with each measurement.

So I ask: How would you as a researcher prefer to receive this data? With a prefix/suffix or customizable counter in the file name? How should the metadata be included?

Is there a standard file format I should be using like NetCDF?

How would you like the file-saving settings to look and function?

Each measurement is also related to other measurements within a run, so should I actually be saving the individual spectra into one big file to make it easier to display and read the bulk data of a run?

I’m guessing I should also include the option to export the data as .csv afterwards, but for speed reasons it’s not really practical to save the data as .csv in real time.

I generally like to have data and metadata separated into different files, but not all metadata is the same. I think that metadata needed to correctly interpret the data should probably live together with the data in the same file. Exactly which metadata that is depends a bit on the situation and use case, though.

The post cited below was written a few years ago, and in the meantime other options have appeared, but the ones listed are presumably still valid. Among the listed formats, HDF5 has the advantage of being supported by many specialized software packages (e.g. I know it is supported by Origin and Igor Pro).


Great! I think I’ve settled on HDF5 as the medium since it seems to be supported by most things (even Excel to some extent) and supports basically arbitrary metadata.
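
For anyone wanting to picture the one-file-per-run idea, here is a minimal sketch in Python using h5py and numpy. The group layout, the dataset names, the attribute names (`x_mm`, `y_mm`, `instrument`) and the helper functions are all assumptions made up for illustration, not a recommended standard.

```python
import os
import tempfile

import h5py
import numpy as np


def save_run(path, spectra, positions, wavelengths):
    """Write every spectrum of one run into a single HDF5 file (hypothetical layout)."""
    with h5py.File(path, "w") as f:
        # Run-level metadata is attached to the root group.
        f.attrs["instrument"] = "example-spectrometer"  # placeholder value
        # The wavelength axis is shared by all spectra, so store it once.
        f.create_dataset("wavelengths", data=wavelengths)
        group = f.create_group("spectra")
        for i, (counts, (x, y)) in enumerate(zip(spectra, positions)):
            ds = group.create_dataset(f"{i:05d}", data=counts, compression="gzip")
            # Per-measurement positional metadata travels with the dataset.
            ds.attrs["x_mm"] = x
            ds.attrs["y_mm"] = y


def load_positions(path):
    """Read back the (x, y) position of every spectrum, ordered by dataset name."""
    with h5py.File(path, "r") as f:
        return [
            (float(ds.attrs["x_mm"]), float(ds.attrs["y_mm"]))
            for _, ds in sorted(f["spectra"].items())
        ]


# Made-up data standing in for real measurements.
wavelengths = np.linspace(400.0, 800.0, 2048)
spectra = [np.random.default_rng(i).random(2048) for i in range(3)]
positions = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]

run_path = os.path.join(tempfile.mkdtemp(), "run_0001.h5")
save_run(run_path, spectra, positions, wavelengths)
print(load_positions(run_path))
```

Keeping the positions as attributes on each dataset means a single spectrum can be read and interpreted without touching the rest of the run.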

Now onto the file names. How much customisation do y’all like to have when choosing auto-generated file names? How do other programs handle that type of thing?
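
To make the question concrete, here is a rough Python sketch of one possible scheme: a user-chosen prefix, an optional timestamp, a zero-padded counter and an optional suffix. The function name `next_file_name`, its parameters and the `prefix_NNNN_suffix.ext` pattern are hypothetical, just one way such a setting could behave.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def next_file_name(directory, prefix="spectrum", suffix="", digits=4,
                   ext=".h5", timestamp=None):
    """Return the first unused path of the form prefix[_timestamp]_NNNN[_suffix]ext."""
    directory = Path(directory)
    head = [prefix]
    if timestamp is not None:
        # Compact, sortable timestamp without characters Windows rejects.
        head.append(timestamp.strftime("%Y%m%dT%H%M%S"))
    counter = 1
    while True:
        parts = head + [f"{counter:0{digits}d}"]
        if suffix:
            parts.append(suffix)
        candidate = directory / ("_".join(parts) + ext)
        if not candidate.exists():
            return candidate
        counter += 1  # name already taken, try the next number


# Usage: the counter advances once a file with that name exists.
out_dir = Path(tempfile.mkdtemp())
first = next_file_name(out_dir, prefix="run", suffix="dark")
first.touch()
second = next_file_name(out_dir, prefix="run", suffix="dark")
print(first.name, second.name)  # run_0001_dark.h5 run_0002_dark.h5
print(next_file_name(out_dir, prefix="run", timestamp=datetime(2024, 1, 2, 3, 4, 5)).name)
```

Scanning for the first free number avoids overwriting earlier files, at the cost of a directory lookup per save.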

I am by no means an expert, but I would choose between Arrow and HDF5. It is probably worth trying both. Each can store metadata. The one place where I liked Arrow is when you are collecting an unknown amount of data: the Arrow file can keep on growing, so you don’t have to set the file size up front. The downside is that the metadata for each channel gets clobbered, though the metadata for the file stays intact. So the trick is to include the channel metadata with the file metadata. I have less experience with HDF5.
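
On the growing-file point, HDF5 can also handle an unknown number of measurements if the dataset is created as chunked and resizable. Below is a minimal Python sketch with h5py; the dataset name `spectra`, the one-row-per-spectrum layout and the helper `append_spectrum` are assumptions for illustration.

```python
import os
import tempfile

import h5py
import numpy as np


def append_spectrum(path, counts):
    """Append one spectrum as a new row and return the new row count."""
    counts = np.asarray(counts, dtype="f8")
    with h5py.File(path, "a") as f:  # "a" creates the file if it is missing
        if "spectra" not in f:
            # maxshape=(None, n) leaves the first axis unlimited, so the
            # number of spectra does not have to be known up front.
            f.create_dataset(
                "spectra",
                shape=(0, counts.size),
                maxshape=(None, counts.size),
                chunks=(1, counts.size),
                dtype="f8",
            )
        ds = f["spectra"]
        n = ds.shape[0]
        ds.resize(n + 1, axis=0)  # grow by one row
        ds[n, :] = counts
        return n + 1


grow_path = os.path.join(tempfile.mkdtemp(), "growing_run.h5")
append_spectrum(grow_path, np.zeros(8))
rows = append_spectrum(grow_path, np.ones(8))
print(rows)
```

Reopening the file for every spectrum is simple but slow; a real acquisition loop would more likely keep the file open for the duration of the run.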

I like to use clean_file_name() from DataDrop.jl (PetrKryslUCSD/DataDrop.jl: "Numbers and matrices and strings stored to disk and retrieved again"). Windows is quite touchy about file names, and I like to store a lot of numerical and other info in the file name.
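
As a rough illustration of the kind of cleanup involved, here is a Python sketch that makes a name acceptable to Windows. It is not a port of clean_file_name() and makes no claim about what that function does; the name `sanitize_file_name` and the underscore replacement are assumptions.

```python
import re

# Characters Windows does not allow in file names, plus control characters.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
# Device names Windows reserves, with or without an extension.
_RESERVED = {
    "CON", "PRN", "AUX", "NUL",
    *(f"COM{i}" for i in range(1, 10)),
    *(f"LPT{i}" for i in range(1, 10)),
}


def sanitize_file_name(name, replacement="_"):
    """Return a version of `name` that is safe to use as a Windows file name."""
    cleaned = _FORBIDDEN.sub(replacement, name)
    # Windows silently drops trailing spaces and dots, so remove them here.
    cleaned = cleaned.rstrip(" .")
    stem = cleaned.split(".", 1)[0]
    if not cleaned or stem.upper() in _RESERVED:
        cleaned = replacement + cleaned
    return cleaned


print(sanitize_file_name("x=1.5:y=2.0?.h5"))  # x=1.5_y=2.0_.h5
print(sanitize_file_name("con.txt"))          # _con.txt
```

Numbers with decimal points survive unchanged, which matters when parameter values are encoded in the name.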
