When is it worth setting up a database rather than just organizing the data into files?

I’m building a library of example spectra for chemical analysis, and I’m wondering what the benefits would be of setting up a database (MySQL or something similar) for these spectra rather than just keeping files in folders with descriptive names.

There will probably be thousands of spectra with various properties that need to be searchable, and each spectrum is maybe 1 MB at most.

Would there be significant performance benefits to putting these spectra in a database rather than reading directories? Is there a cost associated with reading files compared to querying MySQL (or another database system)?

What aspects of the data do you want to use as search criteria? The advice may differ depending on whether you only need to search metadata (e.g. what is the spectrum for this particular material?) or need to search/analyze the data itself (e.g. which materials have this spectral feature?).
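
If it turns out to be metadata only, one possible arrangement is to keep the spectra as files and put just the searchable properties in a small database that points at them. Here is a minimal sketch of that idea, assuming SQLite (via Python's built-in `sqlite3`) instead of MySQL so that no server is needed. The table name, column names, and file paths are made up for illustration.

```python
import sqlite3


def build_index(conn, records):
    """Create a metadata table and fill it with (material, technique, path) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spectra ("
        " id INTEGER PRIMARY KEY,"
        " material TEXT NOT NULL,"
        " technique TEXT NOT NULL,"
        " path TEXT NOT NULL UNIQUE)"
    )
    conn.executemany(
        "INSERT INTO spectra (material, technique, path) VALUES (?, ?, ?)",
        records,
    )
    conn.commit()


def find_paths(conn, material):
    """Return the file paths of all spectra recorded for one material."""
    rows = conn.execute(
        "SELECT path FROM spectra WHERE material = ? ORDER BY path",
        (material,),
    )
    return [row[0] for row in rows]


# In-memory database for the example; a real index would live in a file on disk.
conn = sqlite3.connect(":memory:")
build_index(conn, [
    ("ethanol", "raman", "spectra/ethanol_raman_001.csv"),  # hypothetical paths
    ("ethanol", "ftir", "spectra/ethanol_ftir_001.csv"),
    ("acetone", "raman", "spectra/acetone_raman_001.csv"),
])
ethanol_paths = find_paths(conn, "ethanol")
```

The query only tells you which files to open; the spectra themselves stay on disk in whatever format you already use. Searching the spectral data itself would need a different design.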

As an intermediate step between file system organization and a full-on database, you could consider organizing your data into something like HDF5 (HDF5.jl).

One thing that I look out for when deciding whether or not I need a database is: will I ever need to modify the data layout in some way (adding columns, changing some relationship)? If so, I always go for a database in the end, because database support for migrations is far superior to whatever I could homebrew with raw files.
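
To make the "adding columns" case concrete, here is a small sketch, again assuming SQLite through Python's `sqlite3` and a hypothetical `spectra` table. The `instrument` column is invented for the example. A single `ALTER TABLE` statement adds the column, and rows that were already there read back the default value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original layout: only material and file path are recorded.
conn.execute(
    "CREATE TABLE spectra ("
    " id INTEGER PRIMARY KEY,"
    " material TEXT NOT NULL,"
    " path TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO spectra (material, path) VALUES (?, ?)",
    ("ethanol", "spectra/ethanol_001.csv"),  # hypothetical path
)

# Migration: add a new property without touching the existing rows or files.
conn.execute("ALTER TABLE spectra ADD COLUMN instrument TEXT DEFAULT 'unknown'")

# New rows can fill in the new column; old rows report the default.
conn.execute(
    "INSERT INTO spectra (material, path, instrument) VALUES (?, ?, ?)",
    ("acetone", "spectra/acetone_001.csv", "spectrometer-A"),
)
conn.commit()

instruments = [
    row[0] for row in conn.execute("SELECT instrument FROM spectra ORDER BY id")
]
```

With properties encoded in file or folder names, the same change would mean renaming every existing file and updating any code that parses those names.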
