A type for metadata?

The way I do this currently is that I have one dataframe that contains the main quantitative data (samples are columns, features are rows). And then I have a separate dataframe that contains all metadata for each sample, where samples are rows and columns are different metadata. I often have to construct this table piecemeal, duplicating a lot of the subject-specific metadata by mapping samples -> subjects and then subjects -> subject metadata.

This sounds a lot like what databases are designed for. You have multiple tables with foreign keys.

  1. Subjects table: subject_id, subject_name, gender, diagnosis
  2. Samples table: sample_id, subject_id, collection_date, collection_location, sample_data

Then to calculate things you need to join these together

select subject_id, subject_name, gender, sample_data from Subjects 
LEFT JOIN Samples on Subject.subject_id = Samples.subject_id
where Samples.collection_date = 2017-12-13 and Subjects.diagnosis = 'hypertension'

My approach would be to normalize everything into relational tables and use a “real” database, or make custom structs for my problem and just write Julia code. The more I use RDBMSes the more I feel like whenever I have more than two dataframes in my code, I should switch to Postgres.