The way I do this currently: I have one dataframe with the main quantitative data (samples as columns, features as rows), and a separate dataframe with all of the per-sample metadata (samples as rows, metadata fields as columns). I often have to construct that metadata table piecemeal, duplicating a lot of the subject-specific metadata by mapping samples -> subjects and then subjects -> subject metadata.
This sounds a lot like what databases are designed for. You have multiple tables with foreign keys.
- Subjects table: subject_id, subject_name, gender, diagnosis
- Samples table: sample_id, subject_id, collection_date, collection_location, sample_data
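If it helps to see the shape of this, here is a minimal sketch of the same two-table layout as in-memory DataFrames (toy data and values are made up; the column names follow the schema above):

```julia
# Toy versions of the two tables as DataFrames (hypothetical data).
using DataFrames, Dates

subjects = DataFrame(
    subject_id   = [1, 2],
    subject_name = ["S01", "S02"],
    gender       = ["F", "M"],
    diagnosis    = ["hypertension", "healthy"])

samples = DataFrame(
    sample_id           = [10, 11, 12],
    subject_id          = [1, 1, 2],   # foreign key into `subjects`
    collection_date     = [Date(2017, 12, 13), Date(2018, 1, 5), Date(2017, 12, 13)],
    collection_location = ["clinic A", "clinic A", "clinic B"],
    sample_data         = [0.42, 0.37, 0.51])
```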
Then to calculate things you join them together:

SELECT Subjects.subject_id, subject_name, gender, sample_data
FROM Subjects
LEFT JOIN Samples ON Subjects.subject_id = Samples.subject_id
WHERE Samples.collection_date = '2017-12-13'
  AND Subjects.diagnosis = 'hypertension';
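For comparison, roughly the same query can be expressed with DataFrames.jl on the toy `subjects` and `samples` frames from the sketch above (just one way to write it, not your exact code):

```julia
# leftjoin + subset, mirroring the SQL LEFT JOIN / WHERE above.
using DataFrames, Dates

joined = leftjoin(subjects, samples; on = :subject_id)

wanted = subset(joined,
    :collection_date => ByRow(d -> !ismissing(d) && d == Date(2017, 12, 13)),
    :diagnosis       => ByRow(==("hypertension")))

select!(wanted, :subject_id, :subject_name, :gender, :sample_data)
```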
My approach would be to normalize everything into relational tables and use a “real” database, or else define custom structs for the problem and just write Julia code. The more I use RDBMSes, the more I feel that whenever I have more than two dataframes in my code, I should switch to Postgres.
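As a rough sketch of what the Postgres route could look like from Julia, assuming the LibPQ.jl and DataFrames.jl packages and a hypothetical database named `biodata` (the query is the same one as above):

```julia
# Run the join in Postgres and pull the result back as a DataFrame.
using LibPQ, DataFrames

conn = LibPQ.Connection("dbname=biodata")

query = """
    SELECT Subjects.subject_id, subject_name, gender, sample_data
    FROM Subjects
    LEFT JOIN Samples ON Subjects.subject_id = Samples.subject_id
    WHERE Samples.collection_date = '2017-12-13'
      AND Subjects.diagnosis = 'hypertension'
"""

result = DataFrame(execute(conn, query))
close(conn)
```

That way the database handles the joins and the deduplication of subject-level metadata, and the dataframe you work with in Julia is just the query result.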