Hey all -
I’m fairly new to principal component analysis, especially in Julia. I’ve been working through @nassarhuda’s great Dimensionality Reduction notebook, but I’m running into an issue because the data I’m working with has a good number of missing values. Here’s what it looks like:
The first problem is the normalization step the notebook recommends. It looks like I can’t normalize in the presence of missing values: I just get back a matrix full of missing values.
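A minimal sketch of that behavior, using hypothetical toy data rather than the real dataset and only the `Statistics` standard library. The helper `zscore_skipmissing` is an illustrative name, not something from the notebook; it is one possible way to normalize each column while ignoring its missing entries.

```julia
using Statistics

# Hypothetical toy data: rows are observations, columns are variables.
X = [1.0      2.0;
     missing  4.0;
     3.0      6.0]

# Reductions propagate `missing`, so the mean of a column that contains
# a missing value is itself `missing`.
mean(X[:, 1])   # missing

# Normalizing with that mean therefore turns the whole column into `missing`.
(X[:, 1] .- mean(X[:, 1])) ./ std(X[:, 1])

# Illustrative helper: z-score a column using only its observed values.
# Missing entries stay `missing` in the output.
function zscore_skipmissing(col)
    vals = collect(skipmissing(col))
    return (col .- mean(vals)) ./ std(vals)
end

# Apply the helper column by column and glue the results back together.
Z = hcat((zscore_skipmissing(c) for c in eachcol(X))...)
```

Here `Z` keeps the shape of `X`, with `missing` only where the input was missing.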
Obviously, a matrix full of missing values is not what I need. I can, however, drop the rows with missing data and run the same normalization on what remains:
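A rough sketch of that complete-case approach on hypothetical toy data, written against a plain matrix so that it depends only on `Statistics`. The names `complete` and `keep` are illustrative; for a DataFrame, `completecases` and `dropmissing` from DataFrames.jl play a similar role.

```julia
using Statistics

# Hypothetical toy data with missing values in two of the five rows.
X = [1.0      2.0;
     missing  4.0;
     3.0      6.0;
     5.0      missing;
     7.0      10.0]

# Boolean mask: true for rows that contain no missing values.
complete = [!any(ismissing, row) for row in eachrow(X)]

# Original row numbers of the rows that are kept.
keep = findall(complete)

# Complete-case matrix, converted to a plain Float64 matrix.
Xc = Float64.(X[keep, :])

# Column-wise z-score normalization of the remaining rows.
Zc = (Xc .- mean(Xc; dims=1)) ./ std(Xc; dims=1)
```

Holding on to `keep` (or the `complete` mask) records which original rows each row of `Zc` came from.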
After that I have no trouble running the PCA, but I’m not sure how to match the results back to my original DataFrame, since the output has no row indices and no longer has the same dimensions as the original dataset. I would also love to be able to include observations that are missing values in only some columns. Is that possible?
Any guidance would be greatly appreciated!
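To make the matching question concrete, here is one possible sketch, not a definitive answer. It assumes a plain matrix of hypothetical toy data and computes the PCA by hand with an SVD from the `LinearAlgebra` standard library instead of the notebook's PCA call. The idea is to keep the complete-row mask and write the scores into a full-length matrix that is `missing` for the dropped rows.

```julia
using LinearAlgebra, Statistics

# Hypothetical toy data with missing values in two of the five rows.
X = [1.0      2.0;
     missing  4.0;
     3.0      6.0;
     5.0      missing;
     7.0      10.0]

# Mask of complete rows; this is the link back to the original row order.
complete = [!any(ismissing, row) for row in eachrow(X)]

# Normalize the complete rows only.
Xc = Float64.(X[complete, :])
Zc = (Xc .- mean(Xc; dims=1)) ./ std(Xc; dims=1)

# PCA via SVD: the columns of F.V are the principal directions,
# and projecting Zc onto them gives the component scores.
F = svd(Zc)
k = 2                         # number of components to keep
scores = Zc * F.V[:, 1:k]

# Full-length score matrix aligned with the original rows:
# rows that were dropped stay `missing`.
pcs = Matrix{Union{Missing, Float64}}(missing, size(X, 1), k)
pcs[complete, :] = scores
```

Each column of `pcs` has one entry per original row, so it could be attached to the original DataFrame as a new column. Including partially observed rows needs a different technique, for example imputing the missing entries before the PCA, which changes the analysis and is not shown here.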