Hello all,
Complete Julia noob here. I am trying to figure out how to work with large datasets in Julia, and I want to import a dataset that has missing values. Julia seems to default to importing all of these columns as strings, but I only want some of the columns imported as strings. Currently I have:
temp=CSV.read(file,types=Dict(1=> String, 2=> String, 3=> Int64, 4=> Int64, 5=> Int64, 6=> Int64, 7=> Int64, 8=> Int64, 9=> Int64, 10=> Int64, 11=> Int64, 12=> Int64, 13=> Int64, 14=> Int64, 15=> Int64, 16=> Int64, 17=> Int64, 18=> Int64, 19=> Int64, 20=> Int64, 21=> Int64, 22=> Int64, 23=> Int64, 24=> Int64, 25=> Int64, 26=> Int64, 27=> Int64, 28=> Int64, 29=> Int64, 30=> Int64, 31=> Int64, 32=> Int64, 33=> Int64, 34=> Int64, 35=> Int64, 36=> Int64, 37=> Int64, 38=> Int64, 39=> Int64, 40=> Int64, 41=> Int64, 42=> Int64, 43=> Int64, 44=> Int64, 45=> Int64, 46=> Int64), silencewarnings=true);
It seems like there should be a much more efficient way to specify that columns 3 through 46 should be a mix of integer and missing. I couldn't get any sort of looping to work (but I am new to Julia, so it could be user error).
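For reference, here is a rough sketch of the kind of thing I was hoping would work: building the same dictionary with a comprehension instead of typing out every pair. The name coltypes is just something I made up for illustration, and I am assuming the result can be passed to the types keyword exactly like the hand-written Dict above. I have not confirmed how this interacts with the missing values.

```julia
# Build the column-index => type mapping programmatically.
# Columns 1 and 2 are strings; columns 3 through 46 are integers,
# matching the hand-written Dict in the call above.
# `coltypes` is a hypothetical name chosen for this sketch.
coltypes = Dict(i => (i <= 2 ? String : Int64) for i in 1:46)

# Assumption: this can then replace the literal Dict in the original call:
# temp = CSV.read(file, types=coltypes, silencewarnings=true);
```

The range 1:46 and the cutoff at column 2 are the only things that would need to change for a file with a different layout.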
More generally, is there a good resource for working with large datasets in Julia? The problem I have run into so far is that all the examples I can find are micro-level examples (e.g. writing a dictionary for a small dataset with 5 variables), which doesn't translate to doing the same thing with 46 variables.
Any guidance would be greatly appreciated!