I have a side job/project that involves analyzing a large data set. I am leveraging this project to gain competence with programming generally and Julia specifically, meaning the learning curve is steep as I work through this material.
My question has to do with reading in data and changing values in a dataframe. I have a column in my data labeled ‘experience’ which lists the number of years each person has been involved in a particular profession. Some of these values are numbers (such as 2.5, 0.5, etc.), though many of them are completely inconsistent (e.g. 6 months, 2 years 3 months, 1 1/2 years, etc.). I don’t really know where to begin in getting all of this read in a single format.
I guess I’m looking for specific guidance or, alternatively, a book or resource on data management in Julia.
Thank you for taking the time.
Likely the column would be read in as the type String. Parsing it/breaking it down will not really be easy if there isn’t any uniformity to the values in the column. It is not so much a Julia question as it is a general programming question. These things really only work well when the data is uniform. If all of the values looked like “2 years 3 months” or if all of the values looked like “1 1/2 years” then you could do this easily. But handling both in the same column will be difficult.
My suggestion would probably be to figure out what units you want the column to be in (years? months?), parse the ones that you can, and identify the problematic rows. If they are a manageable fraction, fix them by hand in Excel.
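To make the "parse the ones that you can, and identify the problematic rows" idea concrete, here is a rough sketch in plain Julia (no packages). It assumes the target unit is years and only recognises the shapes mentioned in the question: bare numbers, "N months", "N years M months", and simple fractions like "1 1/2 years". The function names `parse_quantity` and `parse_experience` and the sample values are made up for illustration. Anything else comes back as `missing` so it can be reviewed by hand.

```julia
# Convert "2", "2.5", "1/2" or "1 1/2" to a Float64; return `nothing` otherwise.
function parse_quantity(text::AbstractString)
    m = match(r"^(?:(\d+)\s+)?(\d+)\s*/\s*(\d+)$", text)  # optional whole part + fraction
    if m !== nothing
        denominator = parse(Int, m.captures[3])
        denominator == 0 && return nothing
        whole = m.captures[1] === nothing ? 0 : parse(Int, m.captures[1])
        return whole + parse(Int, m.captures[2]) / denominator
    end
    return tryparse(Float64, text)  # plain numbers such as "2.5"
end

# Convert one raw entry to years, or `missing` if the format is not recognised.
function parse_experience(raw)
    ismissing(raw) && return missing
    text = lowercase(strip(string(raw)))

    # Case 1: a bare number is assumed to already be in years.
    bare = parse_quantity(text)
    bare === nothing || return bare

    years_part  = r"^([\d./ ]+?)\s*(?:years?|yrs?)\s*(.*)$"
    months_part = r"^([\d./ ]+?)\s*(?:months?|mos?)$"

    # Case 2: an optional "<quantity> years" part, then an optional "<quantity> months" part.
    total = 0.0
    rest = text
    m = match(years_part, rest)
    if m !== nothing
        years = parse_quantity(strip(m.captures[1]))
        years === nothing && return missing
        total += years
        rest = m.captures[2]
        isempty(rest) && return total
    end
    m = match(months_part, rest)
    m === nothing && return missing
    months = parse_quantity(strip(m.captures[1]))
    months === nothing && return missing
    return total + months / 12
end

# Hypothetical sample values, including one that cannot be parsed.
raw_values = ["2.5", "6 months", "2 years 3 months", "1 1/2 years", "about ten years"]
parsed = parse_experience.(raw_values)      # Float64 where possible, `missing` otherwise
needs_review = findall(ismissing, parsed)   # indices of the rows to fix by hand
```

If the data is in a DataFrames.jl data frame (called `df` here, which is an assumption), the same function can be broadcast over the column, e.g. `df.experience_years = parse_experience.(df.experience)`, and the rows where the result is `missing` are the ones to look at manually.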
Thank you very much. I suspected it was a bigger-picture problem, and I appreciate your input. I ended up fixing it manually and it didn’t take too long.