I’ve done plenty of dabbling in ML and written some models that work (albeit not very well), but I’m finding it difficult to 1) navigate the many ML-related topics and decide which ones to spend more time on and which ones to gloss over, and 2) understand how to fine-tune a model once it’s working.
I’m hoping there is enough charity here in the Julia community to help guide me in the right direction. The idea I have is to take some data from the Census Bureau’s American Community Survey (ACS) and explore different ways to predict a person’s income. There are millions of rows of data in the 2013-2017 ACS file, which includes tons of measurements about the individual survey respondents (age, educational attainment, occupation, the industry in which they work, race, gender, etc.).
It seems like there should be enough information in this dataset to make fairly accurate predictions. I’m thinking I should start simply, by trying to predict whether a person earns above some threshold amount, and build a logistic regression model, a random forest, a neural network, and maybe some other model that’s good for this kind of binary classification problem.
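To make the idea concrete, here is a rough sketch of the first step I have in mind: turn income into an above/below-threshold label and fit a logistic regression by plain gradient descent. It is written in Python with only the standard library (a Julia version would presumably look much the same). Everything in it is an assumption for illustration: the data is synthetic, and the names `make_synthetic_rows`, `THRESHOLD`, and the made-up income formula do not come from the actual ACS file.

```python
import math
import random

THRESHOLD = 60_000  # hypothetical income cutoff, not taken from the ACS


def make_synthetic_rows(n, seed=0):
    """Generate made-up (age, years_of_education, income) rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(18, 65)
        edu = rng.uniform(8, 20)
        # Invented relationship plus noise; real ACS data will be messier.
        income = 1500 * (age - 18) + 4000 * (edu - 8) + rng.gauss(0, 10_000)
        rows.append((age, edu, income))
    return rows


def column_stats(X):
    """Per-column mean and standard deviation."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n)
            for j in range(d)]
    return means, stds


def standardize(X, means, stds):
    """Scale each column to zero mean and unit variance."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in X]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Label 1 when the predicted probability is at least 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) >= 0.5
            else 0 for row in X]


rows = make_synthetic_rows(1500)
features = [[age, edu] for age, edu, _ in rows]
labels = [1 if income > THRESHOLD else 0 for _, _, income in rows]

# Hold out the last 500 rows to check the model on data it has not seen.
train_X, test_X = features[:1000], features[1000:]
train_y, test_y = labels[:1000], labels[1000:]

# Scaling statistics come from the training rows only.
means, stds = column_stats(train_X)
weights, bias = fit_logistic(standardize(train_X, means, stds), train_y)
preds = predict(standardize(test_X, means, stds), weights, bias)
test_acc = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
print(f"held-out accuracy: {test_acc:.3f}")
```

The held-out split is there so that the accuracy number says something about generalization rather than memorization. With real ACS data I would expect the harder part to be encoding categorical columns such as occupation and industry, which this sketch does not attempt.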
Does this sound like a good start? Is there any reason I should start with one type of algorithm over another? Any tips/advice you can give? Does this sound like a decent ‘beginner’ problem to solve, or is it too complex?
Lastly, if anyone is interested in learning these topics, please reach out as I’d love to collaborate and learn together.