Best Practices for Using "Off-the-Shelf" ML Solutions?

TheCedarPrince · September 9, 2022, 2:19pm

Hi folks!

I am not a formal ML researcher (I couldn’t tell you how to build a neural network from scratch), but I know about ML models and which kinds of models make sense for which problems.

A question I have been struggling with: if I want to use ML models (from MLJ, (F)Lux, PyTorch, etc.) in my work but do not understand the fine technical details of a model, am I sufficiently qualified to do so? I know some packages in the Julia and Python ecosystems brand themselves as “off-the-shelf” ML solutions, but I worry that my lack of knowledge on the technical side could lead to errors in my analysis.

What are considered best practices in this space? Am I unqualified to use these “off-the-shelf” solutions, or am I the target audience?

Thank you!

~ tcp

jw3126 · September 12, 2022, 7:54am

Just try it. If that is too big a time investment, try it on a scaled-down problem. In fact, trying a toy problem first is almost always a good idea.

ML solutions producing misleading results can absolutely happen. For many models there are no theoretical guarantees about errors or sane results.
In addition to standard best practices like regularization, train/test splits, etc., I find the following often very useful:

You need an intuition about what you expect the solution to look like. Does the ML produce sane results?
Try a couple of models or vary some parameters. Do the solutions look roughly the same or vary wildly?
Simplify your problem until you have good intuition or exact solutions. Run ML on that and get a feel for the errors it produces there.
Do you have some less precise/limited/slower ways to solve your problem, or do you know some properties of the solution? Compare these against ML.
Test the limits of your ML solution. Tweak some examples and see when the model produces bad results.
Depending on your domain, you might know some invariants that should hold, like conservation of energy. Check these.
Depending on the model, there may be further specific tools to gain insights. Find out about these.
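
To make a few of these checks concrete, here is a minimal sketch in plain Python, not a definitive recipe. Everything in it is an assumption for illustration: the toy problem `true_function`, the hand-written `knn_predict` that stands in for whatever off-the-shelf model you would actually use, and the noise level. It runs the "model" on a problem with a known exact solution, varies one parameter, and compares against a trivial baseline.

```python
import random
import statistics


def true_function(x):
    # Toy problem with a known exact solution (assumed for this sketch).
    return 2.0 * x + 1.0


def knn_predict(train, x, k):
    # Hypothetical stand-in for an off-the-shelf model:
    # average the targets of the k training points nearest to x.
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return statistics.fmean(y for _, y in nearest)


def mean_abs_error(predict, data):
    # Mean absolute difference between predictions and observed targets.
    return statistics.fmean(abs(predict(x) - y) for x, y in data)


rng = random.Random(0)  # fixed seed so the experiment is repeatable
xs = [rng.uniform(0.0, 10.0) for _ in range(300)]
data = [(x, true_function(x) + rng.gauss(0.0, 0.5)) for x in xs]
train, test = data[:200], data[200:]  # simple train/test split

# Trivial baseline: always predict the mean training target.
train_mean = statistics.fmean(y for _, y in train)
baseline_error = mean_abs_error(lambda x: train_mean, test)

# Vary a parameter: do the results look roughly the same or vary wildly?
errors_by_k = {
    k: mean_abs_error(lambda x, k=k: knn_predict(train, x, k), test)
    for k in (1, 5, 25)
}

# Error of the exact solution on noisy test data: the model cannot be
# expected to do meaningfully better than this.
exact_error = mean_abs_error(true_function, test)

print("baseline:", round(baseline_error, 3))
print("exact solution:", round(exact_error, 3))
for k, err in errors_by_k.items():
    print(f"kNN with k={k}:", round(err, 3))
```

If the model were no better than the baseline, or if the error jumped around wildly between values of `k`, that would be a signal to distrust it before moving to the real problem.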

I am pretty sure you are the target audience. That means you can apply models without knowing their inner workings; you just can’t trust them blindly. You need to be critical and apply checks like the above. These don’t require much ML knowledge, but rather domain knowledge.
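
As an illustration of a domain-knowledge check, here is a hedged sketch of the invariant idea mentioned above, using conservation of energy for a simple harmonic oscillator. The function `surrogate_step` is a hypothetical stand-in for a learned one-step model (it is really an explicit Euler step, chosen because it is known to violate the invariant), and the step size and rollout length are arbitrary assumptions.

```python
import math


def energy(state):
    # Total energy of a unit-mass, unit-frequency harmonic oscillator.
    x, v = state
    return 0.5 * (x * x + v * v)


def exact_step(state, dt):
    # Exact solution of x'' = -x over one time step (a rotation in phase space).
    x, v = state
    c, s = math.cos(dt), math.sin(dt)
    return (c * x + s * v, -s * x + c * v)


def surrogate_step(state, dt):
    # Hypothetical stand-in for a learned model. This is explicit Euler,
    # which multiplies the energy by (1 + dt**2) on every step.
    x, v = state
    return (x + dt * v, v - dt * x)


def relative_energy_drift(step, state=(1.0, 0.0), dt=0.1, n_steps=100):
    # Roll the step function forward and report how far the energy has
    # moved from its initial value, relative to that initial value.
    e0 = energy(state)
    for _ in range(n_steps):
        state = step(state, dt)
    return abs(energy(state) - e0) / e0


print("exact drift:", relative_energy_drift(exact_step))
print("surrogate drift:", relative_energy_drift(surrogate_step))
```

The check itself needs no knowledge of how the model works internally, only the knowledge that energy should stay constant. A drift that grows over the rollout is a reason to be suspicious of the model's long-term predictions.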
