Or…
How do I numerically find the values of a set of variables that produce a given condition?

Say I have a problem in which, under certain conditions, something happens.
For example, a machine that stops working.
Or a sensor which, for some reason, starts to produce strange values.
Or a person who suffers a heart attack.

I want to know why this happened, or whether it is going to happen.

I have a table with several input variables (continuous and/or discrete, maybe including time) and one output.
This table contains observations from many experiments (or devices or people).
I want to find which inputs are most likely to produce that output (the categorical output representing the extreme condition).

I don’t have a parametric function or a specific model, so I would try to fit a decision tree or a neural network.
I would then use that fitted model as a black-box function, as if I had an optimization problem, one that is likely to be noisy, non-convex, and to mix integer and continuous variables.

Is there any other easier or more direct way to solve this problem?

If you have a binary response (extreme vs. non-extreme condition), you can investigate the sensitivity of this response variable to all other variables in the table. The question then becomes: how sensitive is my response to variables X1, X2, …, or to combinations of variables X1-X2, X1-X3, …

If you can afford a model like a decision tree, then the sensitivity analysis question is trivial. You can check which variables are chosen for the first splits in the tree and which are chosen last. Alternatively, you can use a generic method such as Shapley values, which works with any machine learning model.
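As a minimal sketch of the decision-tree idea (the data and the dominant variable here are made up): variables that drive the outcome end up with high importance scores, which correspond to being chosen early in the tree.

```python
# Hypothetical example: rank input variables by importance in a fitted tree.
# The synthetic data are constructed so that the first variable dominates.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                   # three input variables
y = (X[:, 0] + 0.1 * X[:, 2] > 1).astype(int)   # "extreme" condition, driven mostly by X1

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(tree.feature_importances_)  # the first variable should dominate
```

For model-agnostic attributions, the `shap` package computes Shapley values from any fitted classifier in a similar few lines.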

OK, thank you, I will look into interpretable machine learning.
Still, that seems useful for selecting the variables that have the largest effect on the output, but not for finding their best values.

And what if the output doesn’t depend only on the current values but on the pattern of values that came before?
I guess I would then also need to use time-lagged versions of the variables, or some kind of memory. Maybe LSTM networks, though I haven’t used them before.
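The time-lagged idea can be tried before reaching for an LSTM. A sketch with an invented sensor column: shifting the series gives each row its recent history as extra features, so any static model can use it.

```python
# Hypothetical sketch: turn a time series into lagged features so a static
# model can see recent history. The column name and values are made up.
import pandas as pd

df = pd.DataFrame({"temp": [20.0, 21.5, 25.0, 31.0, 40.0]})
for k in (1, 2):                        # add the two previous readings as features
    df[f"temp_lag{k}"] = df["temp"].shift(k)
df = df.dropna()                        # first rows have no full history
print(df)
```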

No need to reach so quickly for a computationally expensive technique like mixed-integer programming or a neural network!

If your goal is to predict “whether an event happens” (Y = 0 or 1) based on “state information” about the system (various numbers X_j), and you have observation data in the form of y_i and x_{ij} for j = 1 \dots m, then you can just use logistic regression.

This fits a function of the form

P(Y = 1) = \frac{1}{1 + \exp(-\beta \cdot X)}

with parameter vector \beta, and (assuming the input variables X_j are not strongly intercorrelated) each \beta_j is easy to interpret: its sign tells you whether the variable increases or decreases the probability of Y = 1, and its magnitude tells you by how much. Estimating \beta by maximum likelihood only requires solving a concave maximization problem.
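A minimal sketch of the fit, on synthetic data where the true effect of the first variable is positive and the second is negative:

```python
# Fit a logistic regression and read the signs of the coefficients.
# The data-generating coefficients (2 and -1) are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
p = 1 / (1 + np.exp(-(2 * X[:, 0] - 1 * X[:, 1])))   # true P(Y=1)
y = (rng.uniform(size=1000) < p).astype(int)

model = LogisticRegression().fit(X, y)
print(model.coef_)  # first coefficient positive, second negative
```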

The fact that some of your X_j are categorical can be handled using dummy coding.
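Dummy coding is a one-liner in pandas; the categorical column here is invented. Dropping the first level keeps it as the reference category, which avoids redundant columns in the regression.

```python
# Sketch of dummy coding a categorical input; "machine_type" is a made-up column.
import pandas as pd

df = pd.DataFrame({"machine_type": ["A", "B", "C", "A"]})
dummies = pd.get_dummies(df["machine_type"], prefix="type", drop_first=True)
print(dummies)  # columns type_B and type_C; level A is the reference
```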

But a logistic model only works well if the event becomes more probable toward the lower values of X or toward the higher values (a monotonically decreasing or increasing relationship).
If our data show a maximum around intermediate X values, or multiple maxima, or something more complex (such as dependence on past values), then it won’t work.

Sure, but you can build these kinds of features into the model by including quadratic or higher-order terms (which has no effect on the difficulty of estimation). Logistic regression isn’t a magic Swiss Army knife that solves everything, but it’s better to start with a simple model and add complexity gradually as you need it than the other way around, right?
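To illustrate the quadratic-term point with synthetic data (the true coefficients here are invented): when the event is most likely at intermediate X values, adding an x² column lets the logistic model capture the peak, and the fitted x² coefficient comes out negative.

```python
# Quadratic terms give logistic regression an interior maximum.
# Synthetic data: the event is most likely near x = 0.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=(2000, 1))
p = 1 / (1 + np.exp(-(2 - 3 * x[:, 0] ** 2)))        # true P(Y=1), peaked at 0
y = (rng.uniform(size=2000) < p).astype(int)

X2 = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)  # [x, x^2]
model = LogisticRegression().fit(X2, y)
print(model.coef_)  # the x^2 coefficient should be clearly negative
```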

Still, fitting the data with a logistic model gives me a model that predicts new Y values from X, but it won’t tell me which X values are most likely to produce Y = 1.

How should I do that?
By calculating the derivative of the fitted polynomial function and setting it to zero?
Or do I need something more complex?
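For what it’s worth, the derivative idea can be made concrete in the one-variable quadratic case (coefficient values below are made up): the logistic function is monotone in the linear predictor, so P(Y = 1) peaks wherever the polynomial inside does.

```python
# Sketch: for a quadratic logit eta(x) = b0 + b1*x + b2*x**2 with b2 < 0,
# P(Y=1) is maximized where d(eta)/dx = b1 + 2*b2*x = 0, i.e. x* = -b1/(2*b2).
# The coefficients here are invented for illustration.
b0, b1, b2 = 2.0, 1.5, -0.5
x_star = -b1 / (2 * b2)
print(x_star)  # 1.5
```

With several variables this becomes setting the gradient of the fitted polynomial to zero, and with higher-order terms or constraints a numerical optimizer would take over; but the monotonicity argument is the same.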