Computing log likelihood given a probability function

I am a bit overwhelmed by all the information out there about this and would like some guidance on how to evaluate the log-likelihood (or cross-entropy) function for a given probability distribution that depends on three parameters.

A bit of background first: I have some input data, an array of arrays called X_data, and some binary output data (1 or 0) stored as Y_data. I also have a pre-written probability function, NTCP(X, n, m, T), which predicts the probability that the output will be 1 given an input X.

The probability function has three parameters that I want to fit so as to minimize the cross entropy (or, equivalently, maximize the log likelihood) between the predicted values y_pred = NTCP.(X_data, n, m, T) and the actual outputs Y_data.
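
If I have it right, the quantity I want to maximize is the Bernoulli log-likelihood of the observed labels under the predicted probabilities:

$$
\ell(n, m, T) = \sum_i \bigl[ y_i \log p_i + (1 - y_i)\log(1 - p_i) \bigr],
\qquad p_i = \mathrm{NTCP}(x_i, n, m, T),
$$

and the cross entropy is just $-\ell$ (up to division by the number of samples), so minimizing one is the same as maximizing the other.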

To this end I was thinking that I would just use the Optim package, but I was quite confused about how the syntax for this would work. I have found the loglikelihood function in the Distributions package, but I am unsure of the exact syntax for using it, since I was not able to find examples with arbitrary probability functions that take multiple parameters.

So, given this probability function, how would I evaluate the log likelihood of obtaining Y_data given X_data and NTCP?

I think the easiest would be to implement your own log-likelihood function, something like:

function loglikelihood(n, m, T)
    # Predicted probability of y == 1 for each input, given the current parameters
    prob = NTCP.(X_data, n, m, T)
    # Bernoulli log-likelihood: log(p) where y == 1, log(1 - p) where y == 0
    sum(y == 1 ? log(p) : log(1 - p) for (y, p) in zip(Y_data, prob))
end
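
To hook this into Optim, you can minimize the negative of that sum. A minimal sketch, assuming NTCP, X_data, and Y_data are already defined, and with placeholder starting values for n, m, and T:

using Optim

# Negative log-likelihood as a function of a parameter vector,
# which is the form Optim's optimize expects.
negloglik(θ) = -loglikelihood(θ[1], θ[2], θ[3])

θ0 = [1.0, 0.5, 50.0]   # starting guesses for n, m, T (placeholders)
res = optimize(negloglik, θ0, NelderMead())

n̂, m̂, T̂ = Optim.minimizer(res)

NelderMead() is derivative-free, so NTCP does not need to be differentiable; if it is, you could pass a gradient-based method instead.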