If you’re happy with 0.5 thresholding you can also just call mode.(x), or call predict_mode instead of predict in MLJ (which I assume you are using). The CategoricalDistributions.jl readme has more. You may also want to look at the Working with Categorical Data section of the MLJ manual.
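For concreteness, here is a small sketch of the two options, assuming `mach` is an already-trained machine wrapping a probabilistic classifier and `Xnew` is your prediction data (both hypothetical names):

```julia
using MLJ

# `predict` on a probabilistic classifier returns a vector of
# UnivariateFinite distributions, not class labels:
probs = predict(mach, Xnew)

# Option 1: broadcast `mode` over the distributions to get point
# predictions (equivalent to thresholding at 0.5 in the binary case):
yhat = mode.(probs)

# Option 2: let MLJ do the same in one call:
yhat = predict_mode(mach, Xnew)
```

If you need the raw probability of one class instead, `pdf.(probs, some_class)` extracts it, where `some_class` is one of the levels of your target.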
Or you can even wrap your probabilistic model using MLJ’s BinaryThresholdPredictor to get a point-predictor and optimise the threshold to minimise your loss by wrapping again using TunedModel. There is an example in the More on Probabilistic Predictors section of the MLJ manual.
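A rough sketch of that double wrapping, under the assumption that `model` is your probabilistic classifier and that you want to tune the threshold by cross-validation (the measure and ranges here are placeholders; check the MLJ manual example for the exact pattern):

```julia
using MLJ

# Wrap the probabilistic model to get a deterministic point-predictor
# with a tunable `threshold` field (default 0.5):
thresholded = BinaryThresholdPredictor(model)

# Define a one-dimensional search range over the threshold:
r = range(thresholded, :threshold, lower=0.1, upper=0.9)

# Wrap again so the threshold is optimised against your chosen measure:
tuned = TunedModel(
    model=thresholded,
    tuning=Grid(resolution=20),
    resampling=CV(nfolds=6),
    range=r,
    measure=balanced_accuracy,  # swap in whatever loss you care about
)

mach = machine(tuned, X, y)
fit!(mach)
```

After fitting, `report(mach)` exposes the best threshold found, and `predict(mach, Xnew)` uses it automatically.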
If you know of other competitions that are a good fit for MLJ, I will gladly take a stab at them! Recommendations for other Julia ML frameworks are welcome too.
I’ve just tried optimizing for accuracy instead, and I got a lower score (0.787 < 0.788), which seems pretty logical to me, since we don’t actually know whether the test set is balanced. It might be slightly unbalanced on purpose, to reward users who think about balancing, even though it’s not strictly necessary.
I’ve compared my two out.csv files. About 2% of my answers changed as a result of using a balanced model, while my score only increased by about 0.1%, so I’m uncertain whether that’s a real improvement or just chance.