Any suggestion on how to handle `deriv` and `deriv2` in this approach @aplavin? Currently we can perform these aggregations with first and second derivatives as well.
The design space here is a bit difficult to navigate, but I think we are converging on something that has all the benefits of nice Julian syntax, nice performance, and a clean implementation.
I think you're losing sight of the actual use case for these loss functions. Within the context of MLJ, loss functions (called "measures" in MLJ) are used for evaluating machine learning models, with a typical use case looking like this:
```julia
evaluate!(
    mach,
    resampling = CV(nfolds=5),
    measure = [rms, mae]
)
```
With the approach you're advocating, I would have to write that like this:
```julia
evaluate!(
    mach,
    resampling = CV(nfolds=5),
    measure = [
        (ŷ, y) -> sqrt(mean(L2DistLoss(), ŷ, y)),
        (ŷ, y) -> mean(L1DistLoss(), ŷ, y)
    ]
)
```
If we take your approach to its logical conclusion, no one would ever be allowed to write a function that takes an `AbstractArray` argument (unless that `AbstractArray` represents a mathematical vector, matrix, or tensor). But that's a bit pedantic; factoring out commonly used expressions into separate functions is a normal part of programming.
All that being said, my feeling is that these metric functions are so trivial that there's not much to be gained by attempting to get Flux and MLJ to use the same loss function implementation. (Is that the implicit goal of LossFunctions.jl?)
You are quoting me on implicit broadcasting methods, but your example is about aggregation. These two are effectively independent. Maybe the latter should stay, idk, but the former should really go away.
Agree with you @CameronBieganek that there is still value in considering the vector-based methods. We are really brainstorming here to find the best compromise; nothing is decided yet.
It would be extremely helpful for the community to maintain a common set of loss functions in a single repository that is shared among different ML frameworks. It is an ideal scenario, of course, but I think we should put energy in that direction.
Regarding aggregation: I only compared this to `value(L1DistLoss(), ŷ, y, AggMode.Mean())`. Sure, you can pack this into `MAE()`, either way.
Let me summarize the status quo so that we can organize the discussion moving forward...
Currently we have loss functions in LossFunctions.jl that are represented with structs, and this is useful for holding the state and hyperparameters that sometimes exist. Given a `loss` object we can do 3 things with it:
- `value`: the actual value `loss(yhat, y)`
- `deriv`: the first derivative
- `deriv2`: the second derivative
These derivatives are written by hand, and sometimes they are not defined everywhere (autodiff doesn't help). So LossFunctions.jl has this great feature of collecting non-trivial derivatives besides the value of the function.
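For concreteness, a minimal sketch of the three operations on a single observation, assuming scalar `value`/`deriv`/`deriv2` methods with the argument order following the `loss(yhat, y)` convention used above:

```julia
using LossFunctions

loss = L2DistLoss()           # squared distance loss, (ŷ - y)^2

v  = value(loss, 0.5, 1.0)    # the loss value itself
d  = deriv(loss, 0.5, 1.0)    # hand-written first derivative
d2 = deriv2(loss, 0.5, 1.0)   # hand-written second derivative
```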
Now the design questions regarding iterables of observations `yhat` and `y`, weights `w`, and `FUN in [value, deriv, deriv2]` (both options are sketched in code after the list):
- Should we support `FUN(loss, yhat, y, [w])` and default to `mean` aggregation? Is there any way to inform other types of aggregation in this syntax without dispatching on AggMode types?
- Should we adopt the alternative syntax `mean(loss, yhat, y, [w])` instead? What about the `deriv` and `deriv2` cases?
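A rough sketch of what the two options could look like at the call site; these signatures are hypothetical and only illustrate the proposals above, not the current API:

```julia
using LossFunctions, Statistics

loss = L1DistLoss()
yhat = [0.1, 0.9, 0.3]
y    = [0.0, 1.0, 1.0]
w    = [1.0, 2.0, 1.0]

# Option 1 (hypothetical): FUN(loss, yhat, y, [w]), defaulting to mean aggregation
value(loss, yhat, y)        # mean of element-wise losses
deriv(loss, yhat, y, w)     # weighted mean of element-wise first derivatives

# Option 2 (hypothetical): reuse the Statistics verbs for aggregation
mean(loss, yhat, y)         # same aggregation, spelled with `mean`
sum(loss, yhat, y, w)       # weighted sum; unclear how `deriv`/`deriv2` fit here
```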
It is not clear to me yet which path is the most productive.
Regarding the case where `loss(yhat, y, [w])` returns a vector of values, I agree with @aplavin that we should use broadcast instead. Whichever syntax we choose, we should avoid implicit broadcasting and let Julia handle the "vectorization" for us.
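For illustration, a minimal sketch of the explicit-broadcast style, assuming the loss object is callable on scalars (the functor interface mentioned below):

```julia
using LossFunctions, Statistics

loss = L1DistLoss()
ŷ = [0.1, 0.9, 0.3]
y = [0.0, 1.0, 1.0]

per_obs = loss.(ŷ, y)        # explicit broadcast: vector of element-wise losses
agg     = mean(loss.(ŷ, y))  # aggregate only when the caller asks for it
```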
The PR is shaping up nicely. I fixed the functor interface for scalar arguments. Now we just need to brainstorm the aggregation API a bit further. It seems to me that the necessity to model `deriv` and `deriv2` will end up forcing us to go with AggMode types, but I may be wrong.
I'm not sure I understand the utility of `deriv`. In the context of Flux, wouldn't that be handled automatically by Zygote (or another AD)? In the context of `MLJ.evaluate!`, I don't think we need `deriv`.
Some loss functions are defined piecewise and are not differentiable everywhere. LossFunctions.jl writes the derivatives manually, but people are free to use autodiff on top of `value(loss, yhat, y)` when that is possible.
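To illustrate the point with plain Julia (not the LossFunctions.jl API): the hinge loss is defined piecewise and has a kink, so a hand-written derivative has to pick a subgradient there:

```julia
# Hinge loss on the "agreement" t = y * ŷ: max(0, 1 - t)
hinge(t) = max(0, 1 - t)

# Not differentiable at t = 1; a hand-written derivative picks a subgradient
hinge_deriv(t) = t < 1 ? -1.0 : 0.0
```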
AFAIK, the ChainRules package is the go-to way to define custom derivatives. It's supported by lots of packages and utilized by multiple AD systems.
Why not define the LossFunctions derivatives through ChainRules? Then there's no need for a separate `deriv`, neither for developers nor for users to learn.
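A hedged sketch of what that could look like, assuming the functor call `loss(ŷ, y)` for scalars; the rule below is an illustration I wrote for this discussion, not anything the package currently defines:

```julia
using ChainRulesCore, LossFunctions

# Hypothetical: expose the hand-written derivative of L1DistLoss to AD systems
# via an rrule, instead of a user-facing `deriv` function.
function ChainRulesCore.rrule(loss::L1DistLoss, ŷ::Real, y::Real)
    v = loss(ŷ, y)                  # |ŷ - y|
    function l1_pullback(Δ)
        ∂ŷ = Δ * sign(ŷ - y)        # hand-written (sub)derivative w.r.t. ŷ
        return (NoTangent(), ∂ŷ, -∂ŷ)
    end
    return v, l1_pullback
end
```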
I will take a look at ChainRules.jl. Maybe the best plan forward is to consider the API for the value function with `sum` and `mean`, like in StatsBase.jl, and leave the refactoring of `deriv` and `deriv2` for a second PR with ChainRules.jl.
I've updated the PR to adopt the simple `sum` and `mean` functions as opposed to the AggMode types. All tests are passing. We just need to update the docs and add tests for iterables that are not `AbstractVector`.
The PR has been merged now, with the AggMode module gone.
Next I would like to remove the `value` function in favor of always using the functor interface.
Done. If anyone wants to take a look at the master branch of LossFunctions.jl, I think we are ready for another breaking release with AggMode and the `value` function removed. Anything else we could break right away before the next release?
I will leave the work on `deriv` and `deriv2` for a future opportunity. I am not familiar with the ChainRules.jl ecosystem, and perhaps someone can help there.
@aplavin do you have an example of how we could replace `deriv` and `deriv2` with the ChainRules.jl approach? I could then copy/paste the changes throughout the package.
I've released LossFunctions.jl v0.10 with the latest breaking changes.