Request to upgrade to LossFunctions.jl

Dear ML community, we released LossFunctions.jl v0.9 with a few important breaking changes:

  • Reversed the order of arguments to match other ecosystems; loss(yhat, y) is now the convention.
  • Removed the ObsDim business to support a more general interface with iterables of observations.
  • Removed OrdinalMarginLoss to support a more general interface with CategoricalArrays.jl.
  • Dropped unnecessary dependencies, the only dependency is CategoricalArrays.jl now.
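For anyone upgrading, the new call convention can be sketched in plain Julia; `l2` below is an illustrative stand-in, not the package's actual `L2DistLoss` implementation:

```julia
# Illustrative loss written with the new argument order: prediction
# first, target second (the pre-0.9 API put the target first).
# This is a stand-in, not the package's actual L2DistLoss code.
l2(yhat, y) = abs2(yhat - y)

yhat, y = 0.5, 1.0
l2(yhat, y)  # 0.25
```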

These changes will allow us to sync the widely tested loss functions in LossFunctions.jl with the loss functions defined in other ecosystems such as Flux.jl. I would like to invite every ML contributor in Julia to move their own loss function implementations to LossFunctions.jl and join efforts maintaining the package.

Our next goal is to formalize the support for general datasets with known observation dimension (e.g. Tables.jl). Previously datasets were assumed to be n-dimensional arrays, which is quite limiting and low-level.


Wonder what’s the motivation for implicit broadcasting? Julia doesn’t really do that elsewhere…


Can you elaborate?

From the docs:

julia> value(L2DistLoss(), 1.0, 0.5)
0.25

julia> value(L2DistLoss(), true_targets, pred_outputs)
3-element Array{Float64,1}:

Following Julian approach, the latter should be value.(...), with a dot.


In this case we want to preserve the vectorized version in order to optimize aggregation methods. Check the AggMode examples where the result is usually summed up.

Optimizations can dispatch on different types of dataset to aggregate the scalar version more efficiently.

Aggregation is a different story, I was only talking about the value method that takes and returns arrays. Julia specifically avoids defining “implicitly broadcasted” methods, as there is no reason to do this.
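To make the distinction concrete with a plain scalar function (the loss here is illustrative, not the package's code): a loss defined only for scalars already works on arrays through Julia's dot syntax, so no separate array-valued method needs to exist.

```julia
# A loss defined only for scalars; Julia's dot syntax broadcasts it over
# arrays, so no separate array-valued method is required.
l1(yhat, y) = abs(yhat - y)

l1(1.0, 2.0)                 # scalar call: 1.0
l1.([1, 2, 3], [2, 5, -2])   # broadcasted elementwise: [1, 3, 5]
```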

Aggregation is completely orthogonal to that. Although, instead of

value(L1DistLoss(), [1,2,3], [2,5,-2], AggMode.Sum())

one of the following would be cleaner and not require learning new symbols/objects (AggMode, Sum):

# minimal change from yours:
value(L1DistLoss(), [1,2,3], [2,5,-2], sum)
# like Julia Base map(f, [...], [...]), but aggregate:
sum(L1DistLoss(), [1,2,3], [2,5,-2])
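For concreteness, here is a standalone sketch of how the sum form could reduce without allocating; the names are made up and this is not the package's API:

```julia
# Sketch of the proposed sum-style aggregation, written against a plain
# function `loss`; a generator feeds the reduction one term at a time,
# so no intermediate vector of losses is allocated.
agg_sum(loss, yhat, y) = sum(loss(ŷi, yi) for (ŷi, yi) in zip(yhat, y))

l1(yhat, y) = abs(yhat - y)
agg_sum(l1, [1, 2, 3], [2, 5, -2])  # 1 + 3 + 5 = 9
```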

I like the proposed alternatives and we could consider them in the next breaking release. The only thing I don’t see yet is how weights would be incorporated in these summations. It is easy when the losses are summed up without weights, as we can simply rely on sum and mean from Base.

You can follow StatsBase and do

mean(L1DistLoss(), [1,2,3], [2,5,-2], weights([1, 2, 1]))

Also: see the already existing dims argument of sum/mean instead of introducing ObsDim.


I think you missed the point of this release. We already removed ObsDim, and are continuously updating the code to look more Julian:

We will consider the StatsBase.jl approach for weights in our next brainstorming phase.

Nice, so that must be a docs issue? I’ve been looking at the “latest” docs page at Efficient Sum and Mean · LossFunctions.jl; since the release already happened, the change should be there, right?


Yes, probably a docs issue. We updated the docs but the build scripts did not deploy it apparently :confused:

Most machine learning metrics do both a mapping and a reduction in order to calculate a scalar metric. I think it’s reasonable to encapsulate that in one function with a signature like mymetric(ŷ, y), where ŷ and y are iterables. And in fact that is what both MLJ and Flux do:

Personally, I don’t really see a need for a value function. It seems like regular old explicit functions like mae(ŷ, y) are good enough and a lot easier to read.


We are refactoring the package as a whole, it is likely that the next release will get rid of the AggMode submodule and will use a more Julian approach.


A big advantage of the AggMode currently implemented in LossFunctions.jl is that it doesn’t allocate intermediate arrays. We can dispatch on specific aggregation methods and reduce the terms of the aggregation without broadcasting a big vector of losses to be summed up later. Am I missing something?
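To make the allocation point concrete in plain Julia (the loss here is illustrative):

```julia
l1(yhat, y) = abs(yhat - y)
yhat, y = rand(1000), rand(1000)

# Broadcasting materializes a full vector of losses before reducing it:
total_alloc = sum(l1.(yhat, y))

# A generator feeds the reduction one term at a time, so no intermediate
# vector of losses is ever allocated:
total_lazy = sum(l1(ŷi, yi) for (ŷi, yi) in zip(yhat, y))

total_alloc ≈ total_lazy  # same value, up to floating-point summation order
```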

Started cleaning up AggMode.None in this PR:

Thanks for pointing these out @CameronBieganek. The main difference I see between MLJ and Flux is that MLJ decided to place weight vectors as the third argument, loss(ŷ, y, w), whereas Flux decided to place the weights in the aggregation function, loss(ŷ, y, agg=x->mean(w .* x)). Moreover, MLJ defaults to no aggregation whereas Flux defaults to mean.

In terms of performance, the MLJ approach gives more opportunity to avoid memory allocations. For example, if the aggregation is sum we can simply

sum(wi * loss(ŷi, yi) for (ŷi, yi, wi) in zip(ŷ, y, w))

without ever allocating intermediate arrays.

I am tempted to implement the MLJ approach, but am open to more input before proceeding.


I am starting to consider that maybe the AggMode.Sum, AggMode.Mean and AggMode.WeightedSum should be preserved as types in order to dispatch more efficient implementations for the different types of aggregations. Would be happy to be convinced otherwise.
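The kind of dispatch I have in mind can be sketched without the package; the type and function names below are made up, not the actual AggMode API:

```julia
# Sketch of dispatching on aggregation-mode types, mimicking the idea of
# AggMode.Sum / AggMode.Mean without depending on LossFunctions.jl itself.
abstract type AggregationMode end
struct SumMode <: AggregationMode end
struct MeanMode <: AggregationMode end

aggregate(loss, yhat, y, ::SumMode) =
    sum(loss(ŷi, yi) for (ŷi, yi) in zip(yhat, y))
aggregate(loss, yhat, y, ::MeanMode) =
    aggregate(loss, yhat, y, SumMode()) / length(y)

l1(yhat, y) = abs(yhat - y)
aggregate(l1, [1, 2, 3], [2, 5, -2], SumMode())   # 9
aggregate(l1, [1, 2, 3], [2, 5, -2], MeanMode())  # 3.0
```

Each mode gets its own specialized method, so an efficient implementation can be chosen per aggregation without branching at runtime.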

Well, many MLJ metrics use aggregation, but some do not. They have a trait reports_each_observation that specifies whether or not aggregation is used.


Thanks for clarifying. We will certainly not follow this pattern. We want the same default behavior everywhere, with extra options to change it when necessary.

I think the “implicit broadcasting” methods (like L1DistLoss()([1,2,3], [2,5,-2])) should just be removed, and users directed to regular Julia broadcasting.
This change doesn’t affect aggregations at all!

Regarding aggregations:

Big advantage with respect to what?
Surely you can dispatch stuff like

sum(L1DistLoss(), [1,2,3], [2,5,-2])
mean(L1DistLoss(), [1,2,3], [2,5,-2], weights([1, 2, 1]))

to exactly the same kind of code as the current value(L1DistLoss(), [1,2,3], [2,5,-2], AggMode.Sum()) does, with exactly the same performance.


I think we can assume that these are the only two useful aggregation functions, i.e. sum and mean with optional weights, and then get rid of all aggregation types in LossFunctions.jl. Thanks for the suggestion.
