The composite quantile loss for a set of quantiles can be defined as

where p_j \in(0,1) denote the *j*-th quantile level/probability, y_i the *i*-th target/observation, and q_{ji} the p_j-quantile for the *i*-th sample. For a year or two I have been using the following implementations

```
# qt size: level, case
function qtloss(qt::AbstractMatrix, y::AbstractVector, prob::AbstractVector; agg = mean)
err = y .- qt' # case x level
agg( (prob' .- (err .< 0)) .* err )
end
function qtlossMax(qt::AbstractMatrix, y::AbstractVector, prob::AbstractVector; agg = mean)
err = y .- qt'
agg( max.(prob' .* err, (prob' .- 1) .* err) )
end
```

where the former is slightly faster in my experience. Btw, are there better alternatives with respect to speed and memory usage?

Now I would like to extend this implementation to Flux â€śimageâ€ť models with data dimensions WHCN. C would here be the level/probability dimension and the dimensions WHN the â€ś*i*-summationâ€ť. Any suggestions would be appreciated, in particular for GPUs. Reducing to the 2D-case with permutedims/reshape could be one option, but â€¦