Can Symbolics.jl build_function() be used with median() and other functions that require sort?

I started using Symbolics.jl build_function() to accelerate various calculations and it works beautifully in many situations. I currently use it mostly to accelerate loss function calculation for curve fitting but I suspect I’ll use whenever I need extra speed.

I ran into the following problem that I was hoping someone could help me with.

Summary: I cannot figure out how to use build_function() on any function that uses sorting like median(), mad() etc that are often useful for loss functions like bisquare or huber.

Here’s MWE without build_function():

using Symbolics, Statistics, StatsBase, IfElse

@variables α, β

sym_array = [rand()*α + rand()*β for i in 1:10]

median(sym_array) 

ERROR: LoadError: TypeError: non-boolean (Num) used in boolean context
Stacktrace:
  [1] sort!(v::Vector{Num}, lo::Int64, hi::Int64, #unused#::Base.Sort.InsertionSortAlg, o::Base.Order.ForwardOrdering)
    @ Base.Sort ./sort.jl:502
  [2] sort!(v::Vector{Num}, lo::Int64, hi::Int64, a::PartialQuickSort{UnitRange{Int64}}, o::Base.Order.ForwardOrdering)
    @ Base.Sort ./sort.jl:626
  [3] partialsort!
    @ ./sort.jl:97 [inlined]
  [4] #partialsort!#2
    @ ./sort.jl:156 [inlined]
  [5] partialsort!
    @ ./sort.jl:156 [inlined]
  [6] median!(v::Vector{Num})
    @ Statistics /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Statistics/src/Statistics.jl:800

I believe the above error is due to known issue where Core.ifelse() cannot be overloaded to use Num as discussed here: The alternative to `ifelse` · Issue #16 · JuliaSymbolics/Symbolics.jl · GitHub.
This issue is being worked on (make `ifelse` generic · Issue #32844 · JuliaLang/julia · GitHub) and it can be bypassed using IfElse.jl.

To see if I can get around above I wrote a quicksort function using IfElse.ifelse().
Here’s a MWE:

using Symbolics, Statistics, StatsBase, IfElse

@variables α, β

sym_array = [rand()*α + rand()*β for i in 1:10]

function hacky_partition!(array, lo, hi)
    pivot = array[hi]
    i = lo
    for j in lo:hi
        temp_i = deepcopy(array[i])
        temp_j = deepcopy(array[j])
        i_counter = IfElse.ifelse(array[j] < pivot, 1, 0)
        array[i] = IfElse.ifelse(array[j] < pivot,temp_j,temp_i)
        array[j] = IfElse.ifelse(array[j] < pivot,temp_i,temp_j)
        i += i_counter
    end
    array[hi], array[i] = array[i], array[hi] 
    return i
end

function hacky_quicksort!(array, lo, hi)
    if lo < hi
        p = hacky_partition!(array, lo, hi)
        hacky_quicksort!(array, lo, p - 1)
        hacky_quicksort!(array, p + 1, hi)
    end
end
hacky_quicksort!(array) = hacky_quicksort!(array, 1, length(array))

#Check to make sure my hacky_quicksort!(array) works
sorted_array = rand(50)
hacky_quicksort!(sorted_array)
issorted(sorted_array) #true

hacky_quicksort!(sym_array)

ERROR: LoadError: ArgumentError: invalid index: 1 + IfElse.ifelse(IfElse.ifelse(IfElse.ifelse((0.6824293224486493α + 0.9942479004750877β) < (0.1508782122896153α + 0.1253984208459693β), 0.6824293224486493α + 0.9942479004750877β, 0.6824293224486493α + 0.9942479004750877β) < (0.1508782122896153α + 0.1253984208459693β), 0.6824293224486493α + 0.9942479004750877β, 0.6824293224486493α + 0.9942479004750877β) < (0.1508782122896153α + 0.1253984208459693β), 1, 0) of type Num
Stacktrace:
 [1] to_index(i::Num)
   @ Base ./indices.jl:300
 [2] to_index(A::Vector{Num}, i::Num)
   @ Base ./indices.jl:277
 [3] to_indices
   @ ./indices.jl:333 [inlined]
 [4] to_indices
   @ ./indices.jl:325 [inlined]
 [5] getindex
   @ ./abstractarray.jl:1170 [inlined]
 [6] hacky_partition!(array::Vector{Num}, lo::Int64, hi::Int64)
   @ Main ~/.../symbolics_test.jl:15
 [7] hacky_quicksort!(array::Vector{Num}, lo::Int64, hi::Int64)
   @ Main ~/.../symbolics_test.jl:28
 [8] hacky_quicksort!(array::Vector{Num})
   @ Main ~/.../symbolics_test.jl:33
 [9] top-level scope
   @ ~/.../symbolics_test.jl:37
in expression starting at /.../symbolics_test.jl:37

So it looks like the error is due to the fact that in my hacky_quicksort!() there’s a step where Num is used to switch index of an array as part of sorting algorithm and that’s not allowed.

Is there some workaround to allow calculation of median() on Num arrays or is this currently not supported?

@ChrisRackauckas @shashi

https://symbolics.juliasymbolics.org/dev/manual/faq/#Transforming-my-function-to-a-symbolic-equation-has-failed.-What-do-I-do?-1

@ChrisRackauckas

Thanks, I’ve read Frequently Asked Questions · Symbolics.jl and I think I understand the example you gave there of when it doesn’t make sense to represent something symbolically but I feel like this doesn’t fully apply in my example (I could be wrong). Are you saying that it doesn’t make sense to represent sort algorithms symbolically?

I think I bypassed the error that you quoted using your IfElse.ifelse() as described above and I believe there’s no while Num < tol in sort algorithms so I think it’s possible and useful to represent it symbolically. But maybe I’m missing something.

After bypassing the above error with IfElse.ifelse(), now I have the following error that basically says that to_index(i::Num) is not supported. Is there any current workaround for this and/or is there a plan to support this in future?

Yes, they do not have a fixed number of iterations for all possible inputs. Is there a reason to not just @register this sort algorithm?

I had to think about @register for a bit.

It initially seemed like it could be a good workaround but now I’m not sure anymore. A bit of an inconvenience is that I can @register so that median(x) doesn’t trace but I can’t figure out how to make sure that median([x,y,z]) also doesn’t trace. Since all of those functions (e.g. median, mad, sort etc) are typically used with some arrays, having only median(x) but not median([x,y,z]) seems similar to just not including median in any of the code for build_function since at least for me I would almost always use a Vector{Num} in build_function.

For example:

@variables x, y

@register Statistics.median(x)
median(x+y) # Statistics.median(x+y)

@register Statistics.median([x,y])
median([x,y]) #ERROR: LoadError: TypeError: non-boolean (Num) used in boolean context

Is there any way to make @register so it doesn’t trace Vector{Num}?

@YingboMa should we register AbstractArray methods as well?