But indeed I gave up on that since I wanted to find a way that keeps the order.
uhhhh, I think your solution based in the default sort! is a good idea. You can choose the algorithm used by sort! and if you choose mergesort, then it will be stable.
function sort_sep!(f::Function, A::AbstractVector)
sort!(A; by = f, alg = Base.MergeSort, rev = true)
end
function run_bench()
N = 10^3
b = rand(N)
a = deepcopy(b)
@btime sort_sep!(x -> x > 0.5, $a)
...
end
gives
7.262 μs (2 allocations: 4.08 KiB)
What is not without allocation, but is has about the same time than the two-filter solution (and number of lines, XD), and allocates just one quarter of the memory. The only problem is that it does not give the number of trues, however you can use a binary search after the sort, to find the first false position in O(log n).