On my computer, this code (the same as what @lmiq suggests)
function mysort3!(a::AbstractArray{<:Any, 4})
    Threads.@threads for k in axes(a, 4)
        for j in axes(a, 2)
            for i in axes(a, 1)
                sort!(view(a, i, j, :, k))
            end
        end
    end
    return a
end
gives more than 8x speedup for 8 threads.
Unfortunately, I cannot find a way to choose which dimension to sort along, since there is apparently no generalization of eachrow and eachcol, and eachslice and selectdim do not have the right behaviour. Hardcoding the dimension works, though.
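One possible way around the hardcoding, as an untested-for-performance sketch rather than a definitive answer: split the axes into those before and those after the chosen dimension, and iterate over both groups with CartesianIndices. The names sort_along! and _sort_along! are made up for this example, and the inner helper is only there as a function barrier, since the index types depend on the runtime value of dim.

```julia
# Hypothetical helper names; not part of Base or any package.
function sort_along!(a::AbstractArray, dim::Integer)
    1 <= dim <= ndims(a) || throw(ArgumentError("dim must be in 1:ndims(a)"))
    # Index ranges of all dimensions before and after `dim`.
    pre  = CartesianIndices(axes(a)[1:dim-1])
    post = CartesianIndices(axes(a)[dim+1:end])
    # Collect the trailing indices into a plain vector so that
    # Threads.@threads can split it by linear index.
    return _sort_along!(a, pre, vec(collect(post)))
end

# Function barrier: `pre` and `posts` have concrete types here.
function _sort_along!(a, pre, posts)
    Threads.@threads for J in posts
        for I in pre
            # `view(a, I, :, J)` is the 1-D slice along the chosen dimension.
            sort!(view(a, I, :, J))
        end
    end
    return a
end
```

With this, sort_along!(a, 3) should do the same as mysort3!(a) for a 4-dimensional array. Note that the threading is only over the trailing dimensions, so for dim == ndims(a) the outer loop has a single iteration and nothing runs in parallel; in that case one would want to thread over pre instead. The result can be checked against the single-threaded sort(a; dims=dim) from Base.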