Neither ThreadsX.jl nor any other JuliaFolds package has an out-of-the-box multi-dimensional reduction API. This is mainly because Julia already has a rich set of DSL packages, such as Tullio.jl, TensorOperations.jl, and LoopVectorization.jl, that support it.
Having said that, you can use the Broadcasting transducer to construct a custom multi-dimensional reduction:
julia> using Folds, Transducers, Statistics
julia> randn(2, 3) |> eachcol |> Map(x -> x ./ mean(x)) |> Broadcasting() |> Folds.sum
or simply loop over the input the other way around:
julia> Folds.collect(sum(xs) for xs in eachcol(randn(2, 3)))
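To check what the two snippets compute, here is a minimal sketch that compares each of them with a serial reference written with Base and Statistics only. The fixed matrix `A` and the variable names are illustrative assumptions (not part of the snippets above), and the sketch assumes the `Broadcasting()` pipeline returns an ordinary array that can be compared with `≈`.

```julia
using Folds, Transducers, Statistics

# Fixed input so the comparison is reproducible (2 rows, 3 columns).
A = [1.0 2.0 3.0; 4.0 5.0 6.0]

# Approach 1: scale each column by its mean, then add the columns element-wise.
parallel_colsum = A |> eachcol |> Map(x -> x ./ mean(x)) |> Broadcasting() |> Folds.sum
# Serial reference for the same reduction.
serial_colsum = sum(x ./ mean(x) for x in eachcol(A))
@assert parallel_colsum ≈ serial_colsum

# Approach 2: reduce within each column, collecting one result per column.
parallel_sums = Folds.collect(sum(xs) for xs in eachcol(A))
# Serial reference: sum along the first dimension, flattened to a vector.
@assert parallel_sums ≈ vec(sum(A; dims = 1))
```

The first approach yields one value per row (the columns are added together), while the second yields one value per column, so which one fits depends on the dimension you want to reduce over.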