More threads, slower code, even if not spawning them

When using a pattern like that, I am always unsure if some customization can be done. For example:

Suppose I have to preallocate this buffer, as in @init buf = pre_buf[threadid()]. Could that possibly work? (The use of threadid() is not the point here; rather, I would not know how to perform such a preallocation, since I don’t see exactly what the macro does with that instruction.)

Also, I can imagine that if I preallocate the buffer, I may go against the rationale of the @floop macro, because I guess it does something smarter than keeping all buffers next to each other in the same array, to be distributed among threads.

Another pattern I am facing, which I have trouble “translating” to the Folds syntax, is something like the one below. This is a toy example in which I want to build a list of the numbers smaller than 0.5, but it captures some of the characteristics of the true problem:

using Base.Threads: @threads, nthreads

struct List
    n::Int
    l::Vector{Float64}
end
add_to_list(x,list) = x < 0.5 ? List(list.n+1,push!(list.l,x)) : list

# Serial version
function build_list(x)
    list = List(0,zeros(0))
    for i in eachindex(x)
        list = add_to_list(x[i],list)
    end
    return list
end

# Parallel version
append_lists!(list1,list2) = List(list1.n+list2.n,append!(list1.l,list2.l))

function build_list_threads(x)
    list = List(0,zeros(0))
    list_threaded = [ deepcopy(list) for _ in 1:nthreads() ]
    @threads for ithread in 1:nthreads()
        local_list = list_threaded[ithread]
        for i in ithread:nthreads():length(x)
            local_list = add_to_list(x[i],local_list)
        end
        list_threaded[ithread] = local_list
    end
    # reduce
    for lst in list_threaded
        list = append_lists!(list,lst)
    end
    return list
end

Uhm… now that I think about it, this one may not be very different from the above, and with the “new syntax” it could be something like:

list = List(0,zeros(0))
@floop begin
    @init buf = List(0,zeros(0))
    for i in eachindex(x)
        buf = add_to_list(x[i],buf)
    end
    @combine append_lists!(list,buf) # ??? 
end

My doubts there would be roughly the same: 1) could I preallocate the bufs if I were to call this many times? 2) Is there anything special needed for the @combine syntax, since the combination of two List objects is not simply an addition?
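
To make doubt 1 concrete, here is a minimal sketch of the kind of preallocation I have in mind, written with plain Base.Threads instead of @floop. It assumes one reusable buffer per chunk of the input, indexed by chunk number rather than by threadid(). The name build_list_prealloc! and the strided chunking are only illustrative, and I am not claiming this is what @floop does internally.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical helper: `bufs` holds one reusable buffer per chunk of `x`.
# Each task writes only to its own buffer, so no locking is needed.
function build_list_prealloc!(bufs::Vector{Vector{Float64}}, x)
    nchunks = length(bufs)
    foreach(empty!, bufs)  # reset lengths so the buffers can be reused across calls
    @sync for (ichunk, buf) in enumerate(bufs)
        @spawn begin
            # Strided chunk: ichunk, ichunk + nchunks, ichunk + 2*nchunks, ...
            for i in ichunk:nchunks:length(x)
                x[i] < 0.5 && push!(buf, x[i])
            end
        end
    end
    # Combine step: concatenate the per-chunk results in chunk order
    return reduce(vcat, bufs)
end

x = rand(1000)
bufs = [Float64[] for _ in 1:nthreads()]  # allocated once, outside the hot path
list = build_list_prealloc!(bufs, x)
```

The result is ordered chunk by chunk, not in the original order of x, which is the same behaviour as build_list_threads above. Here the combine step is just a concatenation, which is the part I would like to express with @combine.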