Transducer allocations

I’m trying to use transducers for data manipulation, such as computing a rolling simple moving average. The code works, but why does it allocate memory when I’ve already preallocated the output? (See the last two functions, sma! and main.)

using DataFrames
using BenchmarkTools
using Statistics  # provides mean, used in sma! below
using Transducers
using Transducers: SIMDFlag, GetIndex, ZipSource, SetIndex, _map!

function _prepare_map(xf, dest, src, simd)
    # isexpansive(xf) && error("map! only supports non-expanding transducer")
    # TODO: support Dict
    indices = eachindex(dest, src)

    rf = reducingfunction(
        opcompose(ZipSource(opcompose(GetIndex{true}(src), xf)), SetIndex{true}(dest)),
        (::Vararg) -> nothing,
        simd = simd)

    return rf, indices, dest
end

function Base.map!(xf::Transducer, dest::AbstractArray, src::AbstractArray;
    simd::SIMDFlag = Val(false))
    _map!(_prepare_map(xf, dest, src, simd)...)
    return dest
end


function sma!(len, vec_in, vec_out)
    # The map! method defined above takes (xf, dest, src), so the output vector goes first.
    map!(opcompose(Consecutive(len, step=1), Map(mean)), vec_out, vec_in)
end

function main()
    N = 10^5
    df = DataFrame(:data => ones(N))
    sma_length = 10
    df[!,"data"] = 1:N
    df[!,"sma"] .= 0.
    
    sma!(sma_length, df.data, df.sma)
    df[!,"sma"] .= 0.
    # Interpolate the column vectors themselves so the DataFrame column lookup is not timed.
    @btime sma!($sma_length, $(df.data), $(df.sma))
end

main()

492.581 ms (299493 allocations: 19.83 MiB)

It’s an inference failure. I played with it in Cthulhu a bit, but I couldn’t pin down exactly where the compiler gives up. That said, ZipSource and Consecutive are very complex transducers, so the inference failure is disappointing but not surprising.
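
To illustrate what an inference failure means in general, here is a toy sketch that has nothing to do with the Transducers.jl internals. The types LooseBox and TightBox and the function double are hypothetical names made up for this example. When the compiler can only infer Any for a value, that value typically has to be boxed on the heap at run time, which is how per-element allocations can show up even though the output array is preallocated.

```julia
# Hypothetical toy types, not part of Transducers.jl.
struct LooseBox
    x::Any          # abstract field type: the compiler cannot know what x holds
end

struct TightBox{T}
    x::T            # concrete once T is known
end

double(box) = 2 * box.x

# Ask the compiler what it infers for each argument type.
loose_ret = Base.return_types(double, (LooseBox,))[1]          # Any
tight_ret = Base.return_types(double, (TightBox{Float64},))[1] # Float64

println("LooseBox infers: ", loose_ret)
println("TightBox{Float64} infers: ", tight_ret)
```

In the real code the place where inference degrades is buried somewhere inside the composed reducing function, which is why a tool like Cthulhu is needed to look for it.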

Meanwhile, if you “just” need to implement sma! on vectors, I think it’d be much less painful to write a raw loop. Transducers like Consecutive become strictly necessary only when they are used within other non-trivial processing.
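
A raw loop along those lines might look like the sketch below. The name sma_loop! and its argument order are hypothetical, and it assumes a trailing window on 1-based vectors: out[i] receives the mean of src[i-len+1:i], and the entries before the first full window are left untouched. It keeps a running sum rather than recomputing each window, so it does no per-element work beyond one addition and one subtraction, at the cost of possible floating-point drift on long inputs.

```julia
# Hypothetical helper, not part of Transducers.jl.
# Assumes 1-based vectors and a trailing window of `len` elements.
function sma_loop!(out::AbstractVector, src::AbstractVector, len::Integer)
    length(out) == length(src) ||
        throw(DimensionMismatch("out and src must have the same length"))
    1 <= len <= length(src) ||
        throw(ArgumentError("len must be between 1 and length(src)"))

    # Sum of the first full window.
    acc = zero(float(eltype(src)))
    for i in 1:len
        acc += src[i]
    end
    out[len] = acc / len

    # Slide the window: add the newest element, drop the oldest.
    for i in (len + 1):length(src)
        acc += src[i] - src[i - len]
        out[i] = acc / len
    end
    return out
end

src = collect(1.0:10.0)
out = zeros(length(src))
sma_loop!(out, src, 3)
```

With the DataFrame from the question, this would be called as sma_loop!(df.sma, df.data, sma_length).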
