Should there be a `mapfilter` function or macro?

I often run into a situation where I use map and then realize, in the body of the closure, that I want to throw away some of the values. I don’t think there’s a simple way to do this. Here’s a mock-up:

itr = 1:100
mapfilter(itr) do i
    intermediate_result = first_function(i)
    if some_condition(intermediate_result)
        return second_function(intermediate_result)
    else
        return # result is filtered out
    end
end

The problem with list comprehensions is that the if condition can’t make use of intermediary values, so I would have to call first_function twice:

[second_function(first_function(i)) for i in itr if some_condition(first_function(i))]

Chaining filter and Iterators.map doesn’t work because the state within map is not accessible to filter.

What would be the best way to get a function like this, preferably with Base methods?

A for loop containing a call to push! is flexible, clear, and fast.
Before you start, you can sizehint! the vector you are pushing into up to the maximum size it could reach.
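
A minimal sketch of that loop, assuming the element type is known to be Int. The three function definitions are hypothetical stand-ins for the ones in the question, and mapfilter_loop is just an illustrative name:

```julia
# Hypothetical stand-ins for the functions in the question.
first_function(i) = i^2
some_condition(r) = iseven(r)
second_function(r) = r + 1

function mapfilter_loop(itr)
    out = Int[]                  # the element type has to be written down here
    sizehint!(out, length(itr))  # at most one output per input
    for i in itr
        r = first_function(i)    # computed once, reused below
        some_condition(r) && push!(out, second_function(r))
    end
    return out
end

result = mapfilter_loop(1:6)     # [5, 17, 37]
```

The sizehint! call only reserves capacity; the returned vector still has exactly as many elements as passed the condition.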

That is true as long as it’s easy enough to specify the return type of the vector without running your function. It might be just Int but it might be Horrible{Type{With{Many{Parameters}}}}. Using map or list comprehensions thankfully spares me from doing that.

Taking just the first value to get the type wouldn’t work in type unstable scenarios, even if those should generally be avoided of course.
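
One way to avoid naming the element type might be to let a comprehension over a generator collect the values, treating nothing as the "filtered out" marker as in the mock-up. This is only a sketch: mapfilter_nothing is a hypothetical name, and it assumes the closure never needs to return nothing as a real result.

```julia
# Hypothetical helper: keep every result of `f` that is not `nothing`.
# `f` is called once per element; the comprehension picks the element type
# from the values it actually collects.
mapfilter_nothing(f, itr) = [y for y in (f(x) for x in itr) if y !== nothing]

squares_plus_one = mapfilter_nothing(1:6) do i
    r = i^2                # intermediate value, computed once
    iseven(r) || return    # a bare `return` yields `nothing`: filtered out
    return r + 1
end
# squares_plus_one == [5, 17, 37]
```

If every value is filtered out, the element type of the empty result falls back to whatever inference can determine, so it may be wider than expected.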

You can do it by implementing the iteration protocol, with something like this:

struct MapFilter{F1, F2, T}
    f::F1
    cond::F2
    x::T
end

mapfilter(f, cond, x) = MapFilter(f, cond, x)

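# `state` holds the pending result of `iterate` on the wrapped collection:
# either `nothing` or an `(element, inner_state)` tuple.
function Base.iterate(x::MapFilter, state = iterate(x.x))
    while true
        state === nothing && return nothing  # wrapped collection is exhausted
        val, id = state
        state = iterate(x.x, id)             # advance before applying `f`
        y = x.f(val)
        x.cond(y) && return (y, state)       # emit only values that pass `cond`
    end
end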

function _collect(mf, state, out)
    for x in Iterators.rest(mf, state)
        push!(out, x)
    end

    return out
end

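function Base.collect(mf::MapFilter)
    peel = iterate(mf)
    # Nothing passed the filter: there is no value to infer an element type
    # from, so this sketch returns `nothing` rather than an empty vector.
    peel === nothing && return nothing
    val, state = peel
    out = [val]  # element type is fixed by the first surviving value
    _collect(mf, state, out)
end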

julia> collect(mapfilter(x -> x^2, x -> x < 10, [1, 2, 3, 4, 3, 5, 1]))
5-element Vector{Int64}:
 1
 4
 9
 9
 1

julia> collect(mapfilter(x -> x^2, x -> x < 10, [4, 5]))

With that said, I highly recommend using a package that already provides this (and more) functionality, such as Transducers.jl (JuliaFolds/Transducers.jl), which offers efficient transducers for Julia.
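
For comparison, here is roughly how the same map-then-filter pipeline might look with Transducers.jl. This is a sketch that assumes a reasonably recent version of the package, in which an iterable can be piped into the Map and Filter transducers and then collected:

```julia
using Transducers  # third-party package; needs to be installed first

xs = [1, 2, 3, 4, 3, 5, 1]

# Square each element, then keep only the squares below 10.
ys = xs |> Map(x -> x^2) |> Filter(y -> y < 10) |> collect
```

Each element flows through the whole pipeline in one pass, so the mapped value is computed only once.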

The issue is (I think) that the element type of the result can’t be known a priori, because it may depend on the filter function. For example, a Vector{Union{T, Missing}} may, after your mapfilter, collapse into a Vector{T} if your filter is !ismissing. Unless mapfilter is specialised on that, it can’t infer the return type correctly (or it has to guess and widen, like filter or map do currently, I think).
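
A small illustration of that point using only Base. The vector v is a made-up example; the comments describe what each call returns for it:

```julia
v = Union{Int, Missing}[1, missing, 3]

# `filter` on an Array keeps the input's element type, even though no
# `missing` values remain in the result.
kept = filter(!ismissing, v)                 # Vector{Union{Missing, Int}}

# A filtered comprehension builds its result from the values it actually
# sees, so the element type narrows to Int here.
narrowed = [x for x in v if !ismissing(x)]   # Vector{Int}

# `skipmissing` is the specialised case: its eltype excludes Missing.
skipped = collect(skipmissing(v))            # Vector{Int}
```

All three hold the same values; only the element type of the container differs.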

Not sure if it’s a good idea but you can do e.g.

julia> [temp^2 for i in 1:100 for temp in (sqrt(i),) if temp < 2]
3-element Vector{Float64}:
 1.0
 2.0000000000000004
 2.9999999999999996

(For numbers the inner loop could be simplified to for temp = sqrt(i), since a number iterates over itself, but I think for temp in (sqrt(i),) is more explicit about what’s going on.)
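
The same trick applied to the shape of the original question might look like the sketch below. The three functions are hypothetical stand-ins; the intermediate result is deliberately a tuple, which is itself iterable, so wrapping it in a one-element tuple matters here:

```julia
# Hypothetical stand-ins for the functions in the question.
first_function(i) = (i, i^2)       # intermediate result is a tuple
some_condition(t) = iseven(t[2])
second_function(t) = t[1] + t[2]

# `(first_function(i),)` is a 1-tuple, so the inner loop runs exactly once
# and binds `t` to the whole intermediate result.
out = [second_function(t) for i in 1:4 for t in (first_function(i),) if some_condition(t)]
# out == [6, 20]
```

Without the wrapping tuple, the inner loop would iterate over the two elements of each intermediate tuple instead of binding the tuple itself.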