Advice / options for filtering output of long-running search procedure

I have an exhaustive search procedure that runs for a long time and generates multiple positive instances. I would like to filter its output and finally print the results. Inspired by the classical UNIX pipeline, something like the following is what I had in mind:
```julia
searchProc(args) |> filter1 |> filter2 |> print
```
However, if I understand correctly, searchProc() has to run to completion before filter1 can act on its return value. Since I want to see the filtered output as soon as possible, this is exactly what I want to avoid.
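To make the concern concrete, here is a minimal sketch contrasting eager and lazy evaluation. The names `slow_search`, `eager_events` and `lazy_events` are hypothetical; `slow_search` stands in for `searchProc()` and logs each hit so the ordering is visible. With `filter` on a collected vector, nothing downstream runs until the whole search is done; with `Iterators.filter` over a generator, each hit passes through the filter as soon as it is produced.

```julia
# Hypothetical stand-in for a long-running search: returns a lazy generator
# and logs each hit at the moment it is produced.
function slow_search(n, events)
    found(i) = (push!(events, "found $i"); i)   # log the hit, then return it
    return (found(i) for i in 1:n)              # lazy: nothing runs yet
end

# Eager: collect runs the entire search before filter sees anything.
eager_events = String[]
for x in filter(iseven, collect(slow_search(4, eager_events)))
    push!(eager_events, "printed $x")
end

# Lazy: hits are pulled one at a time through the filter.
lazy_events = String[]
for x in Iterators.filter(iseven, slow_search(4, lazy_events))
    push!(lazy_events, "printed $x")
end

println(eager_events)  # all "found" entries come before any "printed"
println(lazy_events)   # "found" and "printed" entries interleave
```

This only helps if the search itself can be expressed as an iterator; a deeply recursive search is often easier to adapt by having it hand each hit to a consumer as it is found.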

I have considered serializing each “hit” that the search procedure finds, but this doesn’t feel right. I’d be grateful to learn of a strategy for managing this data flow.

Many thanks.

More details would probably help, but if, for example, the data is a long file or a continuous stream on stdin, you could do:

```julia
function print_filtered(io)
    for line in eachline(io)
        if occursin("water", line) # keep only lines containing "water"
            println(line)
        end
    end
end
print_filtered(stdin)
```

Save this as print_filtered.jl and run it from the shell:

```
julia print_filtered.jl < input.txt
```

If you need to store the results, push the filtered data to an array.
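A minimal sketch of that variant, assuming the same "water" filter: `Iterators.filter` applies the test lazily over `eachline`, so each match is printed as soon as it is read and also pushed onto a vector that is returned at the end. The function name `collect_filtered` and its `pattern` keyword are hypothetical, and an `IOBuffer` stands in for `stdin` so the example runs directly in the REPL.

```julia
# Print each matching line as soon as it is read, and keep it for later.
function collect_filtered(io; pattern = "water")
    results = String[]
    for line in Iterators.filter(l -> occursin(pattern, l), eachline(io))
        println(line)          # immediate feedback
        push!(results, line)   # stored for later use
    end
    return results
end

# IOBuffer used here in place of stdin for a self-contained example.
io = IOBuffer("fresh water\nsand\nsalt water\n")
kept = collect_filtered(io)
```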

Thanks for the suggestion. I should have said, though, that the output of the initial procedure is a set of structs. So it is more accurately an enumeration procedure, which is why I was drawn to serialization.

Also, I wanted to run the entire pipeline in the REPL.
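One common way to stream structs from a search, entirely inside the REPL and without serialization, is a `Channel`: the search runs as a task and `put!`s each hit as soon as it is found, while the consumer filters and prints hits as they arrive. This is a sketch under assumptions: the `Hit` struct, `enumerate_hits`, `filter1` and `filter2` are hypothetical placeholders for your own search and predicates.

```julia
# Hypothetical result type produced by the search.
struct Hit
    id::Int
    score::Float64
end

# Hypothetical search: instead of returning a Vector at the end,
# it put!s each hit into the channel the moment it is found.
function enumerate_hits(ch::Channel{Hit}, n)
    for i in 1:n
        score = i / n              # stand-in for real, expensive work
        put!(ch, Hit(i, score))
    end
end

# Hypothetical filters playing the roles of filter1 and filter2.
filter1(h::Hit) = h.score > 0.5
filter2(h::Hit) = iseven(h.id)

# Channel{Hit}(f) runs f in a task and closes the channel when f returns,
# so the consumer loop below ends once the search is finished.
hits = Channel{Hit}(ch -> enumerate_hits(ch, 10))

# Lazy pipeline: each hit is filtered and printed as soon as it arrives.
shown = Hit[]
for h in Iterators.filter(filter2, Iterators.filter(filter1, hits))
    println(h)
    push!(shown, h)
end
```

With no buffer size given, the channel is unbuffered, so the search blocks on each `put!` until the consumer takes the hit; passing a size, e.g. `Channel{Hit}(f, 32)`, lets the search run ahead of the filters. The structs are passed between tasks in the same process, so nothing needs to be serialized.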

Can you edit searchProc, or is it a black box? If the former, you can do this easily, e.g. with:

```julia
# trivial search procedure: keeps numbers greater than 0.5
function search_proc(x; show = false, show_filter = x->true)
    a = similar(x, 0)
    for i in x
        if i > 0.5
            push!(a, i)
        end
        if show && show_filter(i)
            println(i)
        end
    end
    a
end
```

```julia
julia> a = rand(100);

# only show numbers greater than 0.98
julia> search_proc(a, show = true, show_filter = >(0.98));
0.9877433094462109
0.9992308229963736
0.9825226392951352
0.9849910761512621
0.9997881340948545
```

You can make the API for this more convenient, but in any case that’s one way of dealing with it. If you wanted multiple filters, you could compose them with the operator, or create an anonymous function with x -> ... for anything more complicated.
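As a sketch of one possible "more convenient API", assuming you can edit the search: pass a callback that is invoked on every hit, so the caller decides what to do with it (print, filter, store) while the search is still running, and Julia's do-block syntax keeps the call readable. Two filters are combined into one predicate with an anonymous function using `&&`. The names `search_each`, `keep` and `printed` are hypothetical, and `> 0.5` is the same stand-in criterion used in `search_proc` above.

```julia
# Calls on_hit(i) for every hit as soon as it is found,
# and still returns all hits at the end.
function search_each(on_hit, x)
    hits = similar(x, 0)
    for i in x
        if i > 0.5              # stand-in search criterion
            push!(hits, i)
            on_hit(i)           # hand the hit to the caller immediately
        end
    end
    return hits
end

# Two filters combined into a single predicate.
keep = x -> x > 0.9 && x < 0.99

data = [0.2, 0.95, 0.7, 0.999, 0.91]
printed = Float64[]
all_hits = search_each(data) do h
    if keep(h)
        println(h)
        push!(printed, h)
    end
end
```

Putting the callback first is what allows the `do` block; it also separates the search from the output policy, so the same search can print, log, or feed a channel without new keyword arguments.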


Thank you very much. It wasn’t the type of solution I had in mind, but I think it will work.

Thanks again.