Composing collection of iterators that accept iterators and collecting the result

djholiver · August 9, 2021, 4:41pm

Hi,

I have a series of methods that each return the matching row indexes for one filter type, e.g. “Between(0,500]” or “name = Julia”, and the results are then combined:

function queryModelCollection(iQuery::QueryModels.QueryModel, data::Arrow.Table)
    LookupStore = Set(iQuery.querySet)
    col = data[Symbol(iQuery.column)]
    colLength = length(col)
    # startIndex = MinVal == Inf ? 1 : MinVal
    # endIndex = MaxVal == -Inf ? colLength : MaxVal
    rowSet = Vector{Int64}()

    # if (TotalLength != 0) sizehint!(rowSet, TotalLength) end

    # Collect the index of every non-missing row whose value is in the lookup set.
    for r = 1:colLength
        col[r] === missing && continue
        # ismissing(col[r]) && continue
        @inbounds if col[r] in LookupStore
            push!(rowSet, r)
        end
    end
    return rowSet
end

So f1 returns vec1, f2 returns vec2, and so on. After these results are returned I intersect the outputs:

indexPlaceHolder = Vector{Vector{Int64}}()
push!(indexPlaceHolder, vec1)
push!(indexPlaceHolder, vec2)

# (inside the enclosing function)
isempty(indexPlaceHolder) && return indexPlaceHolder
# Seed the set with the first result, then narrow it with each remaining one.
# With a single vector the loop runs zero times, so no special case is needed.
intersectSect = Set(indexPlaceHolder[1])
for p in 2:length(indexPlaceHolder)
    intersect!(intersectSect, indexPlaceHolder[p])
end
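
For concreteness, here is a minimal sketch of that intersect step on toy data. The three vectors are hypothetical stand-ins for the real filter results; the point is that every filter's full index vector has to exist before the intersection starts, which is the intermediate allocation in question.

```julia
# Hypothetical filter results standing in for vec1, vec2, ...
vec1 = [1, 3, 5, 7, 9]
vec2 = [3, 4, 5, 9]
vec3 = [5, 9, 11]

indexPlaceHolder = [vec1, vec2, vec3]

# Seed with the first result, then narrow in place with each remaining one.
intersectSect = Set(indexPlaceHolder[1])
for p in 2:length(indexPlaceHolder)
    intersect!(intersectSect, indexPlaceHolder[p])
end

# A Set has no defined order, so sort before comparing or printing.
matchingRows = sort!(collect(intersectSect))

# Base also offers a one-liner with the same result for these inputs.
viaReduce = reduce(intersect, indexPlaceHolder)
```

Both `matchingRows` and `viaReduce` hold the rows that passed every filter.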

I am attempting to reduce the memory overhead of populating the intermediate collections (this is killing performance) by writing a number of iterable (foldable) methods with a similar signature:

@fgenerator function ModelTransducer(iQuery::QueryModels.QueryModel, data::Arrow.Table, enumerable...)
    LookupStore = Set(iQuery.querySet)
    col = data[Symbol(iQuery.column)]
    rowSet = Vector{Int64}()
    foreach(enumerable) do r
        # if (col[r] === missing) continue end
        @inbounds if col[r] in LookupStore
            @yield r
        end
    end
end

The idea is that the result of one can be passed to the next:

indexVectors = collect(1:colLength)
foreach(ModelTransducer(querymodel, table, ModelTransducer(querymodel2, table, indexVectors))) do x
    # if x < 1 continue end

    # end
end

Aside from this having a bit more overhead than I expected, I really want to reduce my memory overhead (as it stands, all the time is lost in allocations).
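
A side note on the `enumerable...` parameter in the ModelTransducer signature: in Julia a trailing `...` parameter slurps the remaining arguments into a tuple, so passing a single vector yields a one-element tuple containing that vector rather than the vector's elements. The sketch below uses two hypothetical helper functions (`slurped` and `plain`, not part of any package) to show the difference in plain Julia. Whether this matters for the real ModelTransducer depends on how it is called and on how FGenerators handles the body, which is not verified here.

```julia
# Hypothetical helpers: the only difference is the trailing `...`.
slurped(xs...) = [x for x in xs]   # xs is a Tuple of the arguments
plain(xs) = [x for x in xs]        # xs is the argument itself

rows = [10, 20, 30]

viaSlurp = slurped(rows)      # iterates the tuple (rows,), so one element
viaPlain = plain(rows)        # iterates the vector, so three elements
viaSplat = slurped(rows...)   # splatting at the call site restores the elements
```

Here `viaSlurp` has length 1 (its only element is `rows`), while `viaPlain` and `viaSplat` each contain the three row indexes.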

I do not know how many filters will be passed in until run time, so the examples that I have seen to date with pipes / composition don't feel like they would work. It seems as though the foldl approach should work, but the method signatures require modification.
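
A minimal sketch of what that modification could look like, using only Base rather than FGenerators or Arrow. The assumption is that each filter is curried: it takes the column and lookup set first and returns a function from an iterator of row indexes to a lazy iterator of row indexes. The names `stage`, `columns` and `queries` are hypothetical stand-ins, not the real QueryModel API.

```julia
# Hypothetical stand-ins for the table columns and the per-filter query sets.
columns = (a = [1, 2, missing, 4, 5, 6], b = ["x", "y", "x", "x", "z", "x"])
queries = [(:a, Set([1, 4, 5, 6])), (:b, Set(["x"]))]

# Curried stage: (column, lookup) -> (rows -> lazy filtered rows).
# Iterators.filter does not allocate an intermediate vector.
stage(col, lookup) =
    rows -> Iterators.filter(r -> !ismissing(col[r]) && col[r] in lookup, rows)

# The number of stages is only known at run time.
stages = [stage(columns[name], lookup) for (name, lookup) in queries]

allRows = eachindex(columns.a)

# Option 1: thread the row indexes through each stage in order.
pipeline = foldl((rows, f) -> f(rows), stages; init = allRows)
result = collect(pipeline)

# Option 2: compose the stages into one function, then apply it.
# Note that `∘` applies the last stage first; for pure filters the result is the same.
composed = foldl(∘, stages)
viaCompose = collect(composed(allRows))
```

For this toy data both `result` and `viaCompose` are `[1, 4, 6]`, and the only vector allocated is the final one from `collect`. Because the stages live in a runtime-sized vector, the nested iterator type is not known to the compiler ahead of time, so this trades some dispatch overhead for the saved allocations; it would need benchmarking on real data.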

Essentially - I believe:


collect(foldl(∘,[ModelTransducer(querymodel , table), ModelTransducer(querymodel , table)])(indexVectors))

should “just work” but I’m unclear on how the piping method signature is expected to work. There are lots of discussions that I have seen about this on the forums but I can’t see how to specifically call:

collect(f(a,b,f(q,b,x)))
or
fold(o, f(a,b), f(q,b))(x)

for “x” iterators

Regards
