Hi,
I have a series of methods, one per filter type (e.g. “Between(0,500]” or “name = Julia”), that each return the row indexes matching that filter. The results are then combined:
function queryModelCollection(iQuery :: QueryModels.QueryModel, data :: Arrow.Table)
    LookupStore = Set(iQuery.querySet)
    col = data[Symbol(iQuery.column)]
    colLength = length(col)
    # startIndex = MinVal == Inf ? 1 : MinVal
    # endIndex = MaxVal == -Inf ? colLength : MaxVal
    rowSet = Vector{Int64}()
    # if(TotalLength != 0) sizehint!(rowSet, TotalLength) end
    for r = 1 : colLength
        if(col[r] === missing) continue end
        #if(ismissing(col[r])) continue end
        @inbounds if(col[r] in LookupStore)
            push!(rowSet, r)
        end
    end
    return rowSet
end
So f1 returns vec1, f2 returns vec2, and so on. Once these results are returned, I intersect the outputs:
indexPlaceHolder = Vector{Vector{Int64}}()
push!(indexPlaceHolder, vec1)
push!(indexPlaceHolder, vec2)
# (inside a function) nothing to intersect if no filters were supplied
if isempty(indexPlaceHolder) return indexPlaceHolder end
# with a single filter the loop below runs zero times, so no special case is needed
intersectSect = Set(indexPlaceHolder[1])
for p in 2:length(indexPlaceHolder)
    intersect!(intersectSect, indexPlaceHolder[p])
end
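As a stand-alone illustration of this filter-then-intersect pattern, here is a small sketch on plain vectors. The helper `matching_rows` and the sample columns are made up for the example; they only stand in for `queryModelCollection` and the Arrow columns above.

```julia
# Hypothetical stand-in for queryModelCollection: returns the row indexes
# whose value is non-missing and contained in `lookup`.
function matching_rows(col::AbstractVector, lookup::Set)
    rows = Int[]
    for r in eachindex(col)
        v = col[r]
        ismissing(v) && continue
        v in lookup && push!(rows, r)
    end
    return rows
end

# Made-up sample columns (not taken from the real table).
name_col = ["Julia", "Bob", missing, "Julia", "Ann"]
age_col  = [30, 700, 45, missing, 20]

# One intermediate vector is allocated per filter.
vec1 = matching_rows(name_col, Set(["Julia", "Ann"]))
vec2 = matching_rows(age_col, Set(1:500))

# Intersecting allocates a further Set on top of the per-filter vectors.
rows = sort!(collect(intersect(Set(vec1), vec2)))
```

Every filter scans its whole column and materialises its own result before any intersection happens, which is where the allocations pile up.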
I am trying to avoid the memory overhead of populating these intermediate collections (this is what is killing performance) by writing a number of iterable (foldable) methods with a similar signature:
@fgenerator function ModelTransducer(iQuery :: QueryModels.QueryModel, data :: Arrow.Table, enumerable)
    # `enumerable` is one iterable of row indexes; slurping it with `...` would wrap it in a tuple
    LookupStore = Set(iQuery.querySet)
    col = data[Symbol(iQuery.column)]
    foreach(enumerable) do r
        #if(col[r] === missing) continue end
        @inbounds if(col[r] in LookupStore)
            @yield r
        end
    end
end
so that I can pass the result of one into the next:
indexVectors = collect(1:colLength)
foreach(ModelTransducer(querymodel, table, ModelTransducer(querymodel2, table, indexVectors))) do x
    # if x <1 continue end
    # end
end
Aside from this having a bit more overhead than I expected, I really want to reduce my memory usage (as it stands, almost all of the time is lost in allocations).
I do not know how many filters will be passed in until run time, so the examples I have seen so far using pipes / composition don't feel like they would work. It seems as though a foldl approach should work, but the method signatures would need modification.
Essentially, I believe that:
collect(foldl(∘,[ModelTransducer(querymodel , table), ModelTransducer(querymodel , table)])(indexVectors))
should “just work”, but I’m unclear on what method signature this kind of piping expects. I have seen lots of discussions about this on the forums, but I can’t see how to specifically call:
collect(f(a,b,f(q,b,x)))
or
foldl(∘, [f(a,b), f(q,b)])(x)
for “x” iterators
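To show the shape I am after, here is a minimal sketch that uses only Base (`Iterators.filter`, `foldl` and `∘`) rather than FGenerators or Transducers. Each filter is curried: it captures its column and lookup set and returns a function from an index iterator to a lazily filtered iterator. `lazy_filter` and the sample data are hypothetical and only there for illustration.

```julia
# Hypothetical curried filter: captures the column and lookup set and returns
# a function mapping an iterator of row indexes to a lazily filtered iterator.
function lazy_filter(col::AbstractVector, lookup::Set)
    keep(r) = (v = col[r]; !ismissing(v) && v in lookup)
    return rows -> Iterators.filter(keep, rows)
end

# Made-up sample columns (not taken from the real table).
name_col = ["Julia", "Bob", missing, "Julia", "Ann"]
age_col  = [30, 700, 45, missing, 20]

# The number of filters is only known at run time, so they live in a vector.
filters = [
    lazy_filter(name_col, Set(["Julia", "Ann"])),
    lazy_filter(age_col, Set(1:500)),
]

# foldl(∘, ...) builds f1 ∘ f2 ∘ ..., i.e. f1(f2(...(rows))).
# `init = identity` keeps this valid when the vector of filters is empty.
pipeline = foldl(∘, filters; init = identity)

# Nothing is materialised until collect; 1:n is a range, not a vector.
result = collect(pipeline(1:length(name_col)))
```

With this shape the only allocation for row indexes is the final `collect`. Whether the same currying carries over to `@fgenerator` functions is exactly the part I am unsure about.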
Regards