Which part of this code can I change to improve run speed and reduce GC time?

I have many functions, each of which generates its own condition function when called, like these:


func1()=begin
    datas=rand(1000)
    (i::Int)->datas[i]>0
end

func2()=begin
    datas=rand(200)
    (i::Int)->datas[i]>0.5
end

The generated condition function is then called many times in the for loop of the function below:

iterfunc(x::Function)=begin
    res=Vector{Int}(undef,size(x.datas))
    for i=1:size(x.datas,1)
        if x(i)
            res[i]=1
        else
            res[i]=-1
        end
    end
    res
end

res1=iterfunc(func1())
res2=iterfunc(func2())

At run time these functions spend a large fraction of their time in GC, usually 80% or more.
I tried a few ways to reduce the GC time, for example:

func1()=begin
    datas=rand(1000)
    @inline (i::Int)->datas[i]>0
end

func2()=begin
    datas=rand(1000)
    f=@closure (i::Int)->datas[i]>0
    identity(f)
end

But it changed nothing.
I am wondering what I can change in this code to improve performance. Can anyone help?

For me on Julia 1.7 these closures are inferred fine and I don’t see gc activity. What version did you test?
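One way to check this on your own machine is to measure a single call after a warm-up. The following is only a sketch: it copies `func1` and `iterfunc` from the question (with a 0.5 threshold picked for illustration), and the numbers it reports will depend on your Julia version and hardware.

```julia
# Closure factory and loop, as in the question (threshold 0.5 chosen for illustration)
func1() = begin
    datas = rand(1000)
    (i::Int) -> datas[i] > 0.5
end

iterfunc(x::Function) = begin
    res = Vector{Int}(undef, size(x.datas))
    for i = 1:size(x.datas, 1)
        res[i] = x(i) ? 1 : -1
    end
    res
end

cond = func1()
iterfunc(cond)                     # warm-up call, so compilation is not measured
bytes = @allocated iterfunc(cond)  # bytes allocated by one call
@time iterfunc(cond)               # prints elapsed time, allocations, and GC share if any
```

If the loop is inferred well, the allocation should be roughly the size of the result vector and little else.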

The pattern you're using here is pretty unusual, though: a closure wraps an array so that it can be called from a higher-order function, which then just iterates over the array.

In your case, this could all boil down to ifelse.(datas .> 0.5, 1, -1). Or, if you want to use an anonymous function, maybe this:

map(datas) do value
      value > 0.5 ? 1 : -1
end

Or even more complex if you really want to separate the value and function specification:

iterfunc(f, data) = [f(x) ? 1 : -1 for x in data]
iterfunc(>(0.5), rand(1000))
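To see that the three variants above give the same result, here is a small sketch. The sample `datas` vector is made up, and `iterfunc2` is a hypothetical name used only to avoid clashing with the `iterfunc` from the question.

```julia
datas = rand(1000)  # sample data, assumed for illustration

# Broadcast form
r1 = ifelse.(datas .> 0.5, 1, -1)

# Anonymous function via a do-block
r2 = map(datas) do value
    value > 0.5 ? 1 : -1
end

# Predicate and data passed separately (hypothetical name)
iterfunc2(f, data) = [f(x) ? 1 : -1 for x in data]
r3 = iterfunc2(>(0.5), datas)

r1 == r2 == r3  # all three agree
```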

Thank you for your answer.
I tested it with version 1.7.2.

In your version f and data are separated; in my broader context they logically belong together as two parts of one body, which is what func1, func2, … represent.

In any case, I think the problem is caused elsewhere. While iterfunc is running, many BitSets are generated. Here is a piece of example code:

iterfunc(
    timelist::Vector{DateTime},
    res::Vector{BitSet},
    func::Function)=begin
    holdset=BitSet()
    @inbounds for i=1:size(timelist,1)
        stopset=res[i]
        push!(holdset,i)
        for x in holdset
            opentime=timelist[x]
            @inbounds for j=i:-1:1
                timelist[j]<=opentime && 
                    (func(j,i) && push!(stopset,x);break)
            end
        end
        setdiff!(holdset,stopset)
    end
end

Do you have any idea about this?

Do your functions generate the data? (in your example they do). That will generate a lot of garbage if you are immediately filtering the data. Probably the GC is related to how the data is handled inside the functions.

Yes, each func (func1, func2, …) generates its own data. Some of this data is just a reference into an existing Dict, and some is a newly generated Matrix. Maybe it is the new Matrix that is related to the GC time? Here is a piece of example code:

datasdict=Dict(:a=>[1,2,3],:b=>[2,3,4]) # an existing Dict

func1(oc::Oc)=begin
    datas=datasdict[:a]
    (j::Int,i::Int=j)->datas[i]<0
end

func2(oc::Oc)=begin
    datas=[datasdict[:a] datasdict[:b]]
    (j::Int,i::Int=j)->datas[j]+datas[i]==1
end

There is probably more than one problem there. First, your datasdict is a non-constant global variable, which causes type instabilities and allocations wherever it is used. Second, you are computing the new data every time you call the function, so that will also allocate a lot.

I think you are probably trying to use an object-oriented pattern where it does not fit (and is not natural in Julia). But you can get something similar with functors: compute the data field of the struct only once and make the object itself callable. Something like this:
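As a rough sketch of the first point: either declare the global `const`, or pass the dictionary in as an argument. The names `DATASDICT` and `make_cond`, and the `< 3` condition, are made up for illustration.

```julia
# `const` fixes the type of the global binding, so code that reads it can be inferred
const DATASDICT = Dict(:a => [1, 2, 3], :b => [2, 3, 4])

# Hypothetical builder: the dictionary is an argument rather than a global
function make_cond(d::Dict{Symbol,Vector{Int}})
    datas = d[:a]            # reference to the existing vector, no copy
    return i -> datas[i] < 3
end

cond = make_cond(DATASDICT)
cond(1)  # true, since 1 < 3
cond(3)  # false, since 3 < 3 does not hold
```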

julia> datasdict=Dict(:a=>[1,2,3],:b=>[2,3,4]) # an existing Dict
Dict{Symbol, Vector{Int64}} with 2 entries:
  :a => [1, 2, 3]
  :b => [2, 3, 4]

julia> struct Func1{T} # structure that will contain the data of `func1`
           data::T
       end

julia> function Func1(datasdict::Dict) # constructor that selects the data from the dict
           datas=(datasdict[:a] .< 3)
           return Func1(datas)
       end
Func1

julia> (f::Func1)(x) = x .* f.data # definition of the function-like object

julia> func1 = Func1(datasdict)  # initialize the func1 object
Func1{BitVector}(Bool[1, 1, 0])

julia> func1(2) # call it
3-element Vector{Int64}:
 2
 2
 0


The func1 object will play the role of the func1 function in your case, but it contains the data already filtered, or generated.
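Adapted to the two-index condition functions from the question, the same idea might look like the sketch below. `Cond1` and `count_hits` are hypothetical names, and the data is made up; the point is only that the callable struct can be passed anywhere a condition function is expected.

```julia
# Callable struct holding the data once (hypothetical name)
struct Cond1{T<:AbstractVector}
    data::T
end

# Same call signature as the closures in the question: i defaults to j
(c::Cond1)(j::Int, i::Int = j) = c.data[i] < 0

# Hypothetical consumer; the type parameter F lets Julia specialize on the callable
function count_hits(cond::F, n::Int) where {F}
    hits = 0
    for i in 1:n
        cond(i) && (hits += 1)
    end
    return hits
end

cond1 = Cond1([-1, 2, -3])
count_hits(cond1, 3)  # 2, because entries 1 and 3 are negative
```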

Yes, sorry, I should have been clearer.
Let me restate the above example:

There is a struct Oc:

struct Oc
    datas::Dict
end

There are many (say 10) condition functions, which look like this:

func1(oc::Oc,getfunc::Function)=begin
    datas=b(oc)
    (j::Int,i::Int=j)->datas[i]<0
end

.
.
.

Here, “datas=b(oc)” is just an example; I don’t know what b will be in the future, and I will create many of them.
The same goes for “datas[i]<0”.

Besides, there are many (say 10) iterfunctions, defined ahead of time, each of which has its own way of iterating over a condition function.

So if I create a struct for each combination of condition function and iterfunction, I will need 10*10 definitions.
It looks like I need to keep the condition functions and the iterfunctions separate. Merging them into a struct and calling (struct)(x) is not convenient for functions added later.

I’m not sure we have enough information to give precise advice. But in general I would suggest sticking to the simplest and most modular structure possible, for example by separating the functions from the data. For instance, although functions that return other functions can be useful, I don’t think they appear in most numeric or data computing code.

Thus, for instance, what you show above could be split into

struct Oc{T<:Dict}
    data::T  # the way you show above has an abstract field, that is another issue
end

# just define the function acting on the input, returning the filtered output you want 
funct1(oc::Oc,i,j) = oc.data[i] .< 0

# then if your iterfunc does not have `oc` as a parameter, do:
function iterfunc( 
    # other args
    func::F
) where F<:Function
    ...
    func(i,j) # call func
end

# and call this closing over the `oc` data 
iterfunc( 
    # other args
    (i,j) -> funct1(oc,i,j)
)
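A runnable version of this outline could look like the sketch below. The condition `cond1`, the loop body of `iterfunc`, the wrapper `run_example`, and the sample data are all assumptions made for illustration; your real iteration logic would replace them.

```julia
struct Oc{T<:Dict}
    data::T  # concretely typed field
end

# Hypothetical condition acting on the data: true when entry i of :a is negative
cond1(oc::Oc, j, i) = oc.data[:a][i] < 0

# Hypothetical iteration function: it only knows about `func`, not about `oc`
function iterfunc(n::Int, func::F) where {F<:Function}
    res = Vector{Int}(undef, n)
    for i in 1:n
        res[i] = func(i, i) ? 1 : -1
    end
    return res
end

# Build the closure inside a function so that `oc` is a captured local, not a global
run_example(oc::Oc) = iterfunc(3, (j, i) -> cond1(oc, j, i))

oc = Oc(Dict(:a => [-1, 2, -3]))
res = run_example(oc)  # [1, -1, 1]
```

With this split, each of the 10 condition functions and each of the 10 iteration functions is written once, and any pair can be combined at the call site.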