I have many functions, each of which generates its own condition function when it is called, like these:
func1()=begin
    datas=rand(1000)
    (i::Int)->datas[i]>0
end
func2()=begin
    datas=rand(200)
    (i::Int)->datas[i]>0.5
end
The generated condition function is then called many times in the for-loop of the function below:
iterfunc(x::Function)=begin
    res=Vector{Int}(undef,size(x.datas))
    for i=1:size(x.datas,1)
        if x(i)
            res[i]=1
        else
            res[i]=-1
        end
    end
    res
end
res1=iterfunc(func1())
res2=iterfunc(func2())
At run time, these functions spend a large share of their time in GC, usually 80% or more.
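One way to put a number on the GC share is `@timed`, which reports both the elapsed time and the GC time of an expression. The following is only a minimal sketch that reuses `func1` and `iterfunc` from above; the warm-up call and the `gc_fraction` name are additions for illustration.

```julia
# Condition-function generator and driver as in the question.
func1() = begin
    datas = rand(1000)
    (i::Int) -> datas[i] > 0
end

iterfunc(x::Function) = begin
    res = Vector{Int}(undef, size(x.datas))
    for i = 1:size(x.datas, 1)
        res[i] = x(i) ? 1 : -1
    end
    res
end

iterfunc(func1())                 # warm-up call so compilation is not timed
stats = @timed iterfunc(func1())  # NamedTuple with fields such as value, time, bytes, gctime

# Fraction of the elapsed time spent in garbage collection.
gc_fraction = stats.time > 0 ? stats.gctime / stats.time : 0.0
println("elapsed: ", stats.time, " s, GC share: ", gc_fraction)
```

Running the same measurement on the real workload, rather than on this toy input, should show which call actually carries the GC cost.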
I tried a few ways to reduce the GC time, for example:
func1()=begin
    datas=rand(1000)
    @inline (i::Int)->datas[i]>0
end
func2()=begin
    datas=rand(1000)
    f=@closure (i::Int)->datas[i]>0 # assuming @closure from FastClosures.jl
    identity(f)
end
But it changes nothing.
I am wondering what I can change in this code to improve the performance. Can anyone help me?
For me on Julia 1.7 these closures are inferred fine and I don’t see gc activity. What version did you test?
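A quick way to check the inference claim on your own version is `Test.@inferred` together with `Base.return_types`. This is just a sketch using `func1` from the question; it checks only the closure itself, not the loop that calls it.

```julia
using Test  # standard library; provides @inferred

func1() = begin
    datas = rand(1000)
    (i::Int) -> datas[i] > 0
end

f = func1()

# @inferred throws an error if the inferred return type differs from the actual one.
result = @inferred f(1)
println(typeof(result))

# Return types inferred for a call with a single Int argument.
println(Base.return_types(f, (Int,)))
```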
It is a pretty unusual pattern that you’re using here, though. A closure wrapping an array in order to be called from a higher-order function, which just iterates the array.
In your case, this could all boil down to ifelse.(datas .> 0.5, 1, -1). Or, if you want to use an anonymous function approach, maybe this:
map(datas) do value
value > 0.5 ? 1 : -1
end
Or even more complex if you really want to separate the value and function specification:
iterfunc(f, data) = [f(x) ? 1 : -1 for x in data]
iterfunc(>(0.5), rand(1000))
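To see that the three forms are interchangeable, here is a small sketch that runs them on the same input and compares the results. The variable names `a`, `b` and `c` are only for illustration.

```julia
data = rand(1000)

# Broadcast form.
a = ifelse.(data .> 0.5, 1, -1)

# map with a do-block.
b = map(data) do value
    value > 0.5 ? 1 : -1
end

# Predicate and data passed separately.
iterfunc(f, data) = [f(x) ? 1 : -1 for x in data]
c = iterfunc(>(0.5), data)

println(a == b == c)  # prints true
```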
Thank you for your answer.
I tested it with version 1.7.2.
There, f and data are separated. In my broader context they logically belong together as two parts of one body, which is what func1, func2, … express.
Anyway, I think the problem may be caused by something else. While iterfunc is running, many BitSets are generated. Here is a piece of example code:
iterfunc(
    timelist::Vector{DateTime},
    res::Vector{BitSet},
    func::Function)=begin
    holdset=BitSet()
    @inbounds for i=1:size(timelist,1)
        stopset=res[i]
        push!(holdset,i)
        for x in holdset
            opentime=timelist[x]
            @inbounds for j=i:-1:1
                timelist[j]<=opentime &&
                    (func(j,i) && push!(stopset,x);break)
            end
        end
        setdiff!(holdset,stopset)
    end
end
Do your functions generate the data? (in your example they do). That will generate a lot of garbage if you are immediately filtering the data. Probably the GC is related to how the data is handled inside the functions.
Yes, each func (func1, func2, …) generates its own data. Some of this data is just a reference into an existing Dict, and some is a newly generated Matrix. Maybe it is the new Matrix that is related to the GC time? Here is a piece of example code:
datasdict=Dict(:a=>[1,2,3],:b=>[2,3,4]) # an existing Dict
func1(oc::Oc)=begin
    datas=datasdict[:a]
    (j::Int,i::Int=j)->datas[i]<0
end
func2(oc::Oc)=begin
    datas=[datasdict[:a] datasdict[:b]]
    (j::Int,i::Int=j)->datas[j]+datas[i]==1
end
There is probably more than one problem there. First, your datasdict is a non-constant global variable, and that will cause type instabilities and allocations wherever it is used. Second, you are computing the new data every time you call the function, so that will also allocate a lot.
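To illustrate the first point, here is a minimal sketch of the two usual ways around a non-constant global: declaring it `const`, or passing it as an argument. The names `DATASDICT` and `count_small_*` are hypothetical and only serve the comparison.

```julia
# Non-const global: its type may change at any time, so code reading it cannot be inferred.
datasdict = Dict(:a => [1, 2, 3], :b => [2, 3, 4])
count_small_global() = count(<(3), datasdict[:a])

# Option 1: declare the global `const` so the compiler can rely on its type.
const DATASDICT = Dict(:a => [1, 2, 3], :b => [2, 3, 4])
count_small_const() = count(<(3), DATASDICT[:a])

# Option 2: pass the dictionary in as an argument.
count_small_arg(d::Dict) = count(<(3), d[:a])

# All three give the same answer; only the inferred types differ.
println(count_small_global(), " ", count_small_const(), " ", count_small_arg(datasdict))

# Inspect what the compiler infers for the global and the const versions.
println(Base.return_types(count_small_global, ()))
println(Base.return_types(count_small_const, ()))
```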
I think you are probably trying to use an object-oriented pattern where it does not fit (and is not natural in Julia). But you can get something similar with functors, in which case you compute the data field of the struct only once and make the object callable like a function. Something like this:
julia> datasdict=Dict(:a=>[1,2,3],:b=>[2,3,4]) # an existed Dict
Dict{Symbol, Vector{Int64}} with 2 entries:
:a => [1, 2, 3]
:b => [2, 3, 4]
julia> struct Func1{T} # structure that will contain the data of `func1`
data::T
end
julia> function Func1(datasdict::Dict) # constructor that selects the data from the dict
datas=(datasdict[:a] .< 3)
return Func1(datas)
end
Func1
julia> (f::Func1)(x) = x .* f.data # definition of the function-like object
julia> func1 = Func1(datasdict) # initialize the func1 object
Func1{BitVector}(Bool[1, 1, 0])
julia> func1(2) # call it
3-element Vector{Int64}:
2
2
0
The func1 object will play the role of the func1 function in your case, but it contains the data already filtered, or generated.
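Applied to the index-based conditions from the question, the same idea might look like the sketch below. `GreaterThan`, its `threshold` field and the two-argument `iterfunc(cond, n)` are hypothetical names chosen for illustration, not something from your code.

```julia
# Hypothetical predicate functor: stores its data once and is callable like a closure.
struct GreaterThan{T<:AbstractVector}
    datas::T
    threshold::Float64
end

# Calling the object with an index evaluates the condition for that element.
(c::GreaterThan)(i::Int) = c.datas[i] > c.threshold

# The driver accepts any callable plus the number of items; it does not reach into `cond`.
function iterfunc(cond, n::Int)
    res = Vector{Int}(undef, n)
    for i in 1:n
        res[i] = cond(i) ? 1 : -1
    end
    return res
end

cond = GreaterThan([0.1, 0.7, 0.5, 0.9], 0.5)
println(iterfunc(cond, length(cond.datas)))  # prints [-1, 1, -1, 1]
```

Because the struct is parametric in the data type, the field stays concretely typed and the call `cond(i)` can be inferred.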
Yes, sorry, I should have been clearer.
Let me restate the above example:
There is a struct Oc:
struct Oc
datas::Dict
end
There are many (say 10) condition functions, which look like this:
func1(oc::Oc,getfunc::Function)=begin
datas=b(oc)
(j::Int,i::Int=j)->datas[i]<0
end
.
.
.
Here, “datas=b(oc)” is just an example. I don’t know what b will be in the future, and I will create many of them.
The same goes for “datas[i]<0”.
Besides, there are many (say 10) iteration functions, defined ahead of time, each of which has its own way of iterating over a condition function.
So if I create a struct for each combination of condition function and iterfunction, I will need 10*10 definitions.
It looks like I need to separate the condition functions from the iterfunctions. Merging them in a struct and calling (struct)(x) is not convenient for functions added later.
I’m not sure we have enough information to give precise advice. But in general I would suggest sticking to the most modular and simple structure possible, for example by separating the functions from the data. For instance, although functions that return other functions can be useful, I don’t think they appear in most numeric or data computing code.
Thus, for instance, what you show above, could be split into
struct Oc{T<:Dict}
data::T # the way you show above has an abstract field, that is another issue
end
# just define the function acting on the input, returning the filtered output you want
funct1(oc::Oc,i,j) = oc.data[i] .< 0
# then if your iterfunc does not have `oc` as a parameter, do:
function iterfunc(
    # other args
    func::F
) where F<:Function
    ...
    func(i,j) # call func
end
# and call this closing over the `oc` data
iterfunc(
# other args
(i,j) -> funct1(oc,i,j)
)
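A filled-in version of this outline could look like the sketch below. The condition `cond_less`, the pair-counting body of `iterfunc` and the `main` wrapper are all hypothetical stand-ins, since the real conditions and iteration logic are not shown in the thread.

```julia
# Concretely typed container; the field name `data` follows the struct above.
struct Oc{T<:Dict}
    data::T
end

# Hypothetical condition: compares two entries of the vector stored under :a.
cond_less(oc::Oc, j::Int, i::Int) = oc.data[:a][j] < oc.data[:a][i]

# Hypothetical driver; the type parameter F makes Julia specialize on the function passed in.
function iterfunc(n::Int, func::F) where {F}
    hits = 0
    for i in 1:n, j in 1:i
        func(j, i) && (hits += 1)
    end
    return hits
end

# Building `oc` inside a function keeps it a local, so the closure captures a typed value
# instead of referring to a non-const global.
function main()
    oc = Oc(Dict(:a => [1, 2, 3], :b => [2, 3, 4]))
    return iterfunc(3, (j, i) -> cond_less(oc, j, i))
end

println(main())  # prints 3
```

With this split, each of the condition functions and each of the iteration functions is written once, and any pair can be combined at the call site with a one-line closure.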