Mapping all methods of a module into a Dictionary

Hi,

I’m trying to populate a `Dict{String, Function}` with the exported methods found in StatsBase so that I can expose them to a UI, but I’m getting unexpected results:

using StatsBase

d = Dict{String,Function}()

functions = names(StatsBase, all= false, imported = false)

# print(functions)
for f in functions
    println(typeof(f))
    d[string(f)] = getfield(StatsBase, f)
end

returns: 
Symbol
MethodError: Cannot `convert` an object of type Type{AbstractDataTransform} to an object of type Function
Closest candidates are:
  convert(::Type{T}, !Matched::T) where T at essentials.jl:171

Stacktrace:
 [1] setindex!(::Dict{String,Function}, ::Type{T} where T, ::String) at .\dict.jl:380
 [2] top-level scope at .\In[61]:11
 [3] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1091


As you can see, type checking `f` returns `Symbol`. Further (taking from an earlier example), the following works:

using StatsBase
fn = "percentile"
v = [i for i in 1:10]

p = getfield(StatsBase, Symbol(fn)) :: Function
#test output - percentile(v , [10, 50, 75])
p(v , [10, 50, 75])

Is this expected? Note that I originally also cast the result to `::Function` in the loop, but a similar error was thrown.

Further, the intended usage of the dictionary is to provide a quick way to obtain an instance of a function and then apply parameters in a standard factory pattern, reducing the `getfield` overhead by copying an instance to apply parameters to, e.g. from the above:

d = Dict{String,Function}()
v = [i for i in 1:10]
localpercentile = copy(d["percentile"]) # ?

results = localpercentile(v, [10,25,50])

Is this possible?

Regards

It appears `names()` is returning the names of ALL the objects in StatsBase: functions, structures, and probably other things. So you should probably do:

for f in functions
    println(typeof(f))
    if getfield(StatsBase, f) isa Function
        d[string(f)] = getfield(StatsBase, f)
    end
end

I’m not 100% sure what you are asking with your second question. The `::Function` basically tells the compiler you are expecting a function to be returned from the call, and to throw an exception if it isn’t.
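A minimal illustration of that assertion, using the stdlib `Statistics` module here as a stand-in for StatsBase: the `::Function` annotation is checked at run time and throws a `TypeError` when the value is not a `Function`:

```julia
using Statistics

# Passes: `mean` really is a Function.
f = getfield(Statistics, :mean) :: Function
f([1, 2, 3])  # 2.0

# Fails: a module bound inside `Statistics` is not a Function,
# so the type assertion throws a TypeError.
try
    getfield(Statistics, :Statistics) :: Function
catch e
    println(typeof(e))  # TypeError
end
```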

In your last example I’m not sure why you have the `copy`; you can do this just fine:

julia> d=Dict{String, Function}()
Dict{String,Function}()

julia> d["percentile"] = getfield(StatsBase, :percentile)
percentile (generic function with 1 method)

julia> d["percentile"]([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [10, 25, 50])
3-element Array{Float64,1}:
 1.9
 3.25
 5.5

Hi,

Many thanks for your reply @pixel27 - when I paste in the code:

using StatsBase

d = Dict{String,Function}()

functions = names(StatsBase, all= false, imported = false)

for f in functions
    if getfield(StatsBase, f) isa Function
        d[string(f)] = getfield(StatsBase, f)
    end
end

I get the following error:

UndefVarError: findat not defined

Stacktrace:
 [1] top-level scope at .\In[44]:11
 [2] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1091

where line 11 is the `isa` line. Did you get the same? I’m using Julia 1.5.3.

The end state for this is to create arrays of functions, grouped around the common variable (vector) set they all share, e.g. (assuming `d` is now populated):

d = Dict{String,Function}()
computeNames1 = FunctionWrangler([d["Mean"], d["percentile"]])
computeNames2 = FunctionWrangler([d["Mean"]])
computeSets = [computeNames1, computeNames2]

vector1 = [i for i in 1:10]
vector2 = [i for i in 10:20]

vectorGrouping = [vector1, vector2]
computeSetIncrement = 1
Threads.@threads for v in vectorGrouping
    result = zeros(Float64, length(computeSets[computeSetIncrement]))
    smap!(result, computeSets[computeSetIncrement], v)
    computeSetIncrement += 1
end

using the FunctionWranglers.jl package to wrap the function array and then execute each group in parallel. Since I wasn’t sure whether it is possible to execute functions in parallel in a thread-safe way when acting on different data sources (closer to the Actor.jl model), I assumed a reference to the method would be required.

Note that an alternative to the above would be to bring the threads into the loop to execute in parallel over the compute sets and/or the vectorGrouping.

@tisztamo , @pbayer - does this use case look valid for your respective packages? My need is to be able to dynamically group methods and dispatch against common data as above, but I don’t think this is a unique scenario so interested in your thoughts on how to implement.

Regards

On further testing and reviewing the source code, it seems that `findat` and `wmean!` are not actually defined in StatsBase but are exported. Omitting them does allow me to populate the dictionary as originally intended, so thanks for the pointer @pixel27.

I suppose this potentially means there is a bug somewhere, or that there is a more robust way to do this, but it works as it stands.
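For reference, the `isa Function` filter from earlier in the thread can be combined with an `isdefined` guard so that exported-but-undefined names like `findat` are skipped automatically. This is only a sketch (`collect_functions` is a name I made up for illustration), and it works for any module, not just StatsBase:

```julia
# Build a name => function dictionary for a module, skipping names
# that are exported but not actually bound (like findat above).
function collect_functions(mod::Module)
    d = Dict{String,Function}()
    for f in names(mod, all = false, imported = false)
        isdefined(mod, f) || continue      # skip exported-but-undefined names
        val = getfield(mod, f)
        val isa Function && (d[string(f)] = val)
    end
    return d
end

# Usage against StatsBase (assuming it is installed):
# using StatsBase
# d = collect_functions(StatsBase)
# d["percentile"]([1:10;], [10, 50, 75])
```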

It seems conceptually possible, but I fear FunctionWranglers will not give you much edge, even when you run the computation part several times (without that, as in the example, FunctionWranglers will only slow things down!), because you call the wrangler only once in every task, and task creation itself takes time.

Here, you forgot to wrap the functions into a FunctionWrangler, e.g.:

computeNames1 = FunctionWrangler([d["Mean"], d["percentile"]])

Also, your use of `computeSetIncrement` is not a valid way to distribute the work between threads: possibly all threads will compute on the first vector. The `@threads` macro already does the distribution; you just need the index of `v`, e.g. by running the for loop over the indexes.
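A sketch of that fix: iterate over `eachindex(vectorGrouping)` so `@threads` hands out the indices itself, and drop the shared counter entirely. Plain functions stand in here for the `FunctionWrangler`/`smap!` calls, which would need the real package:

```julia
using Statistics  # `mean` stands in for the d["Mean"] entry

computeSets = [[mean, maximum], [mean]]
vectorGrouping = [collect(1.0:10.0), collect(10.0:20.0)]

# One result slot per vector; each thread writes only to its own slot,
# so there is no shared mutable counter to race on.
results = Vector{Vector{Float64}}(undef, length(vectorGrouping))
Threads.@threads for i in eachindex(vectorGrouping)
    v = vectorGrouping[i]
    results[i] = [f(v) for f in computeSets[i]]
end
```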

Thanks for the response @tisztamo. I should have added the disclaimer that this was only toy-box code; when written in full I follow your approach from GitHub. I’ll edit the example.

The indexer was also there to be explicit about the approach: keeping compute sets in sync with the required data (but your comment is again valid).

In the theme of this thread: my initial goal was / is to create a factory of all the methods in a module, callable by (string) name. I did quite a bit of research before posting, but there doesn’t seem to be a defined approach to this, even though it seems to be the natural complement to multiple dispatch. Has anyone come across an agreed-upon approach?

Regards


Yes, custom single-dispatch mechanisms are common in Julia code, but I was not able to find “the” pattern so far. It may be because single dispatch is often value-based with relatively complex rules (otherwise there is rarely a need to code a dispatch mechanism directly, although it happens, as with your UI).

I have the feeling that single dispatch is somewhat disregarded in the Julia community, because multiple dispatch is the killer feature of the language, and one might think it is a superset of single dispatch. But it is not.
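For what it’s worth, the simplest version of such a hand-rolled, string-keyed single dispatch (essentially the factory this thread is about) might look like the sketch below; all names here are illustrative only:

```julia
# A string-keyed dispatch table: callers select the implementation
# by name at run time instead of by argument type.
const DISPATCH = Dict{String,Function}(
    "mean"    => v -> sum(v) / length(v),
    "maximum" => v -> maximum(v),
)

# Look the function up by name and forward the arguments to it.
call_by_name(name::AbstractString, args...) = DISPATCH[name](args...)
```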


As far as I understand your use case, it fits a parallel loop like `@threads for ...`, as you actually proposed, better. You can do it with actors, but it is not necessary and less straightforward.