Usage of multiple dispatch in Base (wikipedia entry)


#1

I happened to stumble across Wikipedia’s page on Multiple dispatch and noted that (at the time) Julia’s only mention was a single link, and all serious discussion was focused on other languages (many of which do not have native support for MD).

If you follow that link, you’ll see I fixed that :smile:. In particular, folks may be interested in the graphs I added analyzing the usage of multiple dispatch in practice. For archival purposes, here’s my analysis code:

# Pairwise comparison of argument types in method signatures
struct ArgSharingCounts
    # counts by argument position of same/different
    # argsame[n][s] or argdiff[n][s]: methods with n arguments, comparing slot s
    argsame::Vector{Vector{Int}}
    argdiff::Vector{Vector{Int}}
    # histogram of total numbers of identical arguments
    # histsame[n][k]: number of method-pairs with n arguments of which k-1 are the same
    histsame::Vector{Vector{Int}}
end
ArgSharingCounts() = ArgSharingCounts(Vector{Vector{Int}}(0), Vector{Vector{Int}}(0), Vector{Vector{Int}}(0))

Base.length(asc::ArgSharingCounts) = length(asc.histsame)

function grow_to!(asc::ArgSharingCounts, n)
    while (m = length(asc)) < n
        push!(asc.argsame, zeros(Int, m+1))
        push!(asc.argdiff, zeros(Int, m+1))
        push!(asc.histsame, zeros(Int, m+2)) # histogram has extra slot for k=0 case
    end
    asc
end

sig(m::Method) = sig(m.sig)
sig(s) = s
sig(s::UnionAll) = Base.unwrap_unionall(s)

function Base.push!(asc::ArgSharingCounts, m1::Method, m2::Method)
    p1, p2 = sig(m1).parameters, sig(m2).parameters
    (l = length(p1)) == length(p2) || return asc
    grow_to!(asc, l-1)  # first arg is typeof(f), skip it
    k = 0
    for i = 1:l-1
        if p1[i+1] == p2[i+1]
            asc.argsame[l-1][i] += 1
            k += 1
        else
            asc.argdiff[l-1][i] += 1
        end
    end
    asc.histsame[l-1][k+1] += 1
    asc
end

funcdict = Dict{Symbol,ArgSharingCounts}()
for name in names(Base)
    @show name
    f = getfield(Base, name)
    if isa(f, Function)
        mths = collect(methods(f))
        asc = ArgSharingCounts()
        for j = 1:length(mths)
            mj = mths[j]
            for i = 1:j-1
                mi = mths[i]
                push!(asc, mi, mj)
            end
        end
        funcdict[name] = asc
    end
end

# Total counts
asct = ArgSharingCounts()
for (fsym, asc) in funcdict
    grow_to!(asct, length(asc))
    for i = 1:length(asc)
        asct.argsame[i] .+= asc.argsame[i]
        asct.argdiff[i] .+= asc.argdiff[i]
        asct.histsame[i] .+= asc.histsame[i]
    end
end

# normalized analysis of 2-argument functions
nmethods = Int[]
p0 = Float64[]
p1 = Float64[]
for (fsym, asc) in funcdict
    if length(asc) >= 2
        push!(nmethods, length(collect(Iterators.filter(m->length(sig(m).parameters)==3, methods(getfield(Base, fsym))))))
        nt = sum(asc.histsame[2])
        if nt > 0
            push!(p0, asc.histsame[2][1]/nt)
            push!(p1, asc.histsame[2][2]/nt)
        end
    end
end

using Gadfly
phist = plot(x=nmethods, Guide.xlabel("with # methods"), Guide.ylabel("# 2-arg funcs"), Scale.x_log10, Geom.histogram);
pshared = plot(x=p0, Guide.xlabel("Frac. pairs, 0 shared arguments"), Guide.ylabel("Count"), Geom.histogram, Coord.Cartesian(xmin=0.0, xmax=1.0));
vstack(phist, pshared)

#2

Nice addition.

As much as I love CL, as Julia matures and grows in popularity, it may make sense to extend (or, heaven forbid, replace) the CL examples with Julia. But that’s now something I would push for at the moment.


#3

Honestly, it’s pretty questionable to add that data to Wikipedia, as it probably falls under the no original research ban unless you publish it elsewhere.


#4

Does it count if published here? It’s definitely not worth a paper. But the world is a worse place without it.


#5

Highly questionable. Put it this way: if you were writing an academic paper, would you unquestioningly cite data that was published only by some “random person” in an online discussion board somewhere?

If it were in an official Julia blog post, however, that could arguably pass the bar. (A blog post takes more work than dashing off a script on discourse, but less work than an academic paper at least.)


#6

Didn’t remember that policy, but the whole reason I posted the source code here was to ensure reproducibility and verifiability. Skimming the policy, to me it looks like this is a valid use. (EDIT: my reply here was to your first post, we had a race condition in our replies.)


#7

If wikipedia gives you a hard time about the policy you could always put it on wikiversity and maybe put a link to the wikiversity page on the wikipedia page :stuck_out_tongue_closed_eyes:


#8

Self-published primary sources are supposed to be used with extreme caution in Wikipedia. Wikipedia articles are only supposed to cite reputable sources.

(Full disclosure: I’ve been a Wikipedia editor since 2003, and actually have admin privileges there, although I’m not as active these days. The “no original research” ban and “reliable sources” policy are extremely important to Wikipedia, e.g. it is the first line of defense against science cranks.)


#9

It seems better to replace it with the data on “quantifying multiple dispatch” that was published in the SIAM Review article (arxiv: https://arxiv.org/abs/1411.1607). This is especially nice because it compares multiple languages, using data from another paper.


#10

Is it really the same if it’s a code to reproduce the data yourself? No one has to trust @tim.holy here, they can just run the script. The true primary source here then is the Base installation of Julia itself, and what’s shown in this thread is just how one can query that data.


#11

Yes. (You could make the same argument for novel mathematical arguments you could “check yourself”, or computer cold-fusion simulations “you could run yourself”.)

The basic epistemological issue here is that Wikipedia editors are not in a position to check the validity of the code or statistical methodology of anyone. There are no arguments about validity of code or math on Wikipedia (or at least, there aren’t supposed to be), only arguments about whether a source is reputable/notable.

A random code snippet posted to a forum somewhere is not a reputable source of statistical methodology.


#12

Thanks for stepping up and making the edits!