Hi.
I needed to implement a topological sort, so I found some implementations on the Internet here: https://rosettacode.org/wiki/Topological_sort
I initially needed it in Python and experimented with Julia just out of curiosity. Here is a Python code:
try:
from functools import reduce
except:
pass
data = {
'des_system_lib': set('std synopsys std_cell_lib des_system_lib dw02 dw01 ramlib ieee'.split()),
'dw01': set('ieee dw01 dware gtech'.split()),
'dw02': set('ieee dw02 dware'.split()),
'dw03': set('std synopsys dware dw03 dw02 dw01 ieee gtech'.split()),
'dw04': set('dw04 ieee dw01 dware gtech'.split()),
'dw05': set('dw05 ieee dware'.split()),
'dw06': set('dw06 ieee dware'.split()),
'dw07': set('ieee dware'.split()),
'dware': set('ieee dware'.split()),
'gtech': set('ieee gtech'.split()),
'ramlib': set('std ieee'.split()),
'std_cell_lib': set('ieee std_cell_lib'.split()),
'synopsys': set(),
}
def toposort2(data):
for k, v in data.items():
v.discard(k) # Ignore self dependencies
extra_items_in_deps = reduce(set.union, data.values()) - set(data.keys())
data.update({item:set() for item in extra_items_in_deps})
while True:
ordered = set(item for item,dep in data.items() if not dep)
if not ordered:
break
for el in sorted(ordered):
yield el
data = {item: (dep - ordered) for item,dep in data.items()
if item not in ordered}
assert not data, "A cyclic dependency exists amongst %r" % data
and Iāve got the performance of:
%timeit list(toposort2(data))
10000 loops, best of 3: 20.2 Āµs per loop
Here is a Julia translation from the site:
function toposort2(data::Dict{T,Set{T}}) where T
for (k, v) in data
delete!(v, k)
end
extraitems = setdiff(reduce(āŖ, values(data)), keys(data))
for item in extraitems
data[item] = Set{T}()
end
rst = Vector{T}()
while true
ordered = Set(item for (item, dep) in data if isempty(dep))
if isempty(ordered) break end
append!(rst, ordered)
data = Dict{T,Set{T}}(item => setdiff(dep, ordered) for (item, dep) in data if item ā ordered)
end
@assert isempty(data) "a cyclic dependency exists amongst $(keys(data))"
return rst
end
And this is the benchmark Iāve got:
data = Dict{String,Set{String}}(
"des_system_lib" => Set(split("std synopsys std_cell_lib des_system_lib dw02 dw01 ramlib ieee")),
"dw01" => Set(split("ieee dw01 dware gtech")),
"dw02" => Set(split("ieee dw02 dware")),
"dw03" => Set(split("std synopsys dware dw03 dw02 dw01 ieee gtech")),
"dw04" => Set(split("dw04 ieee dw01 dware gtech")),
"dw05" => Set(split("dw05 ieee dware")),
"dw06" => Set(split("dw06 ieee dware")),
"dw07" => Set(split("ieee dware")),
"dware" => Set(split("ieee dware")),
"gtech" => Set(split("ieee gtech")),
"ramlib" => Set(split("std ieee")),
"std_cell_lib" => Set(split("ieee std_cell_lib")),
"synopsys" => Set("")
)
using BenchmarkTools
@benchmark toposort2(data)
BenchmarkTools.Trial:
memory estimate: 31.05 KiB
allocs estimate: 447
--------------
minimum time: 15.935 Ī¼s (0.00% GC)
median time: 16.964 Ī¼s (0.00% GC)
mean time: 24.255 Ī¼s (26.77% GC)
maximum time: 35.626 ms (99.92% GC)
--------------
samples: 10000
evals/sample: 1
Less then 25% improvement on a purely algorithmic task, where Julis should have a significant margin over Python as advertized. Not that impressive at all, and Julia version is not even lazy comparing to Python code.
I have modified Julia version to use channels for lazy generation to recreate Python version more precicely:
function toposort3(data::Dict{T,Set{T}}) where T
for (k, v) in data
delete!(v, k)
end
extraitems = setdiff(reduce(āŖ, values(data)), keys(data))
for item in extraitems
data[item] = Set{T}()
end
rst = Vector{T}()
Channel(ctype=T) do c
while true
ordered = Set(item for (item, dep) in data if isempty(dep))
if isempty(ordered) break end
for el in ordered push!(c, el) end
data = Dict{T,Set{T}}(item => setdiff(dep, ordered) for (item, dep) in data if item ā ordered)
end
@assert isempty(data) "a cyclic dependency exists amongst $(keys(data))"
end
end
Now the performance is:
@benchmark collect(toposort3(data))
BenchmarkTools.Trial:
memory estimate: 42.02 KiB
allocs estimate: 603
--------------
minimum time: 68.883 Ī¼s (0.00% GC)
median time: 72.482 Ī¼s (0.00% GC)
mean time: 87.621 Ī¼s (9.52% GC)
maximum time: 35.695 ms (99.61% GC)
--------------
samples: 10000
evals/sample: 1
Either Python is not as slow as I was led to believe, or Julia implementation is missing something important here. I understand I am doing this just out of curiosity, so donāt want to waste your time with these code listings. I understand the performance isnāt all about Julia, still, it may be hard to convince someone to learn a new language if the outcome isnāt really that apparent.
Thank you for your comments!