Collect doesn't terminate

sobhan · July 30, 2020, 1:50am

hi, i'm trying to normalize (in the db sense) a Dict{Int8, Vector{Float64}}, but the code never finishes

d = Dict()
for i in 1:100
    c = rand(Int8)
    n = rand()
    a = get!(d, c, [])
    push!(a, n)
end
map(collect, zip(Iterators.flatten([[(k, i) for i in v] for (k, v) in d])...))

i want to convert something like 1 => [1, 2, 3], 2 => [3] into something like [(1, 1), (1, 2), (1, 3), (2, 3)], and then into [1, 1, 1, 2], [1, 2, 3, 3]

lucas711642 · July 30, 2020, 2:25am

You are passing 100 collections to the zip function, so this is the bottleneck of the code.

You can use this function to accomplish what you want:

function fill_key_val(dict)
	n = sum(length, values(dict))
	key_vector = Array{keytype(dict)}(undef, n)
	val_vector = Array{eltype(valtype(dict))}(undef, n)
	ind = 1
	@inbounds for (k, v) = dict
		for i = v
			key_vector[ind] = k
			val_vector[ind] = i
			ind += 1
		end
	end
	key_vector, val_vector
end

Note that you can specify concrete types for the keys and values of the dictionary by Dict{Int8, Vector{Float64}}(), so the compiler can generate specialized code.
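As a rough sketch of that suggestion (not benchmarked here), the snippet below builds the dictionary with concrete types and passes it to the function. fill_key_val is repeated so the snippet runs on its own, and the names ks and vs are only illustrative.

```julia
# Same logic as fill_key_val above, repeated so this snippet is self-contained.
function fill_key_val(dict)
    n = sum(length, values(dict))
    key_vector = Vector{keytype(dict)}(undef, n)
    val_vector = Vector{eltype(valtype(dict))}(undef, n)
    ind = 1
    for (k, v) in dict
        for x in v
            key_vector[ind] = k
            val_vector[ind] = x
            ind += 1
        end
    end
    return key_vector, val_vector
end

# Concretely typed dictionary, filled the same way as in the question.
d = Dict{Int8, Vector{Float64}}()
for _ in 1:100
    c = rand(Int8)
    push!(get!(d, c, Float64[]), rand())
end

# ks is a Vector{Int8} and vs a Vector{Float64}, one entry per stored value.
ks, vs = fill_key_val(d)
```

With the untyped Dict() from the question, keytype and valtype are both Any, so the two result vectors would be Vector{Any} instead.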

sobhan · July 30, 2020, 4:26pm

it doesn't terminate, as in i let it run for 8 hours and it didn't finish. zip isn't the bottleneck, since

collect(zip(Iterators.flatten([[(k, i) for i in v] for (k, v) in d])...))

runs pretty fast

sobhan · July 30, 2020, 4:40pm

map(collect, collect(zip(Iterators.flatten([[(k, i) for i in v] for (k, v) in d])...)))

runs fast too

lucas711642 · July 30, 2020, 11:20pm

Actually, what I meant to say is that the bottleneck is the argument of the zip function itself. You are creating a Tuple with 100 elements, so the compiler struggles with type inference. This problem occurs specifically when you try to iterate the following Generator:

iter = (collect(k) for k in zip(Iterators.flatten([[(k, i) for i in v] for (k, v) in d])...))
iterate(iter)
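If the goal is only the two flat vectors, one way to sidestep the large splat is to never call zip at all. The following is a minimal sketch, not something taken from the posts above; the names kv, ks and vs are made up for illustration, and a small hand-written dictionary stands in for the random one.

```julia
d = Dict{Int8, Vector{Float64}}(1 => [1.0, 2.0, 3.0], 2 => [3.0])

# Flatten to one (key, value) tuple per stored value.
kv = [(k, x) for (k, v) in d for x in v]

# Split the tuples into two parallel vectors by broadcasting first/last.
ks = first.(kv)
vs = last.(kv)
```

Dict iteration order is unspecified, so the keys may come out in any order, but ks[i] and vs[i] always belong to the same pair.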

sobhan · July 31, 2020, 3:16am

it struggles even with 5 arguments. why does adding a collect around zip solve the problem?

lucas711642 · July 31, 2020, 4:20pm

Are you sure? If you replace for i in 1:100 in your original code with for i in 1:5, it should run in acceptable time. Of course it is not optimized, but the code should run.

map dispatches on a specialized method for the case of an AbstractArray, which is the result of collect.
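A small way to see the difference between the two inputs, using only three tuples so the splat stays cheap (the variable names are just for illustration):

```julia
kv = [(1, 1.0), (1, 2.0), (2, 3.0)]

z = zip(kv...)    # lazy Zip iterator, not an array
c = collect(z)    # materialized Vector of tuples

z_is_array = z isa AbstractArray
c_is_array = c isa AbstractArray

# map over the collected array, as in the fast version from the thread.
cols = map(collect, c)
```

Here z_is_array is false and c_is_array is true, so map(collect, z) and map(collect, c) go through different methods.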

sobhan · July 31, 2020, 5:11pm

"map dispatches on a specialized method for the case of an AbstractArray, which is the result of collect."
makes sense

you're right, it runs fine with fewer than 30 items.
thanks
