Collect doesn't terminate

Hi, I'm trying to normalize (in the database sense) a Dict{Int8, Vector{Float64}}, but the code never finishes:

d = Dict()
for i in 1:100
    c = rand(Int8)
    n = rand()
    a = get!(d, c, [])
    push!(a, n)
end
map(collect, zip(Iterators.flatten([[(k, i) for i in v] for (k, v) in d])...))

I want to convert something like 1 => [1, 2, 3], 2 => [3] into [(1, 1), (1, 2), (1, 3), (2, 3)], and then into [1, 1, 1, 2], [1, 2, 3, 3].

You are passing 100 collections to the zip function, so this is the bottleneck of the code.

You can use this function to accomplish what you want:

function fill_key_val(dict)
    # Total number of (key, value) rows across all value vectors
    n = sum(length, values(dict))
    key_vector = Vector{keytype(dict)}(undef, n)
    val_vector = Vector{eltype(valtype(dict))}(undef, n)
    ind = 1
    # @inbounds is safe here: exactly n elements are written
    @inbounds for (k, v) in dict
        for i in v
            key_vector[ind] = k
            val_vector[ind] = i
            ind += 1
        end
    end
    return key_vector, val_vector
end

Note that you can specify concrete types for the keys and values of the dictionary with Dict{Int8, Vector{Float64}}(), so the compiler can generate specialized code.
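As a rough sketch of that last point (assuming the keys really are Int8 and the values Float64, as in the question), here is the loop from the original post rebuilt on a concretely typed dictionary. With an untyped Dict(), keytype and eltype(valtype(...)) both come back as Any, so the output vectors of fill_key_val would be Vector{Any} as well.

```julia
# Concretely typed dictionary instead of Dict()
d = Dict{Int8, Vector{Float64}}()
for _ in 1:100
    c = rand(Int8)
    # The closure only allocates a new vector when the key is missing
    a = get!(() -> Float64[], d, c)
    push!(a, rand())
end

# These are the types fill_key_val would use for its output vectors
kt = keytype(d)           # Int8
vt = eltype(valtype(d))   # Float64

# For comparison, an untyped dictionary gives Any for both
kt_untyped = keytype(Dict())            # Any
vt_untyped = eltype(valtype(Dict()))    # Any

# Total number of rows the normalized result will have
n = sum(length, values(d))
```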

It doesn't terminate: I let it run for 8 hours and it didn't finish. zip isn't the bottleneck, since

collect(zip(Iterators.flatten([[(k, i) for i in v] for (k, v) in d])...))

runs pretty fast

map(collect, collect(zip(Iterators.flatten([[(k, i) for i in v] for (k, v) in d])...)))

runs fast too

Actually, what I meant to say is that the bottleneck is the argument of the zip function itself. You are creating a Tuple with 100 elements, so the compiler struggles with type inference. This problem occurs specifically when you try to iterate the following Generator:

iter = (collect(k) for k in zip(Iterators.flatten([[(k, i) for i in v] for (k, v) in d])...))
iterate(iter)
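One way to sidestep the large tuple entirely is to never splat into zip. The following is only a sketch on the small example from the question (1 => [1, 2, 3], 2 => [3]), not a benchmarked replacement: it flattens with a nested comprehension and then takes the first and last element of each pair. The sort is there only because Dict iteration order is not guaranteed.

```julia
d = Dict(1 => [1, 2, 3], 2 => [3])

# Flatten to one (key, value) pair per row; no splatting, no zip
rows = sort([(k, x) for (k, v) in d for x in v])
# rows == [(1, 1), (1, 2), (1, 3), (2, 3)]

# Split the rows into two column vectors by broadcasting first/last
ks = first.(rows)   # [1, 1, 1, 2]
vs = last.(rows)    # [1, 2, 3, 3]
```

Because the comprehension produces a Vector of 2-tuples, the number of rows only affects the length of the arrays, not the size of any tuple type the compiler has to reason about.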

It struggles even with 5 arguments. Why does wrapping zip in collect solve the problem?

Are you sure? If you replace for i in 1:100 in your original code by for i in 1:5, it should run in acceptable time. Of course it is not optimized, but the code should run.

map dispatches on a specialized method for the case of an AbstractArray, which is the result of collect.


Makes sense.

You're right, it runs fine with fewer than 30 items.
Thanks.