Best way of building a heavily nested Dict

I want to create a nested Dict from a list of data. I think it’s clear from this non-working example what I want to do:

# make dummy data
n = 1000
r = [rand(string.('A':'Z'), n, 5) rand(n,1)]

# build a nested Dict
nested = Dict()   # or Dict(Dict(Dict(...))) or whatever
for i=1:n
    a,b,c,d,e = r[i,1:5]
    nested[a][b][c][d][e] = r[i,6]
end

What is the most compact and elegant way of building this structure without creating key combinations not in the list, and without descending too far into haskey() hell? Extra brownie points if the Dicts are correctly typed instead of using Any.

It rather depends on how you want your data in memory. For example, you could flatten this all into one Dict, by using (a, b, c, d, e) as the key, but then you lose the ability to enumerate or count the subsets easily.

For building this, without lots of haskey calls, you can use a mutating get call:

get!(..., get!(Dict{...}, get!(Dict{Dict{...}}, get!(Dict{Dict{Dict{...}}},
    nested, a), b), c), d)[e] = r[i, 6]

(although you might want to use simply Dict{String, Any} for each of those constructors, but ymmv)

IndexedTables is another option.

1 Like

Yes, in particular NDSparse.

Thanks! Let me just fill in the dots in your proposal in case anyone else is wondering:

nested = Dict()
for i=1:n
    a,b,c,d,e = r[i,1:5]
    get!(Dict, get!(Dict, get!(Dict, get!(Dict, nested, a), b), c), d)[e] = r[i, 6]
end

That will make nested dicts of type Dict{Any,Any}. To get more specific types, i.e. Dict{String, Dict{String, ...}} with an inner Float64, something like this would work:

dict(n) = n == 0 ? Float64 : Dict{String, dict(n-1)}
nested = dict(5)()
for i=1:n
    a,b,c,d,e = r[i,1:5]
    get!(dict(1), get!(dict(2), get!(dict(3), get!(dict(4), nested, a), b), c), d)[e] = r[i, 6]
end

I’m sure it’s possible to make the get!() calls recursive as well, but I don’t have time to work out the details right now.

I was shooting for a nested Dict implementation since I’m going to have the same datastructure in Julia and Javascript, and Javascript can’t do anything fancier than nested objects. So sadly I think flattened Dicts or IndexedTables are off the table. Thanks for the suggestions though.