How do I append/add data to a dictionary under the same key?

I am trying to create a dictionary from an array:

data = ["COMB1"   1.4;
        "COMB2"   1.2;
        "COMB2"   1.6;
        "COMB3"   1.2;
        "COMB3"   1.4;
        "COMB3"   1.3;
        "COMB3"   1.5];

I tried to do that with the following command:

dict = Dict( data[i,1] => data[i,2] for i=1:size(data,1))

The output is:

Dict String → Float64 with 3 entries
"COMB1" → 1.40
"COMB2" → 1.60
"COMB3" → 1.50

What I was expecting is something like:

Dict("COMB1" => 1.4,
     "COMB2" => [1.2,1.6],
     "COMB3" => [1.2,1.4,1.3,1.5]);


Dict String → Any with 3 entries
"COMB1" → 1.40
"COMB2" → Float64[2]
"COMB3" → Float64[4]

A dictionary is a mapping from a key to a single value (the value could be an array). If you set an existing key to a new value, the previous value will be overwritten.

So create a dictionary where each value is a vector, and push! values to that vector instead.
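A minimal sketch of that approach, assuming the `data` matrix from the question (the names `groups`, `key` and `value` are only for illustration):

```julia
data = ["COMB1" 1.4;
        "COMB2" 1.2;
        "COMB2" 1.6;
        "COMB3" 1.2;
        "COMB3" 1.4;
        "COMB3" 1.3;
        "COMB3" 1.5]

# Every value is a vector, even for keys that occur only once.
groups = Dict{String,Vector{Float64}}()

for i in 1:size(data, 1)
    key, value = data[i, 1], data[i, 2]
    if haskey(groups, key)
        push!(groups[key], value)   # key seen before: append to its vector
    else
        groups[key] = [value]       # first occurrence: start a new vector
    end
end
```

Note that "COMB1" ends up mapped to a one-element vector rather than a bare number, which keeps the value type uniform.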


I’m not sure of the context, but this is just the sort of aggregation operation that dataframes are designed for.

One option is to do

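using DataFrames

df = DataFrame(data, :auto)   # build from the matrix; columns are auto-named x1, x2
dict = Dict{String,Vector{Float64}}()
# `by` was removed in DataFrames 1.0, so iterate over the groups instead
for sdf in groupby(df, :x1)
    dict[sdf[1, :x1]] = sdf.x2
end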

which is either overly elaborate or perfectly reasonable depending on the context.

I’ll just mention that the solution proposed by Kristoffer Carlsson is, as far as I know, the solution everyone uses, but it has a performance hit under the hood because you need to write code like:

  if !haskey(d, k)
      d[k] = [v]
  else
      push!(d[k], v)
  end

In other words, the same key is looked up more than once: first by haskey, then again by d[k] in whichever branch runs. It would be better to somehow remember the position in the dictionary. This is the subject of an open issue:

https://github.com/JuliaLang/julia/issues/24454

I use either get! or a DefaultDict for this.
Either way it avoids the double lookup @Stephen_Vavasis was unhappy about.

get! looks up the key and, if it is not found, calls the given function and inserts the result as the value for that key. Either way it returns the stored value:

dict = Dict{String, Vector{Float64}}()
for i in 1:size(data,1)
	(kk, vv) = data[i,:]
	vals = get!(Vector{Float64}, dict, kk)
	push!(vals, vv)
end

Output

julia> @show dict
dict = Dict("COMB1"=>[1.4],"COMB2"=>[1.2, 1.6],"COMB3"=>[1.2, 1.4, 1.3, 1.5])
Dict{String,Array{Float64,1}} with 3 entries:
  "COMB1" => [1.4]
  "COMB2" => [1.2, 1.6]
  "COMB3" => [1.2, 1.4, 1.3, 1.5]
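The loop above relies on get! handing back the very vector that is stored in the dictionary, so push! on the result mutates the stored value. A small sketch of that behaviour (the names `vals` and `same` are only for illustration):

```julia
dict = Dict{String,Vector{Float64}}()

# The do-block runs only because "COMB2" is missing; its result is stored and returned.
vals = get!(dict, "COMB2") do
    Float64[]
end
push!(vals, 1.2)

# The key now exists, so the function is not called and the stored vector comes back.
same = get!(() -> Float64[], dict, "COMB2")
push!(same, 1.6)

same === vals      # true: both names refer to the vector held by dict
dict["COMB2"]      # [1.2, 1.6]
```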

With a DefaultDict from DataStructures.jl:

using DataStructures

dict = DefaultDict{String,Vector{Float64}}(Vector{Float64})

for i in 1:size(data,1)
	(kk, vv) = data[i,:]
	push!(dict[kk], vv)
end

output

julia> @show dict
dict = DefaultDict("COMB1"=>[1.4],"COMB2"=>[1.2, 1.6],"COMB3"=>[1.2, 1.4, 1.3, 1.5])
DefaultDict{String,Array{Float64,1},DataType} with 3 entries:
  "COMB1" => [1.4]
  "COMB2" => [1.2, 1.6]
  "COMB3" => [1.2, 1.4, 1.3, 1.5]

You are right: this solves the OP’s problem. When the ‘value’ is mutable and needs mutating, your post shows how to avoid the double lookup. So the open issue I referred to above applies only when the value is of an immutable type (or a mutable type that needs to be overwritten rather than mutated).
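To illustrate the immutable case, here is a hedged sketch that counts how often each key occurs (the `keys_seen` vector and the `counts` name are made up for the example). An Int cannot be updated in place, so the new value has to be written back, which means a second lookup:

```julia
keys_seen = ["COMB1", "COMB2", "COMB2", "COMB3", "COMB3", "COMB3", "COMB3"]

counts = Dict{String,Int}()
for k in keys_seen
    # get looks the key up once; the assignment then looks it up again
    # to store the incremented value.
    counts[k] = get(counts, k, 0) + 1
end
```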
