# Comparing Unequal Dictionaries

So after reading about everything I can find on the subject, I am going to throw in the towel and ask.

I have 2 dictionaries, “a” and “b”:
a = 4-element Array{Pair{Union{Missing, Int16},Int64},1}:
133 => 2
135 => 9
137 => 1
143 => 4

b = 6-element Array{Pair{Union{Missing, Int16},Int64},1}:
129 => 10
131 => 9
133 => 8
135 => 14
137 => 2
141 => 1

What I want to do is get the difference between the values for each key.

The code I have so far is:

for (index,value) in a
    for (indexj,valuej) in b
        if index==indexj
            difference= abs(value-valuej)
        end
    end
end

However, because the Dictionaries are of unequal size, I cannot get the numbers for 129 and 141 in “b” and 143 in “a” because there isn’t a match in both sets. How do I get these values included in the sum of differences?

It depends on what you want. What should the difference be if one key is not present in both dicts?
Assuming a default value of 0, the result could look something like

``````difference = sum(values(mergewith( (x,y) -> abs(x-y), a, b)))
``````

Or, a bit more verbosely, so you can better see what is going on:

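``````difference = 0  # named `difference` to avoid shadowing `Base.diff`
same = intersect(keys(a), keys(b))  # keys present in both dicts
for i in same
    difference += abs(a[i] - b[i])
end
# Keys present in only one dict are compared against a default of 0
for i in setdiff(keys(a), same)
    difference += a[i]
end
for i in setdiff(keys(b), same)
    difference += b[i]
end
``````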
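
As a quick check of how `mergewith` treats unmatched keys, here is a minimal sketch on the data from the question. It assumes `a` and `b` are stored as real `Dict`s (written out by hand here) and that Julia 1.5 or later is in use, since that is when `mergewith` was introduced. The name `merged` is only for illustration.

```julia
# Assumption: the question's data, entered by hand as plain Dicts.
a = Dict(133 => 2, 135 => 9, 137 => 1, 143 => 4)
b = Dict(129 => 10, 131 => 9, 133 => 8, 135 => 14, 137 => 2, 141 => 1)

# Keys present in both dicts are combined with the supplied function;
# keys present in only one dict keep their value unchanged,
# which is the same as comparing against a default of 0.
merged = mergewith((x, y) -> abs(x - y), a, b)

merged[133]  # 6, from abs(2 - 8)
merged[143]  # 4, key only in `a`
merged[129]  # 10, key only in `b`

difference = sum(values(merged))  # 36
```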

I would also like to suggest reading this post:


Oops, sorry, I should have followed the formatting structure. I will do so in the future.

When I try the one-line statement, I get this error:

``````ERROR: MethodError: no method matching mergewith(::var"#21#22", ::Array{Pair{Union{Missing, Int16},Int64},1}, ::Array{Pair{Union{Missing, Int16},Int64},1})
Closest candidates are:
mergewith(::Any) at abstractdict.jl:350
mergewith(::Any, ::AbstractDict, ::AbstractDict...) at abstractdict.jl:348
Stacktrace:
[1] top-level scope at none:1
``````

I am using Julia v1.5.1. I am curious as to why I am getting this error.

Because you aren’t using dictionaries. Your `a` and `b` are

``````a = 4-element Array{Pair{Union{Missing, Int16},Int64},1}:
``````

I’m not sure how you got there, maybe by calling `collect` on a `Dict{Union{Missing, Int16}, Int64}`? In any case, try converting them back with `Dict(a)` and `Dict(b)` before calling `mergewith`.
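
A minimal sketch of that conversion, assuming the vectors really did come from `collect` on dictionaries of the type shown in the error message. The names `a_pairs` and `b_pairs` are hypothetical stand-ins for the question's `a` and `b`.

```julia
# Hypothetical reconstruction: `collect` on a Dict yields a Vector of Pairs, not a Dict.
a_pairs = collect(Dict{Union{Missing,Int16},Int64}(133 => 2, 135 => 9, 137 => 1, 143 => 4))
b_pairs = collect(Dict{Union{Missing,Int16},Int64}(
    129 => 10, 131 => 9, 133 => 8, 135 => 14, 137 => 2, 141 => 1))

a_pairs isa AbstractDict  # false, which is why `mergewith` has no matching method

# Rebuild dictionaries from the vectors of pairs.
a = Dict(a_pairs)
b = Dict(b_pairs)

difference = sum(values(mergewith((x, y) -> abs(x - y), a, b)))
```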

I did use a `collect` to pull out the two groups of data from a larger dataframe.

Oh, in that case there might be a way to do what you want while keeping things in a DataFrame. Can you show a more complete example of what you want to accomplish?

What I am attempting to do is determine the sum of squares between 2 populations at one genetic locus. For example, at locus A in an organism there are several variants of the gene. What I need to do is find the differences between the populations. So if there is a way to compare the populations while staying in the DataFrame, I am all for it.

Here is some example data

``````153×3 DataFrame
│ Row │ locus  │ population │ counts                                                                                    │
│     │ String │ String     │ Dict{Union{Missing, Int16},Int64}                                                         │
├─────┼────────┼────────────┼───────────────────────────────────────────────────────────────────────────────────────────┤
│ 1   │ A      │ 1          │ Dict(133=>2,135=>9,143=>4,137=>1)                                                         │
│ 2   │ A      │ 2          │ Dict(133=>8,135=>14,131=>9,137=>2,129=>10,141=>1)                                         │
│ 3   │ A      │ 3          │ Dict(135=>1,123=>4,145=>2,143=>1,137=>10,139=>1,141=>5)                                   │
│ 4   │ A      │ 4          │ Dict(133=>1,135=>7,147=>2,123=>3,145=>3,143=>1,137=>17,149=>1,139=>3,141=>8)              │
│ 5   │ A      │ 5          │ Dict(135=>7,147=>1,123=>1,145=>2,143=>6,137=>10,139=>1,141=>2)                            │
│ 6   │ A      │ 6          │ Dict(135=>2,121=>1,123=>3,145=>1,137=>4,139=>9,141=>2)                                    │
│ 7   │ A      │ 7          │ Dict(133=>3,135=>7,123=>2,137=>2,149=>2,141=>2)                                           │
│ 8   │ A      │ 8          │ Dict(133=>1,135=>9,123=>1,143=>1,137=>7,141=>1)                                           │
│ 9   │ A      │ 9          │ Dict(135=>8,121=>3,123=>1,137=>6)                                                         │
│ 10  │ A      │ 10         │ Dict(135=>10,123=>2,143=>9,139=>1)                                                        │
``````

Here `locus` is the locus name, `population` is the population identifier (which may or may not be numeric, depending on what the researcher entered), and `counts` holds the number of individuals that have each version of the gene in that population. The versions do not have to have numeric names; they can be alphanumeric.

Example: in population 1 at locus A, 2 individuals have version 133, 9 individuals have version 135, 4 individuals have version 143, and 1 individual has version 137.
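
For the sum-of-squares goal described above, one possible sketch follows. It assumes "sum of squares" means the sum of squared count differences per gene version, with a version absent from one population counted as 0; that reading is an assumption, not something confirmed in the thread. The helper name `sumsq` is hypothetical. Note that simply swapping `abs(x-y)` for `(x-y)^2` in the `mergewith` one-liner would not square the counts of versions found in only one population, because `mergewith` leaves unmatched values untouched; looking up each key with `get` and a default avoids that.

```julia
# Assumption: an allele missing from one population has a count of 0.
# `sumsq` is a hypothetical helper name used only for this sketch.
sumsq(a, b) = sum((get(a, k, 0) - get(b, k, 0))^2 for k in union(keys(a), keys(b)))

# Populations 1 and 2 at locus A, copied from the example data.
pop1 = Dict(133 => 2, 135 => 9, 143 => 4, 137 => 1)
pop2 = Dict(133 => 8, 135 => 14, 131 => 9, 137 => 2, 129 => 10, 141 => 1)

sumsq(pop1, pop2)  # 260
```

Because `get` works with any key type, the same sketch should also apply when the versions have alphanumeric names.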

OK, I think I was wrong; I don’t see a nice way to fit that into a DataFrame. Your `collect` was probably on the right track. How about constructing a matrix of all pairwise comparisons?

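``````# Sum of absolute count differences; a key missing from one Dict counts as 0
compare(a, b) = sum(values(mergewith( (x,y) -> abs(x-y), a, b)))
counts = df.counts
# Broadcasting a 1×n row against an n-vector gives the n×n matrix of pairwise comparisons
differences = compare.(permutedims(counts), counts)
``````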
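
To see what the pairwise matrix looks like without needing the DataFrame, here is a small self-contained sketch using populations 1, 2, and 9 from the example data. The vector `counts` is built by hand here as a stand-in for `df.counts`.

```julia
compare(a, b) = sum(values(mergewith((x, y) -> abs(x - y), a, b)))

# Stand-in for `df.counts`: populations 1, 2, and 9 at locus A.
counts = [
    Dict(133 => 2, 135 => 9, 143 => 4, 137 => 1),
    Dict(133 => 8, 135 => 14, 131 => 9, 137 => 2, 129 => 10, 141 => 1),
    Dict(135 => 8, 121 => 3, 123 => 1, 137 => 6),
]

# Entry [i, j] compares population j with population i; the matrix is
# symmetric with zeros on the diagonal because abs(x - y) is symmetric.
differences = compare.(permutedims(counts), counts)
# 3×3 matrix: [0 36 16; 36 0 42; 16 42 0]
```

Since the matrix is symmetric, only the entries above (or below) the diagonal carry distinct information.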