Comparing Unequal Dictionaries

So after reading about everything I can find on the subject, I am going to throw in the towel and ask.

I have two dictionaries, “a” and “b”:
a = 4-element Array{Pair{Union{Missing, Int16},Int64},1}:
133 => 2
135 => 9
137 => 1
143 => 4

b = 6-element Array{Pair{Union{Missing, Int16},Int64},1}:
129 => 10
131 => 9
133 => 8
135 => 14
137 => 2
141 => 1

What I want to do is get the difference between the values stored under each key.

The code I have so far is:

for (index, value) in a
    for (indexj, valuej) in b
        if index == indexj
            difference = abs(value - valuej)
        end
    end
end

However, because the dictionaries are of unequal size, I cannot get the numbers for 129 and 141 in “b” and 143 in “a”, because those keys have no match in the other set. How do I get these values included in the sum of differences?

It depends on what you want. What should the difference be if one key is not present in both dicts?
Assuming a default value of 0, the result could look something like

difference = sum(values(mergewith( (x,y) -> abs(x-y), a, b)))

Or a bit more verbose so you can better see what is going on could be:

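diff = 0
# Keys present in both dictionaries: add the absolute difference.
same = intersect(keys(a), keys(b))
for i in same
    diff += abs(a[i]-b[i])
end
# Keys only in a: the missing value in b counts as 0.
for i in setdiff(keys(a), same)
    diff += a[i]
end
# Keys only in b: the missing value in a counts as 0.
for i in setdiff(keys(b), same)
    diff += b[i]
end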
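
As a quick check that the two versions above agree, here is a minimal sketch using the numbers from the question, stored as real Dicts. The helper name sum_abs_diff is hypothetical, and the equivalence with the one-liner assumes all values are non-negative counts, since mergewith leaves an unmatched value unchanged rather than taking its absolute value.

```julia
# Data from the question, stored as actual dictionaries (assumption: plain Int keys).
a = Dict(133 => 2, 135 => 9, 137 => 1, 143 => 4)
b = Dict(129 => 10, 131 => 9, 133 => 8, 135 => 14, 137 => 2, 141 => 1)

# One-liner: shared keys are combined with abs(x - y); a key found in only
# one dictionary keeps its value unchanged.
one_liner = sum(values(mergewith((x, y) -> abs(x - y), a, b)))

# Explicit version: look up every key in both dictionaries, defaulting to 0.
# `sum_abs_diff` is a hypothetical helper name, not part of any library.
sum_abs_diff(a, b) = sum(abs(get(a, k, 0) - get(b, k, 0)) for k in union(keys(a), keys(b)))

explicit = sum_abs_diff(a, b)
println(one_liner, " ", explicit)
```

The get(d, k, 0) form makes the default value explicit, so it is easy to change if a missing key should not count as 0.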

I would also like to suggest reading this post:


Oops, sorry, I should have followed the formatting structure. I will do so in the future.

When I try the one-line statement, I get this error:

ERROR: MethodError: no method matching mergewith(::var"#21#22", ::Array{Pair{Union{Missing, Int16},Int64},1}, ::Array{Pair{Union{Missing, Int16},Int64},1})
Closest candidates are:
  mergewith(::Any) at abstractdict.jl:350
  mergewith(::Any, ::AbstractDict, ::AbstractDict...) at abstractdict.jl:348
Stacktrace:
 [1] top-level scope at none:1

I am using Julia v1.5.1. I am curious as to why I am getting this error.

Because you aren’t using dictionaries. Your a and b are

a = 4-element Array{Pair{Union{Missing, Int16},Int64},1}:

I’m not sure how you got there, maybe by calling collect on a Dict{Union{Missing, Int16}, Int64}? In any case, try applying Dict() to a and b before mergewith
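
A minimal sketch of that conversion, assuming the collected vectors look like the ones printed in the question (plain Int keys are used here for brevity; the names a_pairs and b_pairs are made up for illustration):

```julia
# Vectors of pairs, similar to what `collect` returns for a Dict.
a_pairs = [133 => 2, 135 => 9, 137 => 1, 143 => 4]
b_pairs = [129 => 10, 131 => 9, 133 => 8, 135 => 14, 137 => 2, 141 => 1]

# Dict(...) accepts a collection of pairs and rebuilds the dictionary.
a = Dict(a_pairs)
b = Dict(b_pairs)

# mergewith now has AbstractDict arguments, so the method exists.
merged = mergewith((x, y) -> abs(x - y), a, b)
total = sum(values(merged))
println(total)
```

Keys present in both inputs hold the absolute difference in merged, while keys present in only one input keep their original value.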

I did use a collect to pull out the two groups of data from a larger dataframe.

Oh, in that case there might be a way to do what you want while keeping things in a DataFrame. Can you show a more complete example of what you want to accomplish?

What I am attempting to do is determine the sum of squares between two populations at one genetic locus. For example, in an organism there are several variations of the gene at locus A, and I need to find the differences between the populations. So if there is a way to compare the populations while the data stays in the DataFrame, then I am all for it.

Here is some example data

153×3 DataFrame
│ Row │ locus  │ population │ counts                                                                       │
│     │ String │ String     │ Dict{Union{Missing, Int16},Int64}                                            │
├─────┼────────┼────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ 1   │ A      │ 1          │ Dict(133=>2,135=>9,143=>4,137=>1)                                            │
│ 2   │ A      │ 2          │ Dict(133=>8,135=>14,131=>9,137=>2,129=>10,141=>1)                            │
│ 3   │ A      │ 3          │ Dict(135=>1,123=>4,145=>2,143=>1,137=>10,139=>1,141=>5)                      │
│ 4   │ A      │ 4          │ Dict(133=>1,135=>7,147=>2,123=>3,145=>3,143=>1,137=>17,149=>1,139=>3,141=>8) │
│ 5   │ A      │ 5          │ Dict(135=>7,147=>1,123=>1,145=>2,143=>6,137=>10,139=>1,141=>2)               │
│ 6   │ A      │ 6          │ Dict(135=>2,121=>1,123=>3,145=>1,137=>4,139=>9,141=>2)                       │
│ 7   │ A      │ 7          │ Dict(133=>3,135=>7,123=>2,137=>2,149=>2,141=>2)                              │
│ 8   │ A      │ 8          │ Dict(133=>1,135=>9,123=>1,143=>1,137=>7,141=>1)                              │
│ 9   │ A      │ 9          │ Dict(135=>8,121=>3,123=>1,137=>6)                                            │
│ 10  │ A      │ 10         │ Dict(135=>10,123=>2,143=>9,139=>1)                                           │

Here, the locus string is the locus name, the population string is the population identifier (it may or may not be numeric, depending on what the researcher entered), and counts holds the number of individuals that have each version of the gene in that population. The versions do not have to have numeric names; they can be alphanumeric.

Example: in population 1 at locus A, 2 individuals have version 133, 9 individuals have version 135, 4 individuals have version 143, and 1 individual has version 137.

OK, I think I was wrong, I don’t see a nice way to fit that into a dataframe. Your collect was probably on the right track. How about constructing a matrix of all pairwise comparisons?

compare(a, b) = sum(values(mergewith( (x,y) -> abs(x-y), a, b)))
counts = df.counts
differences = compare.(permutedims(counts), counts)
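
To show what the broadcast above produces, here is a small sketch that uses the first three rows of the example data as a plain vector of Dicts in place of df.counts (an assumption made so the snippet runs without DataFrames). Since the question mentions a sum of squares, it also includes a squared variant; compare_sq is a hypothetical name, and it uses get with a default of 0 because mergewith would leave unmatched values unsquared.

```julia
# First three populations at locus A, standing in for df.counts.
counts = [
    Dict(133 => 2, 135 => 9, 143 => 4, 137 => 1),
    Dict(133 => 8, 135 => 14, 131 => 9, 137 => 2, 129 => 10, 141 => 1),
    Dict(135 => 1, 123 => 4, 145 => 2, 143 => 1, 137 => 10, 139 => 1, 141 => 5),
]

# Sum of absolute differences, with a missing key treated as 0.
compare(a, b) = sum(values(mergewith((x, y) -> abs(x - y), a, b)))

# Broadcasting a 1×n row against an n-element column gives an n×n matrix:
# entry [i, j] compares population j with population i.
differences = compare.(permutedims(counts), counts)

# Hypothetical squared variant: every key is looked up in both dictionaries,
# so values without a partner are squared as well.
compare_sq(a, b) = sum((get(a, k, 0) - get(b, k, 0))^2 for k in union(keys(a), keys(b)))
squares = compare_sq.(permutedims(counts), counts)

display(differences)
```

Both matrices are symmetric with zeros on the diagonal, so only the upper or lower triangle carries information.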