Comparing Unequal Dictionaries

SergeantMike67 · December 18, 2020, 2:35pm

So after reading about everything I can find on the subject, I am going to throw in the towel and ask.

I have 2 dictionaries “a” and “b”
a = 4-element Array{Pair{Union{Missing, Int16},Int64},1}:
133 => 2
135 => 9
137 => 1
143 => 4

b = 6-element Array{Pair{Union{Missing, Int16},Int64},1}:
129 => 10
131 => 9
133 => 8
135 => 14
137 => 2
141 => 1

What I want to do is get the difference between the values for each key.

The code I have so far is:

for (index, value) in a
    for (indexj, valuej) in b
        if index == indexj
            difference = abs(value - valuej)
        end
    end
end

However, because the Dictionaries are of unequal size, I cannot get the numbers for 129 and 141 in “b” and 143 in “a” because there isn’t a match in both sets. How do I get these values included in the sum of differences?

Rudi79 · December 18, 2020, 3:10pm

It depends on what you want. What should the difference be if a key is present in only one of the dicts?
Assuming a default value of 0, the result could look something like

difference = sum(values(mergewith( (x,y) -> abs(x-y), a, b)))

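Or, a bit more verbose so you can better see what is going on:

function sum_abs_diff(a, b)
    total = 0
    same = intersect(keys(a), keys(b))
    # keys in both dicts: absolute difference of the two values
    for i in same
        total += abs(a[i] - b[i])
    end
    # keys only in a: compared against the default value of 0
    for i in setdiff(keys(a), same)
        total += a[i]
    end
    # keys only in b: compared against the default value of 0
    for i in setdiff(keys(b), same)
        total += b[i]
    end
    return total
end
difference = sum_abs_diff(a, b)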
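
As a quick check, the sketch below runs the one-liner on the sample data from the question, rebuilt here as `Dict`s (the key and value types are simplified to plain `Int`, which is an assumption for illustration), and compares it with an explicit pass over the union of the keys. `mergewith` needs Julia 1.5 or later.

```julia
# Sample data from the question, written as Dicts (types simplified to Int).
a = Dict(133 => 2, 135 => 9, 137 => 1, 143 => 4)
b = Dict(129 => 10, 131 => 9, 133 => 8, 135 => 14, 137 => 2, 141 => 1)

# mergewith calls the function only for keys present in both dicts.
# Keys found in just one dict keep their value, which equals abs(v - 0)
# as long as the counts are non-negative.
merged = mergewith((x, y) -> abs(x - y), a, b)
total_merge = sum(values(merged))

# Same total, spelled out: treat a missing key as a count of 0.
total_loop = sum(abs(get(a, k, 0) - get(b, k, 0)) for k in union(keys(a), keys(b)))

println(total_merge, " ", total_loop)
```

Both totals come out to 36 for this data: 6 + 5 + 1 from the shared keys 133, 135 and 137, plus 4 from `a` alone and 10 + 9 + 1 from `b` alone.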

I would also like to suggest reading this post:

SergeantMike67 · December 18, 2020, 3:46pm

Oops, sorry, I should have followed the formatting structure. I will do so in the future.

When I try the one line statement, I get this error:

ERROR: MethodError: no method matching mergewith(::var"#21#22", ::Array{Pair{Union{Missing, Int16},Int64},1}, ::Array{Pair{Union{Missing, Int16},Int64},1})
Closest candidates are:
  mergewith(::Any) at abstractdict.jl:350
  mergewith(::Any, ::AbstractDict, ::AbstractDict...) at abstractdict.jl:348
Stacktrace:
 [1] top-level scope at none:1

I am using Julia v1.5.1. I am curious as to why I am getting this error.

contradict · December 18, 2020, 7:01pm

Because you aren’t using dictionaries. Your a and b are

a = 4-element Array{Pair{Union{Missing, Int16},Int64},1}:

I’m not sure how you got there, maybe by calling collect on a Dict{Union{Missing, Int16}, Int64}? In any case, try applying Dict() to a and b before mergewith
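
A minimal sketch of that conversion, using the sample pairs from the question with the element types simplified to `Int` (an assumption for brevity). `Dict` accepts an iterable of pairs, so a collected vector converts directly.

```julia
# Vectors of pairs, as produced by collect on a Dict (types simplified to Int).
a_pairs = [133 => 2, 135 => 9, 137 => 1, 143 => 4]
b_pairs = [129 => 10, 131 => 9, 133 => 8, 135 => 14, 137 => 2, 141 => 1]

# mergewith is only defined for AbstractDict arguments, so convert first.
a = Dict(a_pairs)
b = Dict(b_pairs)

difference = sum(values(mergewith((x, y) -> abs(x - y), a, b)))
println(difference)
```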

SergeantMike67 · December 18, 2020, 7:49pm

I did use collect to pull the two groups of data out of a larger DataFrame.

contradict · December 19, 2020, 12:26am

Oh, in that case there might be a way to do what you want while keeping things in a DataFrame. Can you show a more complete example of what you want to accomplish?

SergeantMike67 · December 19, 2020, 5:58pm

What I am attempting to do is determine the sum of squares between 2 populations at one genetic locus. For example, in an organism there are several variations of the gene at locus A. What I need to do is find the differences between the populations. So if there is a way to compare the populations while staying in the DataFrame, I am all for it.

Here is some example data

153×3 DataFrame
│ Row │ locus  │ population │ counts                                                                                    │
│     │ String │ String     │ Dict{Union{Missing, Int16},Int64}                                                         │
├─────┼────────┼────────────┼───────────────────────────────────────────────────────────────────────────────────────────┤
│ 1   │ A      │ 1          │ Dict(133=>2,135=>9,143=>4,137=>1)                                                         │
│ 2   │ A      │ 2          │ Dict(133=>8,135=>14,131=>9,137=>2,129=>10,141=>1)                                         │
│ 3   │ A      │ 3          │ Dict(135=>1,123=>4,145=>2,143=>1,137=>10,139=>1,141=>5)                                   │
│ 4   │ A      │ 4          │ Dict(133=>1,135=>7,147=>2,123=>3,145=>3,143=>1,137=>17,149=>1,139=>3,141=>8)              │
│ 5   │ A      │ 5          │ Dict(135=>7,147=>1,123=>1,145=>2,143=>6,137=>10,139=>1,141=>2)                            │
│ 6   │ A      │ 6          │ Dict(135=>2,121=>1,123=>3,145=>1,137=>4,139=>9,141=>2)                                    │
│ 7   │ A      │ 7          │ Dict(133=>3,135=>7,123=>2,137=>2,149=>2,141=>2)                                           │
│ 8   │ A      │ 8          │ Dict(133=>1,135=>9,123=>1,143=>1,137=>7,141=>1)                                           │
│ 9   │ A      │ 9          │ Dict(135=>8,121=>3,123=>1,137=>6)                                                         │
│ 10  │ A      │ 10         │ Dict(135=>10,123=>2,143=>9,139=>1)                                                        │

Here, the locus string is the locus name, the population string is the population identifier (it may or may not be numeric, depending on what the researcher entered), and counts holds the number of individuals that have each version of the gene in that population. The versions do not have to have numeric names; they can be alphanumeric.

example: Population 1 at locus A; 2 individuals have version 133, 9 individuals have version 135, 4 individuals have version 143, and 1 individual has version 137

contradict · December 19, 2020, 10:29pm

OK, I think I was wrong, I don’t see a nice way to fit that into a dataframe. Your collect was probably on the right track. How about constructing a matrix of all pairwise comparisons?

compare(a, b) = sum(values(mergewith( (x,y) -> abs(x-y), a, b)))
counts = df.counts
differences = compare.(permutedims(counts), counts)
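
A self-contained sketch of the pairwise matrix, using the first three `counts` entries from the table in place of `df.counts` so that it runs without DataFrames (key types simplified to `Int`). Because the question mentions a sum of squares, the sketch also includes two hypothetical helpers, `sumdiff` and `sqdiff`, that treat a missing key as 0. They are not something proposed in the thread; they are there because simply passing `(x, y) -> (x - y)^2` to `mergewith` would leave the counts of unmatched keys unsquared, since `mergewith` only calls the function for keys present in both dicts.

```julia
# First three rows of the counts column from the example table.
counts = [
    Dict(133 => 2, 135 => 9, 143 => 4, 137 => 1),
    Dict(133 => 8, 135 => 14, 131 => 9, 137 => 2, 129 => 10, 141 => 1),
    Dict(135 => 1, 123 => 4, 145 => 2, 143 => 1, 137 => 10, 139 => 1, 141 => 5),
]

# Sum of absolute differences, as in the post above.
compare(a, b) = sum(values(mergewith((x, y) -> abs(x - y), a, b)))

# Hypothetical helper: apply f to the two counts of every key found in
# either dict, using 0 where a key is absent.
sumdiff(f, a, b) = sum(f(get(a, k, 0), get(b, k, 0)) for k in union(keys(a), keys(b)))

# Hypothetical helper: sum of squared differences built on sumdiff.
sqdiff(a, b) = sumdiff((x, y) -> (x - y)^2, a, b)

# Broadcasting a 1×n row against an n-element column yields an n×n matrix.
abs_matrix = compare.(permutedims(counts), counts)
sq_matrix = sqdiff.(permutedims(counts), counts)
```

Entry `[i, j]` compares population `i` with population `j`, so both matrices are symmetric with zeros on the diagonal. The full `df.counts` column presumably spans every locus, so in practice the rows would likely need to be restricted to a single locus before building the matrix.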
