I am trying to compute the sum of squares between two populations at a single genetic locus. For example, at locus A an organism has several variants of the gene, and I need to find the differences between the populations. If there is a way to compare the populations while the data is still in the DataFrame, I am all for it.
Here is some example data:
153×3 DataFrame
│ Row │ locus │ population │ counts │
│ │ String │ String │ Dict{Union{Missing, Int16},Int64} │
├─────┼────────┼────────────┼───────────────────────────────────────────────────────────────────────────────────────────┤
│ 1 │A│ 1 │ Dict(133=>2,135=>9,143=>4,137=>1) │
│ 2 │A│ 2 │ Dict(133=>8,135=>14,131=>9,137=>2,129=>10,141=>1) │
│ 3 │A│ 3 │ Dict(135=>1,123=>4,145=>2,143=>1,137=>10,139=>1,141=>5) │
│ 4 │A│ 4 │ Dict(133=>1,135=>7,147=>2,123=>3,145=>3,143=>1,137=>17,149=>1,139=>3,141=>8) │
│ 5 │A│ 5 │ Dict(135=>7,147=>1,123=>1,145=>2,143=>6,137=>10,139=>1,141=>2) │
│ 6 │A│ 6 │ Dict(135=>2,121=>1,123=>3,145=>1,137=>4,139=>9,141=>2) │
│ 7 │A│ 7 │ Dict(133=>3,135=>7,123=>2,137=>2,149=>2,141=>2) │
│ 8 │A│ 8 │ Dict(133=>1,135=>9,123=>1,143=>1,137=>7,141=>1) │
│ 9 │A│ 9 │ Dict(135=>8,121=>3,123=>1,137=>6) │
│ 10 │A│ 10 │ Dict(135=>10,123=>2,143=>9,139=>1) │
Here locus (a String) is the locus name, population (a String) is the population identifier (it may or may not be numeric, depending on what the researcher entered), and counts holds, for each version of the gene, the number of individuals in that population that carry it. The version names do not have to be numeric; they can be alphanumeric.
Example: in population 1 at locus A, 2 individuals have version 133, 9 have version 135, 4 have version 143, and 1 has version 137.
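To make the comparison concrete, here is a minimal sketch of one possible reading of "sum of squares between two populations": the sum of squared differences in allele frequencies, taken over every version seen in either population. That interpretation is an assumption, and the helper names allele_frequencies and squared_frequency_difference are hypothetical, not part of DataFrames or any genetics package. The sketch uses plain Dicts copied from rows 1 and 2 of the table so it runs without extra packages.

```julia
# Allele counts for populations 1 and 2 at locus A, copied from the table above.
pop1 = Dict(133 => 2, 135 => 9, 143 => 4, 137 => 1)
pop2 = Dict(133 => 8, 135 => 14, 131 => 9, 137 => 2, 129 => 10, 141 => 1)

# Hypothetical helper: turn raw counts into frequencies that sum to 1.
function allele_frequencies(counts::AbstractDict)
    total = sum(values(counts))
    return Dict(allele => n / total for (allele, n) in counts)
end

# Hypothetical helper: sum of squared frequency differences.
# A version missing from one population is treated as frequency 0 there.
# This is an assumed definition; swap in the statistic you actually need.
function squared_frequency_difference(a::AbstractDict, b::AbstractDict)
    fa = allele_frequencies(a)
    fb = allele_frequencies(b)
    alleles = union(keys(fa), keys(fb))
    return sum((get(fa, k, 0.0) - get(fb, k, 0.0))^2 for k in alleles)
end

d12 = squared_frequency_difference(pop1, pop2)
println("Population 1 vs 2 at locus A: ", d12)
```

Because the helper only needs two dictionaries, it can be applied to any pair of entries from the counts column of the rows that share a locus, and alphanumeric version names work as long as they are valid Dict keys.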