[ANN] Tally.jl: Counting things for you

Hi all,

I am excited to announce that I have just registered Tally.jl, which is a small package to do tallies, which are also known as frequency counts or bar charts (without the chart). It will tally any iterable object you throw at it:

julia> T = tally(["x", "x", "y", "x"])
Tally with 4 items in 2 groups:
"x" | 3 | 75%
"y" | 1 | 25%

julia> T = tally(rand(-1:1, 10, 10)) # a random 10x10 matrix with entries in [-1, 0, 1]
Tally with 100 items in 3 groups:
-1 | 37 | 37%
0  | 36 | 36%
1  | 27 | 27%

There is also some basic plotting functionality for the REPL (UnicodePlots.jl is amazing):

julia> T = tally(rand(-1:1, 10, 10));

julia> Tally.plot(T)
      ┌                                        ┐
   1  ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 38   38%
   0  ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 34       34%
   -1 ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■ 28            28%
      └                                        ┘

(if this is mangled due to non-monospace font, have a look at the README)

But one can also feed it into a Plots.bar plot.

One (for my applications) interesting functionality is the ability to count with respect to an arbitrary equivalence relation. In a nutshell: Instead of considering elements x, y equal when ==(x, y) is true, elements are considered equal when equivalence(by(x), by(y)) is true, where by and equivalence are functions provided by the user.

julia> v = 1:100000;

julia> tally(v, by = x -> mod(x, 3))
Tally with 100000 items in 3 groups:
[1] | 33334 | 33.334%
[3] | 33333 | 33.333%
[2] | 33333 | 33.333%

julia> v = ["abb", "ba", "aa", "ba", "bbba", "aaab"];

julia> tally(v, equivalence = (x, y) -> first(x) == first(y) && last(x) == last(y))
Tally with 6 items in 3 groups:
[ba]  | 3 | 50%
[abb] | 2 | 33%
[aa]  | 1 | 17%

(Most of these could be reduced to a tally(map(..., v)), but there are applications where this is not possible. These are quite frequent when doing algebra/number theory on a computer, but are too involved to reproduce here.)

There is no dedicated documentation, since everything fits neatly in the README.

I have written similar functionality in the past quite often, so I thought I might just turn it into a small package, in the hope that someone else might find it also useful. I am pretty sure that similar functionality exists somewhere else, probably in some statistics package. But I don’t speak statistics and all I want is just to tally.

P.S.: I forgot to mention one more thing. To impress people you can also do an animation using a “lazy” tally:

tally_run

34 Likes

I couldn’t help but chuckle at this.

The package looks really well-made and honestly kinda fun!

11 Likes

These are my reservations so far:

  1. Tally is the name of the module, so this doesn’t work: t=Tally(); t isa Tally. It would be preferable (that is to say, idiomatic Julia) for the module name to be Tallies so that Tally can be exposed as the type and constructor, and Tally() would default to constructing a Tally{Any}.
  2. Base.getindex(t::Tally, k) isn’t implemented, although I’m sure this should be easy enough to do.
8 Likes

Sorry for the rapidfire comments.

It’d be cool if there were some way to set show style, e.g. Tallies.show_style(:plot). That way, if I want to see the barchart instead of the counts every time I push!(t, val) a value into a tally, I could just set that value and watch the chart update in realtime instead of having to call Tallies.plot(t) each time.

Also: What if items should be removed from the count? And what if an entire category should be removed?

Also: What if I already know some of the counts, and I just want to create a Tally object so that I can conveniently add to them? This might be convenient for example:

Tally(Dict("a"=>2, "b"=>1, "c"=>3))

Also: basic tally arithmetic could be useful, such as adding or subtracting tallies.

Edit: Also, if you declare a Tally to be a subtype of AbstractDict, then I think you might get Plots.jl plotting for free as t=Tally((1,2,3,3)); bar(t)

Edit: Considering that the word “tally” seems to extend to continuous quantities (e.g., “He tallied his expenses every day.” ref), you might consider tallying fractional objects too. Not sure if this throws too big a wrench in the gears.

1 Like

Seems like you need to multiply the percentages by 100.

4 Likes

Maybe a Tally should be a subtype of AbstractDict?

1 Like

YeahYesGIF

An example from engineering to add some color to this comment:

In the field of integrated circuit design, it’s a common problem to encounter undesired behavior from interference due to parasitic capacitances. For example, a sensitive node N13 in the circuit might have its signal corrupted due to physical proximity of its metal trace to a noisy node N971, and it won’t be something we anticipated during design because the lumped-element model we use in schematics abstracts away physical implementation. For troubleshooting, we’ll run parasitics extraction to estimate all the parasitic mutual capacitances in the circuit, and then for a node under consideration N13, we’ll find which nodes it’s coupled to and sort their mutual capacitances to find which ones dominate. The capacitances at the top of the list are the prime suspects (it’s likely that N971 will be in the top-ten), and it’s common to perturb them (by adding or removing capacitance, at least in simulation) to see if it makes the problem better or worse; troubleshooting proceeds from there.

The notion we employ is exactly that of a tally: to tally up the contributions to mutual capacitance from the various nodes, femtofarad by femtofarad (or sometimes attofarad by attofarad), and have the dominant ones float to the top. As a result, for me at least it feels natural to extend a Tally from \mathbb Z^n to \mathbb R^n.

Whether that’s feasible in a single interface, however, I haven’t thought through. If it is though, it could be nice to have it.

1 Like

Or maybe, just for fun, tally should just be a Dictionary object, which has more flexibility than Dict (though this doesn’t have equivalence implemented):

julia> using SplitApplyCombine, Dictionaries

julia> tally(v; by=identity) = map(length, group(by, v)) |> sortkeys
tally (generic function with 1 method)

julia> d = tally(rand(1:5, 25))
5-element Dictionary{Int64, Int64}
 1 │ 4
 2 │ 5
 3 │ 3
 4 │ 5
 5 │ 8

julia> tallies =  "\u007C"^4*"\u0338 "
"||||̸ "

julia> function prison_count(x)
           d, r = divrem(x, 5)
           tallies^d * tallies[1:r]
       end
prison_count (generic function with 1 method)

julia> map(prison_count, d)
5-element Dictionary{Int64, Any}
 1 │ "||||"
 2 │ "||||̸ "
 3 │ "|||"
 4 │ "||||̸ "
 5 │ "||||̸ |||"
2 Likes

I think you mean

julia> function prison_count(x)
           d, r = divrem(x, 5)
           tallies^d * tallies[1]^r
       end

Thank you all for your feedback.

@DNF: Thanks for finding this embarrassing error. I just tagged a new release, where the percentages look more correct.

@uniment I appreciate giving it a try and all your suggestions.

For some non-rational reason I really like the name “Tally” for the package. Since modules have their own name in its namespace, this means I cannot call the structure Tally as you noticed. I am bit reluctant to change the name now. When writing the function tally, I was mainly thinking about the verb, similar to intersect, map or sum.

And I will add the getindex function, probably also the setindex! one.

Sounds like a good idea!

I never thought about that. Is this a hypothetical question or does this happen in practice?

This gets a bit tricky. Since dictionaries are iterable, this should create a tally with three items:

julia> tally(Dict("a"=>2, "b"=>1, "c"=>3))
Tally with 3 items in 3 groups:
"a" => 2 | 1 | 33%
"b" => 1 | 1 | 33%
"c" => 3 | 1 | 33%

One could do this maybe via a keyword a la tally(; init = Dict("a" => 1, "b" => 2))).

Yes, this sounds like a good idea.

Sounds like a good idea! I will give it a try and see if it brakes something else.

5 Likes

Oops, I meant 1:r, but that works too

1 Like

I want to emphasize how idiomatic having the plural of the types is for Julia pacakges.

It’s fine to keep the tally constructor as well. In fact, that’s also idiomatic Julia. See, for example, String which is the name of the type in Base, but has the constructor string. In general, the type name (String) is a “barebones” constructor and does not accept too many variations. The lowercase version (string) accepts a wider variety of inputs.

6 Likes

Ah gotcha. The English language strikes yet again!

I guess my counter-argument would be that although much of your value proposition is in implementing the verb to tally, a large part is also in implementing the noun a tally. Unlike map and sum, which do not return objects that must henceforth be considered maps and sums, tally returns an object which is meaningfully unique. Also, it can be useful to start with an empty tally and accumulate into it over time, so we’d want a way to specify that.

I might propose exposing Tally as a type and constructor, and tally as a verb with its own unique behavior and which returns a Tally. This would be consistent with, for example, Tuple and tuple, which operate as:

julia> Tuple((1,2,3))
(1, 2, 3)

julia> Tuple([1,2,3])
(1, 2, 3)

julia> tuple(1,2,3)
(1, 2, 3)

julia> tuple([1,2,3])
([1, 2, 3],)

Likewise, this could close up a gap in your package, to allow converting some other object to a tally while still allowing tallying iterables:

julia> itr = (:a, :b, :b, :c, :b, :c);

julia> tally(itr...) == Tally(Dict(:a=>1, :b=>3, :c=>2))
true

(note that splat works with iteration, so you can make tally take varargs for a nicer user interface)

Infact, you could have type-specialized constructors, so you could take Pairs directly or otherwise type-specialize:

julia> tally(itr...) == Tally(itr) == Tally(:a=>1, :b=>3, :c=>2) == Tally((a=1, b=3, c=2))
true

I would probably make special-cases for Tally(d::AbstractDict{<:Any,Number}), Tally(ps::Pair{<:Any,Number}...), Tally(nt::NamedTuple{S,T} where {S,T}), and Tally(t::Tally), with a fallback of simply iterating for other types.

The other major attraction to exposing Tally is that then other people can type-specialize on your type, making it all the more useful. How will they want to type-specialize on tallies? Who knows!

Well, sure! When I’m counting things, I might over-count, double-count, or later find that something has been given away or destroyed and should be removed from the count, or maybe something categorically shouldn’t be included in the count. I’d like to be able to subtract from the count to accommodate my own stupidity. :wink:

After you’ve implemented tally arithmetic, here’s another fun idea… You can define promotion rules for arithmetic with Pairs, so for example:

julia> t = tally("a","b","b","c","c","c");

julia> t += "a" => 2;

julia> t
Tally{String, Int64} with 8 items in 3 groups:
c | 3 | 38%
a | 3 | 38%
b | 2 | 25%

That would also make it easy to accumulate non-integer tallies:

julia> N13_parasitics = Tally{Symbol, Float64}();

julia> N13_parasitics += :N971 => 1.3e-14;

julia> N13_parasitics
Tally{Symbol, Float64} with 1 group:
:N971 | 1.3e-14 | 100%

I’m not yet imaginative enough to imagine what complex- or imaginary-valued tallies could mean so I don’t have examples of those for now.

Edit: I should mention another idiomatic possibility for (lowercase) tally behavior, which would take an iterable object and a function (but place the function as first argument):

tally(itr; kw...) = tally(identity, itr; kw...)
tally(by, itr; kw...) = ...

Then, we could use the do statement:

tally(1:100_000) do x
    mod(x, 3)
end

Considering that usually we wish to tally many objects, this is probably better behavior than the recommendation for tally to take varargs.

2 Likes

A naive tally code using SplitApplyCombine for the expert review:

using SplitApplyCombine, Dictionaries, Printf

struct Tally
    gv::Dictionary{Any, Vector{Any}}
    gc::Dictionary{Any, Any}
    gp::Dictionary{Any, Any}
end

function tally(v; f::Function = identity)
    gv = group(f, v)
    gc = groupcount(f, v)
    gp = Dictionary(collect(keys(gc)), 100 * gc.values / sum(gc))
    Tally(gv, gc, gp)
end

function Base.show(io::IO, T::Tally)
    @printf("%12s | %12s | %12s\n", "Element", "Count", "Percentage")
    @printf("%s", "-"^44 * "\n")
    fmt1 = Printf.Format("%12.3g | %12i | %12.3f\n")
    fmt2 = Printf.Format(  "%12s | %12i | %12.3f\n")
    if eltype(first(T.gv.values)[1])== Char
        Printf.format.(Ref(io), Ref(fmt2), first.(T.gv.values), T.gc.values, collect(values(T.gp)))
    else
        Printf.format.(Ref(io), Ref(fmt1), first.(T.gv.values), T.gc.values, collect(values(T.gp)))
    end
end
Test results:
# Check also: T.gv, T.gc, or T.gp

T = tally(1:100000; f = x -> mod(x, 3))

     Element |        Count |   Percentage
--------------------------------------------
           1 |        33334 |       33.334
           2 |        33333 |       33.333
           3 |        33333 |       33.333

v = ["abb", "ba", "aa", "ba", "bbba", "aaab"]
equivalence(x, y) = first(x) == first(y) && last(x) == last(y)
T = tally(v; f = x -> equivalence.(x,v))

     Element |        Count |   Percentage
--------------------------------------------
         abb |            2 |       33.333
          ba |            3 |       50.000
          aa |            1 |       16.667


T = tally(["x", "x", "y", "x"])

     Element |        Count |   Percentage
--------------------------------------------
           x |            3 |       75.000
           y |            1 |       25.000


T = tally(rand(-1:1, 10, 10))

     Element |        Count |   Percentage
--------------------------------------------
          -1 |           37 |       37.000
           0 |           31 |       31.000
           1 |           32 |       32.000
1 Like

Okay, so I’ve come up with an example of complex-valued tallies.

In electronics engineering, we are frequently concerned with the complex-valued impedances of nodes. Such impedances can arise as the placement of multiple reactive and/or resistive components in series, whose impedances add. Of course, in engineering we concern ourselves foremost with which components dominate the expression. As a result, it could be interesting to tally up and build an account of those impedances, and present them in sorted order (most likely sorted by abs2).

As a result, for me at least it feels natural to extend a Tally from \mathbb R^n to \mathbb C^n.

But I will introduce a new wrinkle.

So far we’ve covered only situations where the reducing function for the various quantities is the arithmetic sum.

However, such impedances can also arise due to the parallel placement of components: in these cases, the contribution of each component to the result will not be additive. Instead, the operator that describes how parallel impedances combine is:

julia> ++(zs...) = 1/sum(1/z for z ∈ zs);

julia> 4++6
2.4000000000000004

(this operator has the same properties as addition, except its identity is infinite instead of zero.)

These situations occur whenever rates add, but are expressed as inverses (e.g. “Kelly paints a wall in four hours. John paints a wall in six hours. If they work together, how much time will it take to paint a wall?”). It could be interesting if a Tally is able to tally up and calculate the percentage contribution from each, if it knows the reducing function by which their contributions combine.

This type of combination is what inspires the harmonic mean. More generally, there are other ways that values will commonly combine, which inspires such concepts as the geometric mean, the generalized power mean, or the most general Kolmogorov-Nagumo mean. They can be summarized as arising in cases where quantities undergo some transformation before the arithmetic sum is applied.

So… It could be nice if Tally can accommodate some of this. I’m just spitballing though, so no pressure :wink:

Thanks for the explanation and the other ideas for extending the package!

1 Like

Final two ideas, if you’re not already sick of my ideas by now :stuck_out_tongue_closed_eyes:

Idea 1:

Considering that complex numbers can be thought of as essentially two-dimensional real vectors projected into complex space by the matrix [1 0; 0 im], a natural generalization would be to extend each element of a Tally to m-dimensional real space \mathbb R^m.

So, for an example where each element is 3-dimensional:

t = Tally()
t += (vec1 = [2; 1; 0])
t += :vec2 => [1; 2; 2]

The tally t could by default represent the vector sum of :vec1 and :vec2, with the convenience of showing how each vector contributes to the total.

The show method could present the contributors to the tally, maybe sorted by magnitude, something similar to LinearAlgebra.norm2 or maybe x->sum(x'*x). And the default reducing function for computing a tally would be the broadcasted sum operator .+.

Idea 2:

If you want to see how each element of the tally contributes along a particular direction, you could provide a transformation function which would be applied to each item before sorting. (default might be identity.)

In the example of tallying 3-vectors, you might provide a function x->A*x, where A is a matrix to project x into one-dimensional space (i.e., A is a \mathbb R^{1\times 3} row vector). So, for example, if you want to sort the elements of the tally by their contribution along the third dimension, you might set A=[0 0 1].

In the example of additive complex impedances, you might care to sort by reactance without regard for resistance (i.e., you might want to sort the elements by their imaginary component, rather than their real component nor magnitude). In this case, the transformation function you might provide could be imag.

And perhaps, instead of the default sort to be by magnitude, it might be to sort by tally direction (i.e., determine the unit vector in the direction of the overall tally, and sort the elements by their contribution along this direction).

Apologies if I’m throwing too much out there! I just get excited when I see something simple yet general; there’s a certain elegance to it.

Just a quick update here, because I incorporated some of the suggestions that came up here. I just tagged a new release, which now also includes:

  • Tally arithmetic (+ and -), due to @BeastyBlacksmith.
  • Prison count (thanks @j_verzani for the idea):
    julia> T = tally([1, 1, 1, 2, 2, 2, 2, 2, 2, -1]);
    
    julia> prison_count(T)
    2  ┃ ┼┼┼┼ │
    ━━━╋━━━━━━━
    1  ┃ │││
    ━━━╋━━━━━━━
    -1 ┃ │
    
  • A show_style(style::Symbol) method to change the default printing.
  • Implementation of the Tables.jl interface. Thus one can also use PrettyTables.jl directly:
    julia> pretty_table(T)
    ┌───────┬────────┐
    │  Keys │ Values │
    │ Int64 │  Int64 │
    ├───────┼────────┤
    │     2 │      6 │
    │     1 │      3 │
    │    -1 │      1 │
    └───────┴────────┘
    
2 Likes

This is nice:

julia> t = tally(rand(Bool, 100))
         ┌                                        ┐
   false ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 55   55%
   true  ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 45          45%
         └                                        ┘

julia> t = tally(false for _=1:55) + tally(true for _=1:45)
         ┌                                        ┐
   false ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 55   55%
   true  ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 45          45%
         └                                        ┘

julia> t += tally(false for _=1:50)
         ┌                                        ┐
   false ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 105   70%
   true  ┤■■■■■■■■■■■■■■■ 45                        30%
         └                                        ┘

However I recommend against exporting show_style as a public method unless you make it generic to other types (to do otherwise is unkind to the global namespace). In other words, it should either be non-public like:

Tallies.show_style(s::Symbol)

or generic like:

show_style(::Type{Tally}, s::Symbol) = ...

so the call would go like

show_style(Tally, :plot)

Additional room for improvement:

julia> t isa Tally
ERROR: TypeError: in isa, expected Type, got a value of type Module

julia> Tally(true => 55, false => 45)
ERROR: MethodError: objects of type Module are not callable

julia> 2*t
ERROR: MethodError: no method matching *(::Int64, ::Tally.TallyT{Any})

julia> t += true => 50
ERROR: MethodError: no method matching +(::Tally.TallyT{Any}, ::Pair{Bool, Int64})

Not sure why the generator is slower and takes more memory here:

julia> @btime tally([true for _=1:100]);
  19.200 μs (114 allocations: 2.41 KiB)

julia> @btime tally(true for _=1:100);
  23.000 μs (212 allocations: 3.81 KiB)

And now that we’ve opened the potential for negative tallies:

julia> t - tally(true for _=1:50)
Tally with 50 items in 2 groups:
false | 55 | 110%
true  | -5 | -10%

julia> show_style(:plot)
:plot

julia> t - tally(true for _=1:50)
Error showing value of type Tally.TallyT{Any}:
ERROR: ArgumentError: all values have to be ≥ 0
1 Like