[ANN] Krippendorff.jl

Hey folks,

I just wanted to present a small, currently unregistered package that I cleaned up over the last few days:
Krippendorff.jl - efficient computation of Krippendorff’s alpha in Julia.

What does it do?

Krippendorff’s alpha is a measure of inter-rater reliability: it quantifies how confident you can be that the ratings given by a group of raters reflect genuine agreement rather than chance.

It’s conceptually similar to measures like Cohen’s Kappa or Fleiss’ Kappa, but it can handle any number of raters, small sample sizes, missing values, and various levels of measurement (nominal, ordinal, interval, ratio data, etc.).
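For anyone who wants to see what is actually computed: alpha compares the observed disagreement between raters to the disagreement expected by chance, alpha = 1 - D_o / D_e. Below is a minimal, hand-rolled sketch of the textbook coincidence-matrix formulation, purely for illustration (it is not the package’s actual code, and the function name and defaults are made up):

```julia
# A minimal sketch of the textbook coincidence-matrix formula.
# NOT the package's implementation; all names here are made up for illustration.
# Rows are raters, columns are units, `missing` marks absent ratings.
function alpha_sketch(ratings::AbstractMatrix;
                      dist2 = (c, k) -> c == k ? 0.0 : 1.0)  # squared distance δ², nominal by default
    vals  = sort(unique(skipmissing(ratings)))   # set of possible responses
    index = Dict(v => i for (i, v) in enumerate(vals))
    V = length(vals)
    O = zeros(V, V)                              # coincidence matrix

    for unit in eachcol(ratings)                 # one column = one rated unit
        obs = collect(skipmissing(unit))
        m = length(obs)
        m < 2 && continue                        # a unit needs ≥ 2 ratings to be pairable
        for i in 1:m, j in 1:m                   # all ordered pairs of ratings within the unit
            i == j && continue
            O[index[obs[i]], index[obs[j]]] += 1 / (m - 1)
        end
    end

    nc = vec(sum(O; dims = 2))                   # marginal totals per response value
    n  = sum(nc)
    Do = sum(O[c, k]       * dist2(vals[c], vals[k]) for c in 1:V, k in 1:V) / n
    De = sum(nc[c] * nc[k] * dist2(vals[c], vals[k]) for c in 1:V, k in 1:V) / (n * (n - 1))
    return 1 - Do / De                           # alpha = 1 - observed/expected disagreement
end
```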

Why the package? What are the alternatives?

  • There are (to my knowledge) no existing implementations of this measure in Julia, so people who needed it would have to roll their own or resort to calling a package in a different language.
  • The only real inter-rater agreement measure I found in the ecosystem is Cohen’s Kappa, implemented in Lighthouse.jl, so this may really be useful eventually. (If you know anybody here who is affiliated with that package, please ping them.) EDIT: done, thx
  • It’s my first real functional package; I thought it would be a good starting point :slight_smile:
  • I just needed this for a bigger project, really.

Alternative packages are, for example: krippendorff, krippendorff-alpha or simpledorff in Python, or irr in R (see below)

Features

I initially wanted to just PyCall to one of the existing packages in a different project, but there are at least 3 packages that essentially do the same thing in different ways, so I wanted something more consistent (and faster). It features:

  • currently two means of computation, with automatic choice of the most appropriate one
  • different distance metrics (levels of measurement, really), with the ability to pass a custom one (see the sketch after this list)
  • automatic handling of missing data
  • automatic detection of the set of possible responses, flexibility to pass your own
  • it should just work for almost all kinds of input: Matrices; {Vectors, Dictionaries, (Named)Tuples} of {Vectors, Dictionaries, (Named)Tuples} with arbitrary eltypes (Ints are most efficient, but anything that works as input to the given distance function works); DataFrames; Generators; or really anything that is a Table according to Tables.jl. You can also choose whether to iterate over rows or columns if the input supports it
  • it tries to be as clever as possible for easy usage, while still allowing you to precompute everything yourself if you want to
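To make the levels-of-measurement point concrete, here is how different squared-distance functions would plug into the sketch above (again purely illustrative; the package’s actual keyword arguments and metric names may well differ):

```julia
# Two raters, ten units, one missing rating (hypothetical toy data).
ratings = [1 2 3 3 2 1 4 1 2 missing;
           1 2 3 3 2 2 4 1 2 5]

alpha_sketch(ratings)                                           # nominal: δ² is 0 or 1
alpha_sketch(ratings; dist2 = (c, k) -> (c - k)^2)              # interval: squared difference
alpha_sketch(ratings; dist2 = (c, k) -> ((c - k) / (c + k))^2)  # ratio: normalized squared difference
```

The nominal metric only distinguishes “same” from “different”, while the interval and ratio metrics penalize larger numeric disagreements more heavily.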

TODOs

  • a lot of edge cases and input types need proper test coverage
  • Parallelism! It should be possible to make a fast parallel version with transducers, but that’s up for later
  • If there happen to be more reliability measures in different packages, it may be worthwhile to coalesce those into a comprehensive package similar to irr in R

<I’ll try to insert a speed comparison with calling into other languages here if I have time.> EDIT: this takes more time than I thought. Preliminary findings: the Python krippendorff package is fast and easy to use; the core computation in Julia (without annotations or multithreading at the moment) is competitive, but the preparation step (figuring out the set of possible responses and preparing the iterator) slows things down considerably. The benefit is that the Julia version is more flexible (and you don’t have to use NaNs for missing values).

Special thanks go to @sylvaticus for his Julia tutorial repo; the section on developing packages helped me a lot with setting up this package, CI, etc. Thank you!


I filed a Lighthouse.jl issue about adding this quantity by the way (and am affiliated by way of working for Beacon, although I haven’t worked on Lighthouse much myself yet): Add Krippendorff’s alpha? · Issue #19 · beacon-biosignals/Lighthouse.jl · GitHub
