We (@kahaaga and @datseris) are very proud to announce v3 of ComplexityMeasures.jl. This release is the result of a year of intensive thinking, redesigning, reimplementing, and much back-and-forth, all in order to build a software package for estimating "complexity measures" (entropies and similar quantities) from data. In typical Julia fashion we were greedy, and hence we wanted the software to satisfy the following:
- Be easy to use, accessible, and well-documented
- Be as fast as it can be
- Be as extendable as possible, making it easy for researchers to add their new methods to the software
- Be based on a mathematically rigorous formulation of probabilities and entropies
- Offer hundreds of complexity or information measures, far more than any other similar software on the (open source) market
ComplexityMeasures.jl v3 satisfies these points and more. The best way to get an overview of the software is via its brand-new overarching tutorial.
What we want to highlight in this release is that we based the software on the mathematically rigorous formulation of estimating a complexity/information measure. For discrete estimation, the process proceeds as follows:
- Given input data, decide how to extract probabilities from them. This means "discretizing" the data, which requires an "outcome space". `OutcomeSpace` is now a formal and extendable part of the library.
- Estimate the probabilities from the data according to the discretization. Biases can occur in this process, so one also needs to choose a `ProbabilitiesEstimator` instance (also an extendable interface); a short sketch of these first two steps follows this list.
- Choose the information/complexity measure to estimate from the data. This requires a definition of the measure (also an extendable interface).
- Lastly, there may be bias in estimating the information/complexity measure itself (typical, e.g., for Shannon entropy), so one also needs to choose an estimator for the measure.
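To make the first two steps concrete, here is a minimal sketch of extracting probabilities from a time series. The `probabilities` function and the `OrdinalPatterns` and `RelativeAmount` names are taken from the library's documentation; treat the exact keyword arguments as illustrative.

```julia
using ComplexityMeasures

x = randn(10_000)  # example input data: a scalar time series

# Step 1: choose an outcome space. Ordinal (permutation) patterns of
# length m = 3 discretize the data into 3! = 6 possible outcomes.
outcome_space = OrdinalPatterns(m = 3)

# Step 2: estimate probabilities over those outcomes. `RelativeAmount`
# is plain relative-frequency (maximum likelihood) counting.
probs = probabilities(RelativeAmount(), outcome_space, x)
```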
These four steps map one-to-one onto the central function call of the library, to which all other calls reduce:
```julia
information(info_estimator, probability_estimator, outcome_space, input_data)
```
where `info_estimator` is a `DiscreteInfoEstimator` (which also contains the definition of the information measure).
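Putting all four steps into that central call, a discrete Shannon entropy estimate could look like the following sketch. The `Shannon`, `PlugIn`, and `MillerMadow` names are assumptions based on the library's documented measure definitions and discrete estimators.

```julia
using ComplexityMeasures

x = randn(10_000)

# Steps 1-2: outcome space and probabilities estimator, as above.
outcome_space = OrdinalPatterns(m = 3)
probability_estimator = RelativeAmount()

# Steps 3-4: the measure definition (`Shannon`) wrapped in a discrete
# information estimator. `PlugIn` inserts the estimated probabilities
# directly into the definition; `MillerMadow` adds a bias correction.
h_plugin = information(PlugIn(Shannon()), probability_estimator, outcome_space, x)
h_mm = information(MillerMadow(Shannon()), probability_estimator, outcome_space, x)
```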
Additionally, we provide an interface for differential estimation (using `DifferentialInfoEstimator`), which has widespread use in Shannon entropy estimation.
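In the differential case the data are not discretized, so no outcome space or probabilities estimator is needed. As a sketch (the `Kraskov` nearest-neighbour estimator and its `k` keyword are assumptions on our part; any documented `DifferentialInfoEstimator` would do):

```julia
using ComplexityMeasures

x = randn(10_000)

# Differential Shannon entropy via a k-nearest-neighbour estimator
# (`Kraskov` assumed from the library's docs); note the shorter call.
h_diff = information(Kraskov(Shannon(); k = 5), x)
```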
A bonus of this design is that we can reproduce most of the quantities that have been labelled "complexity measures" in the literature without implementing them explicitly. By combining a specific discretization technique, probabilities estimator, measure definition, and measure estimator, we can readily compute all possible "complexity quantities" that are based on this approach. Many of these quantities have not even been explored in the literature yet, opening up new research opportunities!
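For example (a sketch; the constructor names are assumed from the library's documentation), combining the `Shannon` definition with ordinal patterns and plug-in estimation reproduces the well-known permutation entropy, while swapping in `Tsallis` yields a Tsallis-based variant with no extra implementation:

```julia
using ComplexityMeasures

x = randn(10_000)

# Shannon + ordinal patterns + relative frequencies + plug-in estimation
# is exactly what the literature calls "permutation entropy".
h_perm = information(PlugIn(Shannon()), RelativeAmount(), OrdinalPatterns(m = 3), x)

# Swapping only the measure definition gives a Tsallis-based variant.
h_tsallis = information(PlugIn(Tsallis(q = 2)), RelativeAmount(), OrdinalPatterns(m = 3), x)
```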
We hope this re-design is useful for the wider community, especially the statistics community!