ComplexityMeasures.jl v3 - mathematically rigorous software for probability, entropy, and complexity

We (@kahaaga and @datseris) are very proud to announce v3 of ComplexityMeasures.jl. This release is the result of a year of intensive thinking, redesigning, reimplementing, and a lot of going back and forth, all in order to build software for estimating “complexity measures” (entropies and similar quantities) from data. In typical Julia fashion we were greedy, and hence we wanted the software to satisfy the following:

  • Be easy to use, accessible, and well-documented
  • Be as fast as it can be
  • Be as extendable as possible, making it easy for researchers to add their new methods to the software
  • Be based on the mathematically rigorous formulation of probabilities and entropies
  • Offer hundreds of complexity or information measures, hundreds more than any other similar software on the (open source) market

ComplexityMeasures.jl v3 satisfies these points and more. The best way to get an overview of the software is via its brand new over-arching tutorial.

What we want to highlight in this release is that we based the software on the mathematically rigorous formulation of estimating a complexity/information measure. For discrete estimation, the process proceeds as follows:

  1. Given input data, decide how to extract probabilities from the data. This means “discretizing” the data, which requires an “outcome space”. OutcomeSpace is now a formal and extendable part of the library.
  2. Estimate the probabilities from the data according to the discretization. Biases can occur in this process, so one also needs to choose a ProbabilityEstimator instance (also an extendable interface).
  3. Choose the information/complexity measure to estimate from the data. This requires a definition of a complexity measure (also an extendable interface).
  4. Lastly, there may be bias in the estimation of the information/complexity measure itself (this is typical, e.g., for Shannon entropy), so one also needs to choose an estimator for the information/complexity measure.
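
To make the four steps concrete, here is a minimal sketch in plain Julia. It deliberately does not use the ComplexityMeasures.jl API: the function names (ordinal_outcomes, relative_frequencies, shannon, miller_madow) are hypothetical, and the specific choices (ordinal patterns as the outcome space, relative-frequency counting, Shannon entropy, Miller-Madow correction) are assumptions picked only for illustration.

```julia
# Step 1: outcome space. Map each length-m window of the data to its
# ordinal pattern (the permutation that sorts the window).
function ordinal_outcomes(x::AbstractVector{<:Real}, m::Int)
    return [sortperm(x[i:i+m-1]) for i in 1:(length(x) - m + 1)]
end

# Step 2: probability estimation by plain counting (relative frequencies).
function relative_frequencies(outcomes)
    counts = Dict{eltype(outcomes), Int}()
    for o in outcomes
        counts[o] = get(counts, o, 0) + 1
    end
    n = length(outcomes)
    return [c / n for c in values(counts)]
end

# Step 3: measure definition. Shannon entropy in nats.
shannon(p) = -sum(q * log(q) for q in p)

# Step 4: bias-corrected estimator. Miller-Madow adds (K - 1) / (2N) to the
# plug-in value, with K observed outcomes and N samples.
miller_madow(p, n) = shannon(p) + (length(p) - 1) / (2n)

x = [4.0, 7.0, 9.0, 10.0, 6.0, 11.0, 3.0]   # toy input data
outcomes = ordinal_outcomes(x, 3)            # 5 windows, 3 distinct patterns
p = relative_frequencies(outcomes)
h_plugin = shannon(p)                        # uncorrected (plug-in) estimate
h_corrected = miller_madow(p, length(outcomes))
println("plug-in: ", h_plugin, ", Miller-Madow: ", h_corrected)
```

Each step is a separate function, so any one of them can be swapped out without touching the others.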

These steps are mirrored exactly in the central function call of the library, to which all other calls ultimately reduce:

information(info_estimator, probability_estimator, outcome_space, input_data)

where info_estimator is a DiscreteInfoEstimator (which also contains the information measure definition).
Additionally, we provide an interface for differential estimation (using DifferentialInfoEstimator), which is widely used in Shannon entropy estimation.

A bonus of this design is that we are able to reproduce most of the quantities that have been labelled “complexity measures” in the literature without explicitly implementing them. By choosing a specific combination of discretization technique, probability estimator, measure definition, and measure estimator, we can readily compute all possible “complexity quantities” that are based on this approach. Many of these quantities have not even been explored before in the literature, opening up new research opportunities!
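
As a rough illustration of how combinations multiply, the sketch below crosses two probability estimators with two measure definitions in plain Julia. Nothing here is the package's API: relative_amount, add_constant, shannon, renyi, and the counts vector are all hypothetical stand-ins chosen for the example.

```julia
# Two illustrative probability estimators acting on the same outcome counts.
relative_amount(counts) = counts ./ sum(counts)
add_constant(counts; c = 1.0) = (counts .+ c) ./ (sum(counts) + c * length(counts))

# Two illustrative measure definitions acting on a probability vector.
shannon(p) = -sum(q * log(q) for q in p)
renyi(p; order = 2.0) = log(sum(q^order for q in p)) / (1 - order)  # order != 1

counts = [2, 2, 1]   # hypothetical outcome counts from some discretization

estimators = (("relative amount", relative_amount), ("add-constant", add_constant))
measures = (("Shannon", shannon), ("Renyi (order 2)", renyi))

# Every (estimator, measure) pair yields a distinct quantity.
results = Dict{Tuple{String, String}, Float64}()
for (est_name, est) in estimators
    for (meas_name, meas) in measures
        results[(est_name, meas_name)] = meas(est(counts))
    end
end

for (key, value) in results
    println(key[1], " + ", key[2], " = ", value)
end
```

With two choices per component this already gives four quantities; adding outcome spaces and bias-corrected estimators as further axes grows the count multiplicatively.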

We hope this re-design is useful for the wider community, especially the statistics community!

I opened Julia as "julia --project=. -t6" and then started following the tutorial.

While compiling ComplexityMeasures, Julia gave the output

Precompiling project...
  51 dependencies successfully precompiled in 333 seconds. 19 already precompiled.
  2 dependencies had output during precompilation:
┌ MKL_jll
│   Downloading artifact: MKL
│
│  [pid 8448] waiting for IO to finish:
│   Handle type        uv_handle_t->data
│   timer              00000275d37b2420->00000275d0c3e8c0
│  This means that a package has started a background task or event source that has not finished running. For precompilation to complete successfully, the event source needs to be closed explicitly. See the developer documentation on fixing precompilation hangs for more help.
│
│  [pid 8448] waiting for IO to finish:
│   Handle type        uv_handle_t->data
│   timer              00000275d37b2420->00000275d0c3e8c0
│  This means that a package has started a background task or event source that has not finished running. For precompilation to complete successfully, the event source needs to be closed explicitly. See the developer documentation on fixing precompilation hangs for more help.
└
┌ Wavelets
│  WARNING: method definition for #computeWavelets#6 at C:\Users\jakez\.julia\packages\Wavelets\ANOxi\src\mod\WT.jl:660 declares type variable S but does not use it.
│  WARNING: method definition for #computeWavelets#7 at C:\Users\jakez\.julia\packages\Wavelets\ANOxi\src\mod\WT.jl:664 declares type variable S but does not use it.
│  WARNING: method definition for #computeWavelets#8 at C:\Users\jakez\.julia\packages\Wavelets\ANOxi\src\mod\WT.jl:670 declares type variable S but does not use it.
│  WARNING: method definition for cwt at C:\Users\jakez\.julia\packages\Wavelets\ANOxi\src\mod\Transforms.jl:216 declares type variable U but does not use it.
└

which I think may be the problem. I did not find the promised fix in the developer documentation. What I did instead was remove the package, restart Julia with one thread, and add the package again. I then exited Julia and restarted it as requested, and now it is working.

Hey! Which version of Julia are you on? I couldn’t reproduce these warnings on Julia 1.9.4. Good to hear that you got the tutorial up and running, though!

In any case, you can safely ignore these warnings. As indicated by the first line in the log, precompilation was successful despite the warnings. The last set of warnings occurs because some methods in Wavelets.jl, which ComplexityMeasures.jl depends on, declare a type parameter that is never used. I can’t immediately decipher the warning about MKL_jll, but the documentation it refers to is the Julia developer documentation, not the ComplexityMeasures.jl docs.

I think the warning about MKL is new in Julia 1.10 and is due to parallel precompilation.