[ANN] JuliaTDA: an organization for doing Topological Data Analysis in Julia

I created the JuliaTDA organization on GitHub with the goal of bringing together some algorithms used in Topological Data Analysis (TDA).

My motivation for doing this in Julia:

  1. I like the Mapper algorithm very much (for more info, see here), but all the implementations of it I've seen are abandoned or incomplete. For example, as I understand it, one of the most important analyses we can do with the Mapper graph is to examine its nodes, even with respect to categorical variables. Consider a dataset of medical measurements in which the interesting column is a categorical one with values such as “sick” or “healthy”: I want to be able to color the nodes using this column. So I used the Mapper implementation as a means to learn Julia and to get back to studying topological data analysis.
  2. Implementing the Mapper in R in a performant way was painful. It often involved relying on libraries written in C for speed, avoiding loops, and so on (I even had to use some tricks with data frames to be able to use tidyr). Even so, whenever I had to calculate a vector of distances from one point to all other points, it wouldn't fit in RAM. In Julia I can write any loops I need without worrying about performance.
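To make the node-coloring idea concrete, here is a minimal sketch of the standard Mapper construction. It is written in plain Python with explicit loops and is not the TDAmapper.jl API. Several choices are assumptions made only for illustration: the filter is the first coordinate, the cover is made of equal overlapping intervals, clustering inside each preimage is single linkage cut at a threshold `eps`, and a node's "color" is the fraction of its points carrying a given categorical label. The function names `mapper` and `node_colors`, and the labels "sick" and "healthy", are hypothetical.

```python
import math
from itertools import combinations


def cover_intervals(lo, hi, n_intervals, overlap):
    """Split [lo, hi] into n_intervals equal intervals, each widened so that
    neighboring intervals overlap by `overlap` times the base length."""
    length = (hi - lo) / n_intervals
    pad = length * overlap / 2
    return [(lo + k * length - pad, lo + (k + 1) * length + pad)
            for k in range(n_intervals)]


def eps_components(indices, points, eps):
    """Connected components of the graph joining points closer than eps
    (single-linkage clustering cut at height eps)."""
    remaining = set(indices)
    components = []
    while remaining:
        start = remaining.pop()
        stack, comp = [start], {start}
        while stack:
            i = stack.pop()
            near = [j for j in remaining if math.dist(points[i], points[j]) < eps]
            for j in near:
                remaining.discard(j)
                comp.add(j)
                stack.append(j)
        components.append(comp)
    return components


def mapper(points, filter_values, n_intervals=4, overlap=0.3, eps=0.5):
    """Return (nodes, edges): nodes are sets of point indices, and an edge
    joins two nodes that share at least one point."""
    lo, hi = min(filter_values), max(filter_values)
    nodes = []
    for a, b in cover_intervals(lo, hi, n_intervals, overlap):
        preimage = [i for i, f in enumerate(filter_values) if a <= f <= b]
        nodes.extend(eps_components(preimage, points, eps))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges


def node_colors(nodes, labels, category):
    """Fraction of the points in each node whose label equals `category`."""
    return [sum(labels[i] == category for i in node) / len(node) for node in nodes]


# Usage: 40 points on the unit circle, filtered by their x coordinate, with a
# toy categorical column that is "sick" on the open upper half of the circle.
pts = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40))
       for k in range(40)]
labels = ["sick" if 0 < k < 20 else "healthy" for k in range(40)]
nodes, edges = mapper(pts, [p[0] for p in pts])
colors = node_colors(nodes, labels, "sick")
```

On this toy circle, the Mapper graph is a 6-node cycle. The nodes on the upper arc get color 1.0, the nodes on the lower arc get 0.0, and the two end caps are mixed. The quadratic distance loop is kept deliberately simple. A real implementation would use a spatial index or a precomputed neighbor structure instead.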

So far, I've sketched the following packages:

  • GeometricDatasets.jl: to create and manipulate datasets (circles, tori, squares, etc.), rotate, translate, and so on.
  • TDAmapper.jl: implementation of the Mapper and BallMapper algorithms. [Trivia: I studied under the supervision of Facundo Mémoli, one of the authors of the original Mapper paper.]
  • ToMATo.jl: a topology-based clustering method that uses 0-dimensional persistence to estimate a reasonable number of clusters and then builds a pseudo-gradient tree to define the clusters.
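The ToMATo idea (Chazal, Guibas, Oudot and Skraba) of combining 0-dimensional persistence with a pseudo-gradient tree can be sketched in a few lines. This is a minimal Python sketch and not the ToMATo.jl interface. It assumes that a density estimate and a neighborhood graph (for example, a k-nearest-neighbor graph) have already been computed. The names `tomato`, `density`, `nbrs` and `tau` are hypothetical.

In the sketch, vertices are processed by decreasing density. A vertex with no denser neighbor starts a new cluster. Otherwise it is attached to its densest neighbor, which is the pseudo-gradient step. Two clusters are merged when the lower of their peaks is less than `tau` above the current density.

```python
import math


def tomato(density, nbrs, tau):
    """Return (labels, diagram) for a ToMATo-style clustering.

    density: list of density estimates, one per vertex (assumed precomputed).
    nbrs:    list of neighbor sets describing the neighborhood graph.
    tau:     prominence threshold; clusters whose peak rises less than tau
             above the level where they meet a bigger cluster are merged.
    diagram: (birth, death) pairs of the merges performed, i.e. the finite
             part of the 0-dimensional persistence diagram when tau = inf.
    """
    n = len(density)
    parent = list(range(n))  # union-find forest; roots are density peaks

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    order = sorted(range(n), key=lambda i: -density[i])
    rank = {v: r for r, v in enumerate(order)}  # ties broken by processing order
    diagram = []
    for v in order:
        higher = [u for u in nbrs[v] if rank[u] < rank[v]]
        if not higher:
            continue  # v is a local peak and stays the root of a new cluster
        g = max(higher, key=lambda u: density[u])  # pseudo-gradient neighbor
        root = find(g)
        parent[v] = root
        for u in higher:
            r = find(u)
            if r == root:
                continue
            # The "younger" cluster is the one with the lower peak.
            young, old = (r, root) if density[r] < density[root] else (root, r)
            if density[young] < density[v] + tau:
                diagram.append((density[young], density[v]))
                parent[young] = old
                root = old
    labels = [find(i) for i in range(n)]
    return labels, diagram


# Usage: a 1-D "density" on a path graph with two bumps (peaks at 2 and 6).
f = [1, 3, 5, 3, 2, 4, 6, 4, 1]
path = [{j for j in (i - 1, i + 1) if 0 <= j < len(f)} for i in range(len(f))]
_, diagram = tomato(f, path, math.inf)  # merge everything: the 0-d diagram
labels, _ = tomato(f, path, tau=2)      # keep peaks with prominence >= 2
```

With `tau = math.inf`, the only finite pair is (5, 2), so the smaller bump has prominence 5 - 2 = 3. In a real dataset, a clear gap in these prominences is what suggests a reasonable number of clusters. Here `tau = 2` keeps both clusters (labeled by their peaks 2 and 6), while `tau = 4` would merge them into one.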

I am new to Julia, so some of the above packages may contain monstrosities. I am starting to write their documentation using Quarto.

To my relief, the excellent Ripserer and PersistenceDiagrams packages were already written in Julia, which greatly shortens the work needed to do TDA in Julia as far as persistent homology is concerned.

I hope to find more TDA enthusiasts around here so we can do some magic together!


Nice! Is there a roadmap or a to-do list for this organization?

Not yet! I finished my PhD in TDA 3 years ago, and since then I've read only a few papers on recent developments in the area. So I don't know what kinds of packages would be useful for research and/or applications today.

The short-term goals are to finish the Mapper and ToMATo packages and document them completely (with lots of examples).

I know there are some statistical ways to estimate the best parameters for the Mapper, and maybe some ways to use it in combination with machine learning methods (e.g., random forests), but I'm somewhat lost on these topics.

The best way to create a useful roadmap is for people to open issues (or similar) in the organization, or to answer here!