Topological Data Analysis

shivin9 · February 25, 2017, 5:18am

Hi all, I am Shivin, a final-year Computer Science student at BITS Pilani, India. I am quite interested in Julia as a language and have contributed to IterativeSolvers.jl and LinearMaps.jl before.
I have recently become interested in Topological Data Analysis, and I feel it would be a great tool for Julia to have and a great project for Google Summer of Code. Would anyone like to get involved with this project?
Here are some introductory links on the topic:
Exposing Financial Crime With AI | SymphonyAI Sensa
Topological data analysis - Wikipedia
Professor Gunnar Carlsson Introduces Topological Data Analysis - YouTube
Gunnar Carlsson on the Shape of Data - YouTube

MikeInnes · February 25, 2017, 3:49pm

This is interesting, but we may not have anyone in the community with expertise on this subject. That’s not necessarily a deal breaker, but we’ll have a high bar for competence in areas we can’t help you with.

Do you have an idea of what this would look like in more concrete terms? Perhaps a simple toy example of the kind of tool you’d be building, and what it looks like when applied to a data set? What’s this project closest to in terms of skill set?

shivin9 · February 26, 2017, 5:19pm

Well, it would be a standalone package implementing the Mapper algorithm (see "What is Python Mapper?" in the Python Mapper documentation). An implementation already exists, but it is in Python, so I guess my main task will be to understand it and port it to Julia.

Skill set:

Familiarity with writing code in Julia
Familiarity with Topology and specifically Homology
In-depth understanding of algorithms related to Persistent Homology and the Mapper algorithm
Familiarity with graphing packages like Plots.jl

Frankly speaking I have never seen a project involving Topology in GSoC. It is a rather new and unknown domain which is currently dominated by a few experts, some dedicated research labs and a private firm.

That being said, I’m attaching a sample of the output generated by this algorithm.
This is from a study on asthma where the dataset is high dimensional. The nodes are not individual data points but small microclusters. TDA disregards the distances between individual points and aims to reveal the shape of the data, which is independent of the number of dimensions. Note that this is a graph and not a plot of the dataset.
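
To make the "nodes are microclusters" idea concrete, here is a minimal toy sketch of a Mapper-style pipeline, written from the general description of the algorithm and not taken from Python Mapper or from the proposed package. It is in plain Python only to keep it dependency-free. The names `toy_mapper` and `single_linkage`, the choice of single-linkage clustering, and all parameter values are assumptions made for illustration: a lens (filter) function maps each point to a number, the lens range is covered by overlapping intervals, the points in each interval are clustered, and two clusters are joined by an edge when they share points.

```python
import math
from itertools import combinations


def single_linkage(points, members, eps):
    """Split `members` (point indices) into connected components of the
    graph that links two points whenever they are at most `eps` apart."""
    remaining = set(members)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            p = frontier.pop()
            near = {q for q in remaining
                    if math.dist(points[p], points[q]) <= eps}
            remaining -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def toy_mapper(points, lens, n_intervals=4, overlap=0.25, eps=0.3):
    """Return (nodes, edges): nodes are sets of point indices (the
    microclusters), edges join nodes that have points in common."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = ((hi - lo) / n_intervals) or 1.0  # guard against a constant lens
    pad = overlap * width                     # how far neighbouring intervals overlap
    nodes = []
    for i in range(n_intervals):
        a = lo + i * width - pad
        b = lo + (i + 1) * width + pad
        members = [j for j, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(points, members, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges


# 40 points on a unit circle, with the x-coordinate as the lens.
circle = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40))
          for k in range(40)]
nodes, edges = toy_mapper(circle, lens=lambda p: p[0])
```

With these settings the 40 points collapse into a handful of nodes joined in a single loop, so the graph reflects the circular shape of the data and not the position of any individual point.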

dpsanders · February 26, 2017, 11:36pm

Note that the Python implementation (that you didn’t link to) seems to be GPL. It would thus be a better idea not to look at the code, but to re-implement the algorithm from scratch based on the written description in the paper.

shivin9 · February 27, 2017, 2:18am

I have updated the reply accordingly.

mkborregaard · February 27, 2017, 6:40am

That’s such a cool visualization, though.

Vincent_Zoonekynd · February 27, 2017, 4:09pm

Another approach (probably not for GSoC) would simply be to interface with existing C++ libraries: this is what the TDA R package does with GUDHI, Dionysus and PHAT.
However, while the documentation of the R package is good, that of the underlying implementations is very sparse.

shivin9 · March 2, 2017, 2:01am

@MikeInnes, so what do you think? Can it become a good project? How can I further strengthen my application?

MikeInnes · March 3, 2017, 10:25am

Sure, I think this can be a good project. The most important thing for the application will be to have an endorsement from a mentor, but I think you’re on the right track there. Here are some ideas for things that could give your proposal a boost:

Some kind of prototype – just a couple of days’ work to show what you can do in Julia.
Ideas for extensions you can implement beyond the algorithm itself – perhaps tools for generating the kind of visualisations you’ve shown above?
Clear rationale for a lay-person – I’d like to see some idea of why the project is interesting. Is it mostly an academic interest or does it have immediate practical applications?
Any interest from potential users of this would be a big boon as well. It’s generally a big benefit to projects to have people trying things out and giving feedback during the process.

shivin9 · March 3, 2017, 2:16pm

@dpsanders has volunteered to mentor this project actually.

Yeah sure. I will try to come up with a working prototype and other details while I work on my proposal side-by-side. I will keep you and David posted on that.

Thanks,
Shivin

shivin9 · March 8, 2017, 3:55pm

Hi @MikeInnes and @dpsanders,

Sorry for the delay, I got involved in some other things. Here is a very basic implementation of the Mapper algorithm in Julia. It is adapted from sakmapper for now, just to see how feasible the approach is. So far it looks good.

It can be extended extensively, as suggested by this paper. We can implement different machine learning algorithms for pre-processing the data before it is given to the Mapper algorithm.
This has many practical applications. Ayasdi Core, the software which we’re trying to open source here, is highly sought after in industry. You can check out some slides here and a blog post.
@mkborregaard had shown some interest in this project for some visualizations.

wildart · March 25, 2019, 1:43am

Hi everybody,

I’m putting together a collection of visualizations and methods for topological data analysis, TDA.jl.

Feel free to ask any questions here or on github.

Tomas_Pevny · March 25, 2019, 11:41am

I have been thinking lately that it would be really nice if the visualization were written in Makie, enabling some form of interaction. For example, when you hover over a point, it shows you some information. See https://projector.tensorflow.org/ for inspiration.

Does UMAP belong to this category of algorithms?

wildart · March 27, 2019, 7:41am

I wanted to use Interact.jl for building interactive plots as all TDA visualizations use Plots.jl recipes.

UMAP is a powerful and simple dimensionality reduction technique, mostly suited for visualization, but it has poor interpretability (as the authors themselves mention, the resulting embedding space has no specific meaning). I do not know how it is related to TDA techniques, but it is very common for manifold learning algorithms to construct a graph representation of the local manifold structure and use it later to discover the global one.
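
As an illustration of that last point only (this is not UMAP, and not code from TDA.jl), a k-nearest-neighbour graph is probably the simplest graph that captures local structure. A minimal dependency-free Python sketch, where `knn_graph` is a hypothetical helper name and brute-force distance computation is assumed to be acceptable for small data:

```python
import math


def knn_graph(points, k):
    """Undirected k-nearest-neighbour graph: link each point to its k
    closest neighbours (ties are broken by the lower point index)."""
    edges = set()
    for i, p in enumerate(points):
        by_distance = sorted((math.dist(p, q), j)
                             for j, q in enumerate(points) if j != i)
        for _, j in by_distance[:k]:
            edges.add((min(i, j), max(i, j)))  # store each edge once
    return edges


# Three nearby points and one distant point on a line.
line_edges = knn_graph([(0.0,), (1.0,), (2.0,), (10.0,)], k=1)
```

Each edge only records which points are close to each other; any global structure has to be read off from how those local links chain together.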

improbable22 · March 27, 2019, 12:00pm

I have used neither of these, but I see from TDA’s issues that there’s another package which perhaps deserves a link in this thread:
