Topological Data Analysis

package

#1

Hi all, I am Shivin a final year Computer Science student at BITS Pilani, India. I am quite interested in Julia as a language and have contributed to IterativeSolvers.jl and LinearMaps.jl before.
Recently I have got interested in Topological Data Analysis and I feel it would be a great tool for Julia to have and a great project for Google Summer of Code. So would anyone like to get involved with this project?
I am posting some introductory links about the same :- https://www.ayasdi.com/blog/bigdata/why-topological-data-analysis-works/
https://en.wikipedia.org/wiki/Topological_data_analysis
https://www.youtube.com/watch?v=XfWibrh6stw
https://www.youtube.com/watch?v=kctyag2Xi8o


#2

This is interesting, but we may not have anyone in the community with expertise on this subject. That’s not necessarily a deal breaker, but we’ll have a high bar for competence in areas we can’t help you with.

Do you have an idea of what this would look like in more concrete terms? Perhaps a simple toy example of the kind of tool you’d be building, and what it looks like when applied to a data set? What’s this project closest to in terms of skill set?


#3

Well it would be a package by itself which will be an [implementation] (http://danifold.net/mapper/introduction.html) of the Mapper algorithm. There exists an implementation but it is in Python so I guess that my main task will be to understand it and port it to Julia.

Skill Set :-

  • Familiarity with writing code in Julia
  • Familiarity with Topology and specifically Homology
  • In-depth understanding of algorithms related to Persistent Homology and the Mapper algorithm
  • Familiarity with graphing packages like Plots.jl

Frankly speaking I have never seen a project involving Topology in GSoC. It is a rather new and unknown domain which is currently dominated by a few experts, some dedicated research labs and a private firm.

That being said I’m attaching a sample of output generated by this algorithm.
This is a study on Asthma where the dataset is high dimensional. The nodes are not individual data points but they are small microclusters. TDA disregards the distance between individual points and aims to reveal the shape in the data which is independent of dimensions. Note that this is a graph and not a plot of the dataset.


#4

Note that the Python implementation (that you didn’t link to) seems to be GPL. It would thus be a better idea not to look at the code, but to re-implement the algorithm from scratch based on the written description in the paper.


#5

I have updated the reply accordingly.


#6

That’s such a cool visualization, though.


#7

Another approach (probably not for GSoC) would simply be to interface with existing C++ libraries: this is what the TDA R package does with GUDHI, Dionysus and PHAT.
However, while the documentation of the R package is good, that of the underlying implementations is very sparse.


#8

@MikeInnes, so what do you think? Can it become a good project? How can I further strengthen my application?


#9

Sure, I think this can be a good project. The most important thing for the application will be to have an endorsement from a mentor, but I think you’re on the right track there. Here are some ideas for things that could give your proposal a boost:

  • Some kind of prototype – just a couple of days work to show what you can do in Julia.
  • Ideas for extensions you can implement beyond the algorithm itself – perhaps tools for generating the kind of visualisations you’ve shown above?
  • Clear rationale for a lay-person – I’d like to see some idea of why the project is interesting. Is it mostly an academic interest or does it have immediate practical applications?
  • Any interest from potential users of this would be a big boon as well. It’s generally a big benefit to projects to have people trying things out and giving feedback during the process.

#10

@dpsanders has volunteered to mentor this project actually.

Yeah sure. I will try to come up with a working prototype and other details while I work on my proposal side-by-side. I will keep you and David posted on that.

Thanks,
Shivin


#11

Hi @MikeInnes and @dpsanders,

Sorry for the delay, I got involved in some other things. Here is a very basic implementation of Mapper algorithm in Julia. It is adapted from sakmapper for now, just to know how feasible the approach is. For now it seems good.

  • It can extended extensively as suggested by this paper. We can implement different machine learning algorithms for pre-processing the data before it is given to the Mapper algorithm.

  • This has many practical applications. Ayasdi Core, the software which we’re trying to open source here, is a highly sought after in the industry. You can check out some slides here and a blog post.

  • @mkborregaard had shown some interest in this project for some visualizations.