Hi all, I am Shivin a final year Computer Science student at BITS Pilani, India. I am quite interested in Julia as a language and have contributed to IterativeSolvers.jl and LinearMaps.jl before.
Recently I have got interested in Topological Data Analysis and I feel it would be a great tool for Julia to have and a great project for Google Summer of Code. So would anyone like to get involved with this project?
I am posting some introductory links about the same :- Exposing Financial Crime With AI | SymphonyAI Sensa
Topological data analysis - Wikipedia
Professor Gunnar Carlsson Introduces Topological Data Analysis - YouTube
Gunnar Carlsson on the Shape of Data - YouTube
This is interesting, but we may not have anyone in the community with expertise on this subject. That’s not necessarily a deal breaker, but we’ll have a high bar for competence in areas we can’t help you with.
Do you have an idea of what this would look like in more concrete terms? Perhaps a simple toy example of the kind of tool you’d be building, and what it looks like when applied to a data set? What’s this project closest to in terms of skill set?
Well it would be a package by itself which will be an [implementation] (What is Python Mapper? — Python Mapper documentation) of the Mapper
algorithm. There exists an implementation but it is in Python so I guess that my main task will be to understand it and port it to Julia.
Skill Set :-
- Familiarity with writing code in Julia
- Familiarity with Topology and specifically Homology
- In-depth understanding of algorithms related to Persistent Homology and the Mapper algorithm
- Familiarity with graphing packages like Plots.jl
Frankly speaking I have never seen a project involving Topology in GSoC. It is a rather new and unknown domain which is currently dominated by a few experts, some dedicated research labs and a private firm.
That being said I’m attaching a sample of output generated by this algorithm.
This is a study on Asthma where the dataset is high dimensional. The nodes are not individual data points but they are small microclusters. TDA disregards the distance between individual points and aims to reveal the shape in the data which is independent of dimensions. Note that this is a graph and not a plot of the dataset.
Note that the Python implementation (that you didn’t link to) seems to be GPL. It would thus be a better idea not to look at the code, but to re-implement the algorithm from scratch based on the written description in the paper.
I have updated the reply accordingly.
That’s such a cool visualization, though.
Another approach (probably not for GSoC) would simply be to interface with existing C++ libraries: this is what the TDA R package does with GUDHI, Dionysus and PHAT.
However, while the documentation of the R package is good, that of the underlying implementations is very sparse.
@MikeInnes, so what do you think? Can it become a good project? How can I further strengthen my application?
Sure, I think this can be a good project. The most important thing for the application will be to have an endorsement from a mentor, but I think you’re on the right track there. Here are some ideas for things that could give your proposal a boost:
- Some kind of prototype – just a couple of days work to show what you can do in Julia.
- Ideas for extensions you can implement beyond the algorithm itself – perhaps tools for generating the kind of visualisations you’ve shown above?
- Clear rationale for a lay-person – I’d like to see some idea of why the project is interesting. Is it mostly an academic interest or does it have immediate practical applications?
- Any interest from potential users of this would be a big boon as well. It’s generally a big benefit to projects to have people trying things out and giving feedback during the process.
@dpsanders has volunteered to mentor this project actually.
Yeah sure. I will try to come up with a working prototype and other details while I work on my proposal side-by-side. I will keep you and David posted on that.
Thanks,
Shivin
Hi @MikeInnes and @dpsanders,
Sorry for the delay, I got involved in some other things. Here is a very basic implementation of Mapper algorithm in Julia. It is adapted from sakmapper for now, just to know how feasible the approach is. For now it seems good.
-
It can extended extensively as suggested by this paper. We can implement different machine learning algorithms for pre-processing the data before it is given to the Mapper algorithm.
-
This has many practical applications. Ayasdi Core, the software which we’re trying to open source here, is a highly sought after in the industry. You can check out some slides here and a blog post.
-
@mkborregaard had shown some interest in this project for some visualizations.
Hi everybody,
I’m putting together a collection of visualizations and methods for topological data analysis, TDA.jl.
Feel free to ask any questions here or on github.
I was lately thinking that it would be really nice, if the visualization will be written in Makie, enabling some form of interaction. For example when you hover over the point, it shows you some information. See https://projector.tensorflow.org/ for an inspiration.
Does UMAP belongs to this category of algorithms?
I wanted to use Interact.jl for building interactive plots as all TDA visualizations use Plots.jl recipes.
UMAP is a powerful and simple dimensionality reduction technique mostly suited for visualization but has poor interpretability (as authors mentioned themselves - resulting embedding space has no specific meaning). I do not know how it’s related to TDA techniques, but it’s very common for manifold learning algorithms to construct a graph representation of a local manifold structure to use it for later discovery of a global one.
I have used neither of these, but I see from TDA’s issues that there’s another package which perhaps deserves a link in this thread: