Hello! I am trying to use the Ripserer package to do some TDA on a time series, following the example here: Sublevel Set Persistent Homology · Ripserer.jl
After I finished writing this post, I stumbled across a newer version of the example, Cubical Persistent Homology · Ripserer.jl, which answered some of my questions but not all.
The relevant part of the example is the beginning up until (and not including) the heading “Dealing with Noise,” NOT including the section for the 2D black hole image. The issues arise when generating the colored/starred “Representatives” plot. Both links above are essentially identical for this section other than on two points which I will explain below.
Some parts of my questions will probably require knowledge of Ripserer and/or TDA, but I expect/hope some can be answered by someone who is just more familiar with plotting in Julia than me.
There are three main issues with the example code as-is:
1. Scatter-plotting the “x_min” values as stars
The original example code passes the plotter CartesianIndex{1} objects, which the plotter doesn’t like. The newer version solves this by extracting integers from the CartesianIndex{1} objects using:
min_indices = [only(birth_simplex(int)) for int in result]  # array of CartesianIndex{1} objects
min_x = eachindex(curve)[min_indices]  # just integers
No questions here but I’ll leave it in case it helps anyone else.
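For anyone who wants to see the conversion in isolation, here is a minimal sketch using only Base Julia. The `curve` values and the hand-written `CartesianIndex` entries are made up and simply stand in for what `only(birth_simplex(int))` returns, so treat this as an illustration rather than Ripserer output.

```julia
# Toy stand-in for the example's `curve`; the values are made up for illustration.
curve = [0.5, 0.1, 0.4, -0.2, 0.3, 0.0, 0.6]

# Hypothetical stand-ins for `only(birth_simplex(int))`: one CartesianIndex{1} per minimum.
min_indices = [CartesianIndex(4), CartesianIndex(2), CartesianIndex(6)]

# Same conversion as in the newer docs page: index the axis with the CartesianIndex values.
min_x = eachindex(curve)[min_indices]

# Equivalent, more explicit conversion: unwrap the single coordinate of each index.
min_x_explicit = [only(Tuple(ci)) for ci in min_indices]

# The y values of the minima, ready to be passed to a scatter plot.
min_y = curve[min_x]
```

Both `min_x` and `min_x_explicit` are plain integer vectors, which is what the plotting call needs.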
2. Plotting the different intervals as different colored curves (underneath “To get the locations of the minima…”)
Q0: it seems that the example is saying it doesn’t matter if you pass the plotting function the “interval” as is or if you pass it representative(interval), so I am just passing the interval as is. Any thoughts on the difference here are much appreciated.
Running the code as given on the webpage, almost the whole curve is green, and the last little bit is the blue from the original infinite_interval plot showing through. This is the only other change in the new webpage: the output image shown is now accurate to what the code currently does, but there is no information about how to get the intervals colored like they used to be. I was able to do this by starting with the most persistent of the intervals and iterating backwards. This way the small intervals don’t get covered by the most persistent intervals at the end, which cover almost the whole curve.
In the new webpage, later when they are adding noise to the diagram, they are able to take care of this overlapping by just sorting the representatives by birth time.
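The drawing order described above can be sketched without Ripserer. This is only an illustration: the named tuples are hypothetical stand-ins for finite intervals (only “[-0.153, 0.252)” comes from this post, the other values are made up), and `persistence_of` is a helper I define here, not a library function.

```julia
# Hypothetical (birth, death) pairs standing in for finite intervals.
intervals = [
    (birth = 0.1, death = 0.2),
    (birth = -0.9, death = 0.8),
    (birth = -0.153, death = 0.252),
]

# Persistence is the length of the interval (hypothetical helper, defined here).
persistence_of(int) = int.death - int.birth

# Draw the most persistent intervals first so that shorter ones end up on top.
draw_order = sort(intervals; by = persistence_of, rev = true)

for int in draw_order
    # In the real code this is where plot!(plt, ...) would go.
    println(int, " persistence = ", persistence_of(int))
end
```

With real intervals, the same `sort(...; by=..., rev=true)` pattern should apply, given that `sort(result; by=birth)` already works on the diagram in my code below.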
Note that if you print out an interval, you get the y values at the beginning and end of the part of the curve corresponding to that interval. The infinite interval corresponds to the entire curve. If we plot the infinite interval and then plot the third finite interval, “filter(isfinite, result)[3]”, we get the following picture. Printing out this interval gives “[-0.153, 0.252)”. The infinite interval is the blue part, and the red part is where the finite interval has overlapped the initial plot.
Q1: How does the plotting function know what to plot here? It is not simply plotting every point on the curve whose y value lies within the interval; instead, it is plotting the whole part of the curve from the beginning up to the maximum whose y value is the end of the interval (in this case 0.252). Why we would want this makes some sense to me in the TDA context (though I don’t understand it well enough to explain it), but I don’t understand how the plotting function knows what to plot. Maybe the “interval” being passed into it has the x-values baked into it somehow? If so, how does the plotter get them (and how can I get them)?
Understanding what’s going on here will likely also answer my later difficulties with plotting my own data.
There may be more information as to how these intervals work here, but it seems to be more about the simplices/representatives than the intervals themselves: Cohomology, Homology, and Representatives · Ripserer.jl
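To make Q1 concrete, here is a minimal sketch of what I would guess a 0-dimensional representative covers: the contiguous stretch of the curve around the birth minimum that stays below the death value. This is an assumption on my part and not Ripserer's implementation; `sublevel_component` is a hypothetical helper, the data is made up, and ties at the death value may be handled differently by the library.

```julia
"""
Indices of the contiguous run around `birth_idx` where `f` stays strictly below `death`.
Hypothetical helper for illustration; not part of Ripserer.
"""
function sublevel_component(f::AbstractVector{<:Real}, birth_idx::Integer, death::Real)
    lo = birth_idx
    # Grow to the left while the curve is still below the death value.
    while lo > firstindex(f) && f[lo - 1] < death
        lo -= 1
    end
    hi = birth_idx
    # Grow to the right in the same way.
    while hi < lastindex(f) && f[hi + 1] < death
        hi += 1
    end
    return lo:hi
end

# Made-up curve with local minima at indices 2, 4 and 6 (global minimum at 6).
f = [0.3, -0.2, 0.25, 0.1, 0.5, -0.5, 0.4]

comp_a = sublevel_component(f, 2, 0.5)   # interval [-0.2, 0.5)
comp_b = sublevel_component(f, 4, 0.25)  # interval [0.1, 0.25)
comp_c = sublevel_component(f, 6, Inf)   # infinite interval
```

Under this reading, the infinite interval covers the entire curve, and a finite interval covers everything around its minimum up to the point where the curve reaches the death value, which would match the behaviour I described in Q1.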
3. When plotting the final persistence diagram with
plot(result, markercolor=1:6, markeralpha=1)
it will reproduce the plot, but Julia complains with the following warning:
┌ Warning: Indices Base.OneTo(6) of attribute `markercolor` does not match data indices 1:2.
└ @ Plots C:\Users\Owner.julia\packages\Plots\7R93Y\src\utils.jl:141
┌ Info: Data contains NaNs or missing values, and indices of `markercolor` vector do not match data indices.
│ If you intend elements of `markercolor` to apply to individual NaN-separated segments in the data,
│ pass each segment in a separate vector instead, and use a row vector for `markercolor`. Legend entries
│ may be suppressed by passing an empty label.
│ For example,
└ plot([1:2,1:3], [[4,5],[3,4,5]], label=["y" ""], markercolor=[1 2])
Q2: Anyone know what’s going on here ^ ?
Now, for my data:
I have imported some discrete time series data and I wish to do the same kind of thing as in the example: plot each interval as a different color, and then extract the minimums. I can extract and “star” the minimums fine, but it just plots each interval starting from the beginning instead of starting where it’s supposed to be in the time series. I expect I was only able to color the example data because it just so happened that every interval for that curve started at x = 0.
Here is the current output.
Here is an example of just one of the intervals (orange) being plotted over the original time series (blue). Notice that if you shift this to the right by some amount, it will overlap the original time series. This is where I want it to start.
**Q3: What would be the easiest way to “color” each interval of my data? Note that intervals will overlap, so I’ll need to plot the smallest (least persistent) intervals, which would otherwise get completely overlapped, last.**
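One thing that might explain the shift: when Plots.jl is given only y values, it places them at x = 1, 2, 3, and so on, so a piece of the series drawn without its x values starts at the left edge. A minimal sketch of keeping a segment in place, assuming the index range covered by an interval is already known (`idxs` below is hypothetical and the data is made up):

```julia
# Made-up time series and a 0-based time axis like the `x` in my code.
ts = [0.3, -0.2, 0.25, 0.1, 0.5, -0.5, 0.4]
x = 0:length(ts) - 1

# Hypothetical index range covered by one interval.
idxs = 2:4

# Slice x and y with the same indices so the segment keeps its position.
seg_x = x[idxs]
seg_y = ts[idxs]

# With Plots.jl loaded, the segment could then be overlaid with:
# plot!(plt, seg_x, seg_y)
```

Passing `seg_x` explicitly is the important part; the open question is still how to get `idxs` out of each interval.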
Here is my current code. Thanks for your time!
using Images
using Plots
using Ripserer
import XLSX

xf = XLSX.readxlsx("Only_CMS_vs_Control.xlsx")
sh = xf["Sheet1"]
x = Array{Int}(range(0, 18))
ts_1 = vec(Array{Float64}(sh["A1:A19"]))

result, _ = ripserer(Cubical(ts_1), reps = true)
infinite_interval = only(filter(!isfinite, result))
finite_intervals = filter(isfinite, result)

plt = plot(infinite_interval, ts_1,
    legend=false,
    title="Representatives",
    seriestype=:path)

for interval in finite_intervals
    plot!(plt, interval, ts_1; seriestype=:path)
end

min_indices = [only(birth_simplex(int)) for int in sort(result; by=birth)]
min_x = eachindex(ts_1)[min_indices]

scatter!(plt, min_x, ts_1[min_x]; color=1:length(min_x), markershape=:star)
plot(plt, plot(result; markercolor=1:14, markeralpha=1))