Technical question about DelayEmbeddings.jl

Hi, I was trying to use the DelayEmbeddings.jl package, and I am trying to find a means to obtain the optimal delay embedding parameters (tsu, DeltaT) for a given timeseries. However the documentation
does not seem to mention any such function. I would be grateful if someone could point me towards the right direction.

There are several methods for finding the embedding dimension and delays listed in the Traditional Optimal Embedding and Unified Optimal Embedding sections.

1 Like

Thanks a lot! I will take a look.

I found out that the Traditional Optimal Embedding techniques only accept a vector as an input, i.e., a univariate signal. The Unified Optimal Embedding techniques do accept multivariate inputs, but it has to be in the format of a Dataset. For example if I pass a matrix Data, then it says

julia> res = DelayEmbeddings.pecuzal_embedding(Data)
ERROR: MethodError: no method matching pecuzal_embedding(::Array{Float64,2})
Closest candidates are:
  pecuzal_embedding(::Array{T,1}; τs, w, samplesize, K, KNN, L_threshold, α, p, max_cycles, econ, verbose) where T<:Real at /home/nodi/.julia/packages/DelayEmbeddings/Sa4ev/src/unified_de/pecuzal.jl:85
  pecuzal_embedding(::Dataset{D,T}; τs, w, samplesize, K, KNN, L_threshold, α, p, max_cycles, econ, verbose) where {D, T<:Real} at /home/nodi/.julia/packages/DelayEmbeddings/Sa4ev/src/unified_de/pecuzal.jl:133

Do you know how to convert a matrix to this structure called Dataset ? I could not find it on the internet.

Check out Numerical data. Dataset(data_matrix) is probably what you want.

1 Like

Thanks a lot. That is exactly what I was looking for.
However, on running the algorithm for an actual dataset, I noticed a bug.

julia> X = Dataset(Data);
julia> Y_mt, τ_vals_mt, ts_vals_mt, Ls_mt , εs_mt = pecuzal_embedding(X);
Initializing PECUZAL algorithm for multivariate input...
Starting 1-th embedding cycle...
ERROR: Computed 0-distances, due to duplicate datapoints in your data. Try to add minimal additive noise to the signal you wish to embed and try again.
 [1] error(::String) at ./error.jl:33
 [2] uzal_cost_pecuzal(::Dataset{1,Float64}, ::Dataset{2,Float64}, ::Int64; K::Int64, w::Int64, econ::Bool, metric::Euclidean) at /home/nodi/.julia/packages/DelayEmbeddings/Sa4ev/src/unified_de/pecuzal.jl:508
 [3] local_L_statistics(::Array{Float64,1}, ::Dataset{1,Float64}, ::Array{Float64,1}, ::UnitRange{Int64}, ::Int64, ::Int64, ::Int64, ::Euclidean, ::Bool) at /home/nodi/.julia/packages/DelayEmbeddings/Sa4ev/src/unified_de/pecuzal.jl:363
 [4] choose_right_embedding_params(::Array{Float64,2}, ::Array{Float64,1}, ::Dataset{9,Float64}, ::UnitRange{Int64}, ::Int64, ::Int64, ::Int64, ::Euclidean, ::Bool) at /home/nodi/.julia/packages/DelayEmbeddings/Sa4ev/src/unified_de/pecuzal.jl:340
 [5] first_embedding_cycle_pecuzal!(::Dataset{9,Float64}, ::Int64, ::UnitRange{Int64}, ::Int64, ::Int64, ::Int64, ::Euclidean, ::Float64, ::Float64, ::Int64, ::Array{Int64,1}, ::Array{Int64,1}, ::Array{Float64,1}, ::Array{Array{Float64,2},2}, ::Bool) at /home/nodi/.julia/packages/DelayEmbeddings/Sa4ev/src/unified_de/pecuzal.jl:256
 [6] pecuzal_multivariate_embedding_cycle!(::Array{Any,1}, ::Bool, ::Dataset{9,Float64}, ::UnitRange{Int64}, ::Int64, ::Int64, ::Array{Array{Float64,2},2}, ::Array{Int64,1}, ::Euclidean, ::Array{Float64,1}, ::Array{Int64,1}, ::Int64, ::Int64, ::Float64, ::Float64, ::Int64, ::Bool) at /home/nodi/.julia/packages/DelayEmbeddings/Sa4ev/src/unified_de/pecuzal.jl:226
 [7] pecuzal_embedding(::Dataset{9,Float64}; τs::UnitRange{Int64}, w::Int64, samplesize::Int64, K::Int64, KNN::Int64, L_threshold::Int64, α::Float64, p::Float64, max_cycles::Int64, econ::Bool, verbose::Bool) at /home/nodi/.julia/packages/DelayEmbeddings/Sa4ev/src/unified_de/pecuzal.jl:166
 [8] pecuzal_embedding(::Dataset{9,Float64}) at /home/nodi/.julia/packages/DelayEmbeddings/Sa4ev/src/unified_de/pecuzal.jl:139
 [9] top-level scope at REPL[16]:1

Here Data is a matrix that contains coarse-grained traffic data, and hence has repetitions.

Although this DelayEmbeddings library is impressive and extremely powerful, I think this small bugs need to be fixed to fully unleash its potential.

But that’s not a bug, and the solution (add low-level noise, e.g. Dataset(Data .+ randn(eltype(Data), size(Data))) is part of the error message?

You would of course need to scale the noise by a reasonable factor for the magnitude of your signals.

Yes it was mentioned in the error message.

When I said “bug” I had in mind the results from this paper which says that almost-any function can be turned into an embeding. In particular, it covers cases where the data or time-series has duplicates. Think of a continuous measurement function which is either 0 or 1 in most of the phase space. A timeseries generated by this measurement would have long stretches of 0 and 1. The point of delay-coordinates is to remove these repeats by looking at the history of the measurement.

In fact if there were no duplicates in the timeseries, that would mean that the data-points bear a one-to-one correpsondence with the state-space, and no delays would be needed in the first place.

@datseris would know better than I, as he was an author on the paper which introduced the PECUZAL embedding algorithm, but my guess is that although almost any function can theoretically be embedded, not every embedding algorithm can handle almost any function. (i.e. the requirement of uniqueness in the data is a characteristic of the PECUZAL algorithm which may not be required by other algorithms)

Have you tried other embedding algorithms? Is adding low-level noise a problem for your data/analysis/workflow?

In nonlinear dynamics what is said in theory and what is done in practice are often worlds apart. The embedology paper came out 3 decades before the pecuzal method. Some of the things it said do not apply to the method. In fact, as the embedology is a mathematical paper, many of the things it says may not be applicable in practice. In any case, the PECUZAL method by construction would not be applicable to the timeseries you discribe, which is either 0s or 1s.

In my eyes, nothing that was discussed so far comes even close to a bug. All the questions that have been asked had answers already available in the documentation. The answer to the final problem of zero distances was literally in its error message. The fact that the PECUZAL algorithm doesn’t work for your specific case has absolutely nothing to do with the DelayEmbeddings.jl library, or any software for that matter.

Nevertheless, if you think some things can be improved then do feel free to open issues on GitHub outlining your problem while also providing a Minimal Working Example.

(thanks for the tag @halleysfifthinc )


Thanks Allen. I have started to read that paper to better understand the mechanism. It would be great to hear from the author.

Actually, I work from a theoretic interest, based mostly on the ergodic theory of chaotic attractors. That theory is not yet robust to noise, i.e., in most cases it is unable to conclude whether properties of the original chaotic attractor are robust to noise. Thus I am trying to avoid the addition of noise mainly due to lack of theoretical knowledge supporting robustness to random perturbations. But I will give it a try.

Thanks a lot @Datseris . I have started reading your paper too.

Maybe you algorithm was intended for a different use-case. Incidentally, we used a simple constant 50-delays on this data and reconstructed the traffic data as a quasiperiodically driven chaotic system. This number 50 was chosen a bit arbitrarily, but it proves my point that delay-coordinates work for coarse-grained data. I was in search of an algorithm that would determine a number similar to 50.

But I do stand by my earlier statements,

  1. the purpose of delay-coordinates is to convert coarse-grained measurement into injective maps, which this algorithm cannot (from your own words)
  2. theory and practice is very different - but as I showed you that data from very real-world cases have repititions, and anyone familiar with delay-coordinates would expect that to be remedied by delay-coordinates.
  3. In-fact repetitions in data-points is not an aberration, it is the main use-case of delay-coordinates.

With these in mind, the Julia algorithm is definitely inadequate. I ave not ready the paper yet so I dont know whether it is an issue with the implementation or the actual algorithm. Whether this is to be called a bug or not is not important. The documentation of this algorithm does not mention that it wont work for dataset with repetitions. If it did, that would disqualify each and every situation in which the dataseries is from a partial observation of the system.

Oof, I don’t really know where to start.

So let me get this straight. You started using an algorithm that you have no clue about, and haven’t read its paper. Nevertheless you go around claiming how inadequate it is? Seriously? Wow.

Initially I thought you had symbolic data, and so it was “really important” for you to stay in the (0,1) numbers. Now I see that’s not the case. The algorithm will work if you add any amount of noise to your data, as low as 1e-15. I doubt adding noise of 1e-15 will affect your data in any meaningful way.

Why do you call it the “Julia” algorithm? This makes no sense.

Not even close to what I said, please take 5 seconds to read my message. Besides, delay embedding is a mathematical theorem and has no inherent “purpose”. It simply proves that under appropriate conditions a univariate measurement of a dynamical system may be transformed to a multivariate set which is diffeomorphic to the original attractor. That’s the theory, and the end of the story. Some algorithms exist to estimate good parameters to perform this transformation, and every single algorithm has its limitations.

Let’s mention also that your measurement function may be inappropriate. The delay embedding theorems aren’t magic. You can’t just map the state space into the set {0, 1} and expect that “this is fine”. Some measurement functions can never lead to proper embedding, even if you use an algorithm created by god or whomever else.

That’s not a problem for the pecuzal algorithm, you just add a minimal amount of noise. Just like the error message you received instructed you to do.

you literally pasted a message saying “Computed 0-distances, due to duplicate datapoints in your data. Try to add minimal additive noise to the signal you wish to embed and try again.” It’s literally in your post. And you now you claim “it doesn’t say it”? Wow again.

Once again, you add a minimal amount of noise to the signal, which does not affect the dynamics in any way, yet allows the 0-distance singularity of the numerical method to dissappear. Although as I said above, I don’t think your measurement function is valid for delay embeddings.