# How to find the first time step (column 1) where the difference between consecutive values in column 2 equals a given amount

Hello,

I would like to know how to select the first time step (in column 1) where the difference between values in column 2 is 0.01.

```julia
df = DataFrame(x=[0.0, 0.5, 1.0, 1.5, 2.0], y=[0.74, 0.64, 0.84, 0.88, 0.89])
```

The answer I am looking for would be `1.5` as this is the first instance when the difference between t and t+1 would be 0.01. I imagine this could be a for loop but I don’t know how to indicate to compare values from column two with the previous value. Could anyone help me?

A solution that unfortunately relies on some annoying behavior:

```julia
julia> using DataFrames, ShiftedArrays;

julia> df = DataFrame(x=[0.0, 0.5, 1.0, 1.5, 2.0], y=[0.74, 0.64, 0.84, 0.88, 0.89]);

julia> df.diff = lead(df.y) .- df.y;

julia> ind = findfirst(x -> ismissing(x) ? false : x ≈ .01, df.diff)
4
```
1. `lead` pads the end of the vector with a `missing` value. But the predicate passed to `findfirst` must return a `Bool`, not `missing` (which is probably correct behavior), so you have to handle `missing`s in the anonymous function given to `findfirst`.
2. Due to floating-point behavior, we have

```julia
julia> (.89 - .88) == .01
false
```

so we need to use the `≈` comparison (typed as `\approx` followed by Tab).

One thing that might make life easier is MissingsAsFalse.jl, which automates the `missing` comparison.

Alternatively:

```julia
julia> findfirst(≈(0.01), diff(y))
4
```

Thank you. I did not know about `lead()`. Why is `findfirst` an anonymous function? (Or how to recognise whether a function is anonymous?)

Thank you. I would not have expected to use `\approx` in this way.

It works for me when I write `findfirst(≈(0.01), diff(df.y))`

My apologies, I had just copied over the `y = [...]` bit from your question without the whole DataFrames construction, hence why my post shows `diff(y)`. Indeed you should be using `diff(df.y)`.


`findfirst` is not an anonymous function. But something like `x -> ...` is an anonymous function.

It just means it’s constructed without having a name. So in the expression above

```julia
julia> ind = findfirst(x -> ismissing(x) ? false : x ≈ .01, df.diff)
```

we have

```julia
x -> ismissing(x) ? false : x ≈ .01
```

as the anonymous function.

`diff` is good here, but it would definitely take a little bit of thinking to know which index in the original vector this corresponds to.
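For what it's worth, the index mapping can be sketched as follows. This is only a minimal example on plain vectors `x` and `y` standing in for `df.x` and `df.y`; the names `d`, `t_start` and `t_end` are made up for illustration.

```julia
# Plain vectors standing in for df.x and df.y from the question.
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [0.74, 0.64, 0.84, 0.88, 0.89]

d = diff(y)                 # d[i] == y[i+1] - y[i], so length(d) == length(y) - 1
i = findfirst(≈(0.01), d)   # index into d, or `nothing` if no step matches

# d[i] compares y[i] with y[i+1], so x[i] is the earlier time step (t)
# and x[i + 1] is the later one (t + 1).
t_start = x[i]
t_end = x[i + 1]
```

So an index returned for `diff(y)` can be used directly on the original time column when you want the earlier of the two time steps, which is what the question asks for.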

Unfortunately, when I adapt the code to my actual data I get

```
TypeError: non-boolean (Missing) used in boolean context
in eval at base\boot.jl:373
in top-level scope at untitled:35
in findfirst at base\array.jl:2002
in findnext at base\array.jl:1951
```

The code is

```julia
df_high_1.difference = lead(df_high_1.bioS1) .- df_high_1.bioS1 # this works
food_web_1_8_8 = findfirst(≈(1e-10), diff(df_high_1.difference)) # this causes the error
```

Any idea what could be happening?

Could this mean the condition is never met?

In your MWE, the `missing` came after the first value satisfying the condition. So it never hit `missing`.

In your real case, you have a missing before the first value satisfying the condition.

See my example above about handling `missing`s with an anonymous function.

One way around this is to replace `diff(df.y)` with `coalesce.(diff(df.y), false)`, which turns every `missing` difference into `false`, so it is never counted as being approximately 0.01.

In general I agree with Peter, though: if you run this on different, unknown inputs, it will be useful to explicitly handle `missing`, as well as the case where no difference is 0.01, in which `findfirst` returns `nothing`.
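A minimal sketch of what such explicit handling could look like, using only Base functions. The helper name `first_step_with_diff` is hypothetical, and the data are a variation of the example from the question with a `missing` inserted.

```julia
# Hypothetical helper (not from any package): returns the index i of the first pair
# (y[i], y[i+1]) whose difference is approximately `target`, or `nothing` if none is.
function first_step_with_diff(y, target)
    d = diff(y)  # a difference that involves a missing value is itself missing
    # The predicate always returns a Bool: missing differences never match.
    return findfirst(v -> !ismissing(v) && v ≈ target, d)
end

y = [0.74, missing, 0.84, 0.88, 0.89]
i = first_step_with_diff(y, 0.01)

# findfirst returns `nothing` when no element matches, so check before indexing.
if isnothing(i)
    println("no difference close to the target was found")
else
    println("first match at index ", i)
end
```

Checking for `nothing` before using the result as an index avoids an `ArgumentError` when no step satisfies the condition.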


When I run it, it just gives a checkmark, no index value. When I plug it in to find the index, I get `ArgumentError: invalid index: nothing of type Nothing`. I think this means the line of code ran successfully but did not find anything matching the criteria.

Looking at my difference values, I want to select the time step that is associated with -6.22e-10, but I don’t want to use that exact value as I need to repeat this selection process over other databases.

The purpose of finding this timestep is to approximate when a system becomes stable as the change in values of `df.y` becomes smaller and smaller.

A little messy, but this way you don't have the index shift problem:

```julia
using IterTools
findfirst(x -> abs(x) ≈ .01, [coalesce(Base.splat(-)(tw), 0) for tw in partition(df.y, 2, 1)])
```

So are you maybe looking for the first time a difference is below a threshold, i.e. `findfirst(<(0.01), diff(y))`?
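If that is the goal, taking the absolute value first is probably safer, because a `<` comparison on the raw differences also matches large negative steps. A rough sketch on the toy data from the question, where the tolerance `tol` and the variable names are assumptions chosen only for illustration:

```julia
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [0.74, 0.64, 0.84, 0.88, 0.89]
tol = 0.05   # assumed tolerance, chosen only for this toy data

change = abs.(diff(y))          # size of each step, ignoring its sign
i = findfirst(<(tol), change)   # first step smaller than tol, or `nothing`

# Stricter variant: the first step from which every later step also stays below tol.
last_big = findlast(>=(tol), change)    # last step that is still too large
j = isnothing(last_big) ? 1 : last_big + 1
j = j <= length(change) ? j : nothing   # `nothing` if the final step is still too large
```

As with `diff` in general, `x[i]` (or `x[j]`) is then the earlier time step of the matching pair. The stricter variant may be closer to "the system has become stable", since a single small step can be followed by larger ones.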


Can you please try to explain what each step here does? It’s a bit hard to read as is.

Indeed, it is not a nice piece of code. I'll try to explain what the various pieces do.

```julia
# partition(y, 2, 1) generates an iterator over all pairs of consecutive elements of y
julia> cp = collect(partition(df.y, 2, 1))
4-element Vector{Tuple{Float64, Float64}}:
(0.74, 0.64)
(0.64, 0.84)
(0.84, 0.88)
(0.88, 0.89)

# to each of these pairs I apply the function (-), but to do so I have to splat the tuple
julia> -(cp[1]...)
0.09999999999999998
# or
julia> Base.splat(-)(cp[1])
0.09999999999999998

# to handle any missing values I apply coalesce(val, 0) to these results
```

The same result can be obtained without the dependency on the IterTools package, in the following way:

```julia
julia> y = [missing, 0.74, 0.64, 0.84, missing, 0.88, 0.89, missing]
8-element Vector{Union{Missing, Float64}}:
missing
0.74
0.64
0.84
missing
0.88
0.89
missing

julia> findfirst(x -> abs(x) ≈ .01, coalesce.(Base.splat(-).(zip(y, y[2:end])), NaN))
6
```