How to find the first time step (column 1) at which the difference between consecutive values in column 2 reaches a certain value

Hello,

I would like to know how to select the first time step (in column 1) where the difference between values in column 2 is 0.01.

df = DataFrame(x=[0.0, 0.5, 1.0, 1.5, 2.0], y=[0.74, 0.64, 0.84, 0.88, 0.89])

The answer I am looking for would be 1.5 as this is the first instance when the difference between t and t+1 would be 0.01. I imagine this could be a for loop but I don’t know how to indicate to compare values from column two with the previous value. Could anyone help me?

A solution that unfortunately relies on some annoying behavior

julia> using DataFrames, ShiftedArrays;

julia> df = DataFrame(x=[0.0, 0.5, 1.0, 1.5, 2.0], y=[0.74, 0.64, 0.84, 0.88, 0.89]);

julia> df.diff = lead(df.y) .- df.y;

julia> ind = findfirst(x -> ismissing(x) ? false : x ≈ .01, df.diff)
4
  1. lead introduces a missing value at the end of the shifted vector. But findfirst doesn’t allow missings (which is probably correct behavior), so you have to handle them yourself in the anonymous function passed to findfirst.
  2. Due to floating point behavior, we have
julia> (.89 - .88) == .01
false

so we need to use the \approx comparison.
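
To make the floating-point point concrete, here is a small sketch using only Base Julia. The variable name d is just for illustration. It also shows one caveat: by default ≈ uses a relative tolerance, so comparing against exactly zero needs an explicit atol (the 1e-8 below is an arbitrary choice, not a recommendation).

```julia
# Difference between the last two y values from the example.
d = 0.89 - 0.88

println(d == 0.01)   # false: d is not bit-for-bit equal to 0.01
println(d ≈ 0.01)    # true: isapprox allows a small relative error

# With the default settings, only zero itself is ≈ 0.0,
# so near-zero comparisons need an absolute tolerance.
println(1e-12 ≈ 0.0)                      # false
println(isapprox(1e-12, 0.0; atol=1e-8))  # true
```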

One thing that might make life easier is MissingsAsFalse.jl, which automates the missing comparison.

Alternatively:

julia> findfirst(≈(0.01), diff(y))
4

Thank you. I did not know about lead(). Why is findfirst an anonymous function? (Or how to recognise whether a function is anonymous?)

Thank you. I would not have expected to use \approx in this way.

It works for me when I write findfirst(≈(0.01), diff(df.y))

My apologies, I had just copied over the y = [...] bit from your question without the whole DataFrames construction, hence why my post shows diff(y). Indeed you should be using diff(df.y).

findfirst is not an anonymous function. But something like x -> ... is an anonymous function.

It just means it’s constructed without having a name. So in the expression above

julia> ind = findfirst(x -> ismissing(x) ? false : x ≈ .01, df.diff)

we have

x -> ismissing(x) ? false : x ≈ .01

as the anonymous function.

diff is good here, but it would definitely take a little bit of thinking to know which index in the original vector this corresponds to.
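
For what it's worth, the index mapping can be sketched with plain vectors (no DataFrames needed). Element i of diff(y) is y[i+1] - y[i], so an index i found in diff(y) refers to the step from x[i] to x[i+1]. The names t_start and t_end are only illustrative.

```julia
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [0.74, 0.64, 0.84, 0.88, 0.89]

# diff(y)[i] == y[i+1] - y[i], so diff(y) has one element fewer than y.
i = findfirst(≈(0.01), diff(y))

t_start = x[i]      # time step where the matching difference begins
t_end   = x[i + 1]  # time step where it ends

println((i, t_start, t_end))  # (4, 1.5, 2.0)
```

Whether you want x[i] or x[i + 1] depends on which end of the interval you consider "the" time step; the original question asks for 1.5, which is x[i].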

Unfortunately, when I adapt the code to my actual data I get

TypeError: non-boolean (Missing) used in boolean context
in eval at base\boot.jl:373 
in top-level scope at untitled:35
in findfirst at base\array.jl:2002
in findnext at base\array.jl:1951

The code is

df_high_1.difference = lead(df_high_1.bioS1) .- df_high_1.bioS1 # this works
food_web_1_8_8 = findfirst(≈(1e-10), diff(df_high_1.difference)) # this causes the error

Any idea what could be happening?

Could this mean the condition is never met?

In your MWE, the missing came after the first value satisfying the condition. So it never hit missing.

In your real case, you have a missing before the first value satisfying the condition.

See my example above about handling missings with an anonymous function.
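
A minimal sketch of the difference between the two cases, using made-up vectors (after and before are hypothetical names, not the real data). findfirst stops at the first match, so a missing that comes later is never evaluated, while a missing that comes earlier makes the predicate return missing and triggers the TypeError.

```julia
after  = [0.04, 0.01, missing]   # missing comes after the match
before = [missing, 0.04, 0.01]   # missing comes before the match

println(findfirst(≈(0.01), after))   # 2, the missing is never reached

# Here `missing ≈ 0.01` evaluates to missing, which cannot be used as a Bool.
err = try
    findfirst(≈(0.01), before)
catch e
    e
end
println(err isa TypeError)           # true

# Handling missing inside the predicate avoids the error.
safe = findfirst(d -> !ismissing(d) && d ≈ 0.01, before)
println(safe)                        # 3
```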

One way around this is to replace diff(df.y) with coalesce.(diff(df.y), false), which treats any difference involving a missing value as not 0.01.

In general I agree with Peter, though: if you run this on different, unknown inputs, it will be useful to handle missing explicitly, as well as the case where no difference is 0.01, in which case findfirst returns nothing.
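
As a rough sketch of what handling both cases explicitly might look like, here is a small helper. The function name first_step_with_diff and the vectors x2 and y2 are made up for illustration and are not part of any package.

```julia
# Return the x value at the start of the first step whose difference in y
# is approximately `target`, or `nothing` if there is no such step.
function first_step_with_diff(x, y, target)
    d = diff(y)
    # Skip missing differences instead of letting them reach ≈.
    i = findfirst(v -> !ismissing(v) && isapprox(v, target), d)
    return i === nothing ? nothing : x[i]
end

x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [0.74, 0.64, 0.84, 0.88, 0.89]
println(first_step_with_diff(x, y, 0.01))    # 1.5
println(first_step_with_diff(x, y, 5.0))     # nothing

# A missing before the match no longer raises an error.
x2 = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y2 = [missing, 0.74, 0.64, 0.84, 0.88, 0.89]
println(first_step_with_diff(x2, y2, 0.01))  # 2.0
```

Returning nothing rather than indexing straight away means the caller can check the result with === nothing before using it.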

When I run it, it just gives a checkmark and no index value. When I plug the result in to look up the index, I get ArgumentError: invalid index: nothing of type Nothing. I think this means the line of code ran successfully but did not find anything matching the criteria.

Looking at my difference values, I want to select the time step that is associated with -6.22e-10, but I don’t want to use that exact value as I need to repeat this selection process over other databases.

The purpose of finding this timestep is to approximate when a system becomes stable as the change in values of df.y becomes smaller and smaller.

A little messy, but, this way you don’t have the index shift problem

using IterTools
findfirst(x-> abs(x)≈ .01, [coalesce(Base.splat(-)(tw),0) for tw in  partition(df.y,2,1)])

So are you maybe looking for the first time a difference is below a threshold, i.e. findfirst(<(0.01), diff(y))?
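
A sketch of the threshold idea, with invented data and an arbitrary tolerance tol (both are assumptions, not values from the thread). Since differences can be negative, taking abs is probably what you want for a stability check; without it, any negative difference counts as "below the threshold".

```julia
# Invented series that settles down over time.
y = [1.0, 0.5, 0.26, 0.251, 0.2505, 0.2504]
tol = 1e-2  # arbitrary threshold chosen for this example

# First step whose change is smaller in magnitude than tol.
i = findfirst(d -> abs(d) < tol, diff(y))
println(i)  # 3

# Without abs, the large negative first difference already matches.
println(findfirst(<(tol), diff(y)))  # 1

# An ≈ test against 1e-10 only matches values extremely close to 1e-10,
# so a difference such as -6.22e-10 is not found by it.
println(-6.22e-10 ≈ 1e-10)  # false
```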

Can you please try to explain what each step here does? It’s a bit hard to read as is.

Indeed, it is not a nice piece of code. I will try to explain what the various pieces do.

# partition(y, 2, 1) generates an iterator over all pairs of consecutive elements of y
julia> cp=collect(partition(df.y,2,1))
4-element Vector{Tuple{Float64, Float64}}:       
 (0.74, 0.64)
 (0.64, 0.84)
 (0.84, 0.88)
 (0.88, 0.89)

#on each of these pairs I apply the function (-), but to do it I have to splat the tuple
julia> -(cp[1]...)
0.09999999999999998
# or 
julia> Base.splat(-)(cp[1])
0.09999999999999998

# to manage any missing I apply  coalesce(val, 0)  to these values

The same result can be obtained without the dependency on the IterTools package, in the following way:

julia> y = [missing, 0.74, 0.64, 0.84, missing, 0.88, 0.89, missing]
8-element Vector{Union{Missing, Float64}}:
  missing
 0.74
 0.64
 0.84
  missing
 0.88
 0.89
  missing

julia> findfirst(x-> abs(x)≈ .01, coalesce.(Base.splat(-).(zip(y, y[2:end])),NaN))
6
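
Since the original question mentioned a for loop, here is a sketch of the same search written as an explicit loop over consecutive pairs. The function name first_matching_step is made up for this example. It skips any pair containing a missing and returns nothing if no pair matches.

```julia
# Index i of the first pair (y[i], y[i+1]) whose absolute difference
# is approximately `target`, or `nothing` if there is none.
function first_matching_step(y, target)
    for i in 1:length(y)-1
        a, b = y[i], y[i + 1]
        # Skip pairs where either value is missing.
        (ismissing(a) || ismissing(b)) && continue
        abs(b - a) ≈ target && return i
    end
    return nothing
end

y = [missing, 0.74, 0.64, 0.84, missing, 0.88, 0.89, missing]
println(first_matching_step(y, 0.01))  # 6
println(first_matching_step(y, 5.0))   # nothing
```

The returned index refers to the original vector, so there is no index shift to account for.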