How to find the first time step (column 1) at which the difference between consecutive values in column 2 equals a given value

HelgavonLichtenstein · August 1, 2022, 1:20pm

Hello,

I would like to know how to select the first time step (in column 1) where the difference between values in column 2 is 0.01.

df = DataFrame(x=[0.0, 0.5, 1.0, 1.5, 2.0], y=[0.74, 0.64, 0.84, 0.88, 0.89])

The answer I am looking for would be 1.5, as this is the first instance where the difference between t and t+1 is 0.01. I imagine this could be done with a for loop, but I don’t know how to compare each value in column 2 with the previous one. Could anyone help me?

pdeffebach · August 1, 2022, 1:33pm

A solution that unfortunately relies on some annoying behavior:

julia> using DataFrames, ShiftedArrays;

julia> df = DataFrame(x=[0.0, 0.5, 1.0, 1.5, 2.0], y=[0.74, 0.64, 0.84, 0.88, 0.89]);

julia> df.diff = lead(df.y) .- df.y;

julia> ind = findfirst(x -> ismissing(x) ? false : x ≈ .01, df.diff)
4

lead produces a missing value at the end of the column. findfirst does not accept a predicate that returns missing (which is probably the correct behavior), so you have to handle missings in the anonymous function passed to findfirst.
Due to floating-point rounding, we have

julia> (.89 - .88) == .01
false

so we need to use the \approx comparison.

One thing that might make life easier is MissingsAsFalse.jl, which automates the missing comparison.
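For comparison, here is a minimal sketch of the same idea using only Base. It assumes plain vectors in place of the DataFrame columns, builds the lead difference by hand with `[diff(y); missing]` (standing in for `lead(df.y) .- df.y`), and uses `coalesce` to turn a `missing` comparison result into `false`:

```julia
# Plain vectors standing in for the DataFrame columns from the question.
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [0.74, 0.64, 0.84, 0.88, 0.89]

# y[t+1] - y[t], padded with a trailing missing so it lines up with x.
d = [diff(y); missing]

# `missing ≈ 0.01` evaluates to `missing`; coalesce maps that to `false`.
ind = findfirst(v -> coalesce(v ≈ 0.01, false), d)

# Look up the time step at that index.
first_time = x[ind]
```

Coalescing the result of the comparison, rather than the data, leaves the difference column itself untouched.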

nilshg · August 1, 2022, 1:56pm

Alternatively:

julia> findfirst(≈(0.01), diff(y))
4

HelgavonLichtenstein · August 1, 2022, 1:57pm

Thank you. I did not know about lead(). Why is findfirst an anonymous function? (Or how to recognise whether a function is anonymous?)

HelgavonLichtenstein · August 1, 2022, 2:01pm

Thank you. I would not have expected to use \approx in this way.

It works for me when I write findfirst(≈(0.01), diff(df.y))

nilshg · August 1, 2022, 2:03pm

My apologies, I had just copied over the y = [...] bit from your question without the whole DataFrames construction, hence why my post shows diff(y). Indeed you should be using diff(df.y).

pdeffebach · August 1, 2022, 2:08pm

findfirst is not an anonymous function. But something like x -> ... is an anonymous function.

It just means it’s constructed without having a name. So in the expression above

julia> ind = findfirst(x -> ismissing(x) ? false : x ≈ .01, df.diff)

we have

x -> ismissing(x) ? false : x ≈ .01

as the anonymous function.
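To make the distinction concrete, here is a small sketch in which the same predicate is written once as a named function (`is_step` is a made-up name for illustration) and once anonymously. Both can be passed to findfirst in the same way:

```julia
# A named function: defined once, then referred to by its name.
is_step(v) = !ismissing(v) && v ≈ 0.01

d = [0.1, missing, 0.01]

# Passing the named function, or an equivalent anonymous one written inline,
# gives the same result.
named = findfirst(is_step, d)
anon  = findfirst(v -> !ismissing(v) && v ≈ 0.01, d)
```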

pdeffebach · August 1, 2022, 2:12pm

diff is good here, but it would definitely take a little bit of thinking to know which index in the original vector this corresponds to.
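A short sketch of that index bookkeeping, assuming plain vectors in place of the DataFrame columns: `diff(y)` has one element fewer than `y`, and its `i`-th element is `y[i+1] - y[i]`, so the index returned by findfirst points at the earlier of the two time steps.

```julia
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [0.74, 0.64, 0.84, 0.88, 0.89]

d = diff(y)                 # length(y) - 1 elements; d[i] is y[i+1] - y[i]
i = findfirst(≈(0.01), d)   # index into d

t_before = x[i]             # time step where the change starts
t_after  = x[i + 1]         # time step where the change ends
```

Here `t_before` matches the 1.5 asked for in the original question.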

HelgavonLichtenstein · August 1, 2022, 2:20pm

Unfortunately, when I adapt the code to my actual data I get

TypeError: non-boolean (Missing) used in boolean context
in eval at base\boot.jl:373 
in top-level scope at untitled:35
in findfirst at base\array.jl:2002
in findnext at base\array.jl:1951

The code is

df_high_1.difference = lead(df_high_1.bioS1) .- df_high_1.bioS1 # this works
food_web_1_8_8 = findfirst(≈(1e-10), diff(df_high_1.difference)) # this causes the error

Any idea what could be happening?

Could this mean the condition is never met?

pdeffebach · August 1, 2022, 2:26pm

In your MWE, the missing came after the first value satisfying the condition. So it never hit missing.

In your real case, you have a missing before the first value satisfying the condition.

See my example above about handling missings with an anonymous function.
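The effect of the ordering can be reproduced with two small made-up vectors. This is only a sketch of the behavior described above, not the actual data:

```julia
late_missing  = [0.1, 0.01, missing]   # the match comes before the missing
early_missing = [missing, 0.1, 0.01]   # the missing comes before the match

# For a missing element the predicate returns `missing`, not `false`.
pred_result = missing ≈ 0.01

# findfirst stops at the first `true`, so the trailing missing is never tested.
ok = findfirst(≈(0.01), late_missing)

# Here the missing is tested first, and the search throws a TypeError.
threw = try
    findfirst(≈(0.01), early_missing)
    false
catch err
    err isa TypeError
end
```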

nilshg · August 1, 2022, 2:42pm

One way around this is to replace diff(df.y) with coalesce.(diff(df.y), false), which treats any difference involving a missing value as not 0.01.

In general I agree with Peter, though: if you run this on different, unknown inputs, it will be useful to explicitly handle missing, as well as the case where no difference is 0.01, so that findfirst returns nothing.
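One possible shape for such explicit handling is sketched below. The helper name `first_matching_time` and its arguments are hypothetical, chosen only for illustration; it skips missing differences and passes `nothing` through when no step qualifies.

```julia
# Hypothetical helper: returns the time step at which the step-to-step change
# first matches `target`, or `nothing` when no step qualifies.
function first_matching_time(x, y, target)
    d = diff(y)   # missing values in y propagate into d
    i = findfirst(v -> !ismissing(v) && v ≈ target, d)
    return i === nothing ? nothing : x[i]
end

x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [0.74, missing, 0.84, 0.88, 0.89]

found     = first_matching_time(x, y, 0.01)
not_found = first_matching_time(x, y, 0.5)
```

Checking `i === nothing` before indexing avoids an "invalid index: nothing" error further down the line.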

HelgavonLichtenstein · August 1, 2022, 3:10pm

When I run it, it just gives a checkmark and no index value. When I plug the result in to look up the index, I get ArgumentError: invalid index: nothing of type Nothing. I think this means the line of code ran successfully but did not find anything matching the criteria.

Looking at my difference values, I want to select the time step that is associated with -6.22e-10, but I don’t want to use that exact value as I need to repeat this selection process over other databases.

The purpose of finding this timestep is to approximate when a system becomes stable as the change in values of df.y becomes smaller and smaller.
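One likely reason nothing matches: by default `≈` (isapprox) uses a relative tolerance of `sqrt(eps)` and an absolute tolerance of zero, so a value such as -6.22e-10 is not considered approximately equal to 1e-10. A small sketch, where the `atol` value is an arbitrary choice for illustration:

```julia
observed = -6.22e-10   # the difference quoted above
target   = 1e-10

# Default tolerances: relative only, so these two values do not match.
default_match = observed ≈ target

# With an explicit absolute tolerance (1e-9 is an arbitrary illustrative
# value), the two values are treated as approximately equal.
loose_match = isapprox(observed, target; atol = 1e-9)
```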

rocco_sprmnt21 · August 1, 2022, 3:13pm

A little messy, but this way you don’t have the index shift problem:

using IterTools
findfirst(x-> abs(x)≈ .01, [coalesce(Base.splat(-)(tw),0) for tw in  partition(df.y,2,1)])

nilshg · August 1, 2022, 3:20pm

So are you maybe looking for the first time a difference is below a threshold, ie findfirst(<(0.01), diff(y))?
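If negative differences can occur, as with the -6.22e-10 mentioned above, comparing the magnitude may be closer to the intent. A minimal sketch, assuming plain vectors and an illustrative threshold `tol` that is not taken from the thread:

```julia
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [0.74, 0.64, 0.84, 0.88, 0.89]
tol = 0.05   # illustrative threshold

d = diff(y)

# Signed comparison: the first difference is about -0.1, which is below tol.
signed = findfirst(<(tol), d)

# Magnitude comparison: large negative changes are no longer counted as small.
by_magnitude = findfirst(v -> abs(v) < tol, d)

# Time step at which the change first drops below the threshold.
settle_time = x[by_magnitude]
```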

pdeffebach · August 1, 2022, 3:31pm

Can you please try to explain what each step here does? It’s a bit hard to read as is.

rocco_sprmnt21 · August 1, 2022, 6:36pm

In fact, it is not a nice piece of code… I’ll try to explain what the various pieces do.

#  the function partition (y, 2,1) generates an iterator of all pairs of consecutive elements of y
julia> cp=collect(partition(df.y,2,1))
4-element Vector{Tuple{Float64, Float64}}:       
 (0.74, 0.64)
 (0.64, 0.84)
 (0.84, 0.88)
 (0.88, 0.89)

#on each of these pairs I apply the function (-), but to do it I have to splat the tuple
julia> -(cp[1]...)
0.09999999999999998
# or 
julia> Base.splat(-)(cp[1])
0.09999999999999998

# to manage any missing I apply  coalesce(val, 0)  to these values

The same result can be obtained without the IterTools dependency, as follows (here with a vector y that contains missing values):

julia> y = [missing, 0.74, 0.64, 0.84, missing, 0.88, 0.89, missing]
8-element Vector{Union{Missing, Float64}}:
  missing
 0.74
 0.64
 0.84
  missing
 0.88
 0.89
  missing

julia> findfirst(x-> abs(x)≈ .01, coalesce.(Base.splat(-).(zip(y, y[2:end])),NaN))
6
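A possibly more readable variant of the same idea, sketched with a comprehension that destructures each pair instead of splatting it. It computes `b - a` (the usual diff sign) and skips missing differences in the predicate rather than coalescing them to NaN:

```julia
y = [missing, 0.74, 0.64, 0.84, missing, 0.88, 0.89, missing]

# Pair each element with its successor and subtract; missing propagates.
d = [b - a for (a, b) in zip(y, y[2:end])]

# Skip missing differences explicitly and compare magnitudes.
i = findfirst(v -> !ismissing(v) && abs(v) ≈ 0.01, d)
```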
