The answer I am looking for would be 1.5, as this is the first instance where the difference between t and t+1 is 0.01. I imagine this could be a for loop, but I don’t know how to compare each value in column two with the previous one. Could anyone help me?
lead introduces a missing value at the end of the column. But findfirst doesn’t accept missings (which is probably the correct behavior), so you have to handle them in the anonymous function you pass to findfirst.
Due to floating point behavior, we have
julia> (.89 - .88) == .01
false
so we need to use the ≈ comparison (typed \approx followed by Tab; it is the infix form of isapprox).
One thing that might make life easier is MissingsAsFalse.jl, which automates the missing comparison.
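To make the two points above concrete, here is a minimal sketch using only Base. The y values are the ones from the question; the next_y vector is a hand-rolled stand-in for what lead returns (an assumption made so the snippet runs without extra packages), and the variable names are only illustrative.

```julia
# y values from the question; next_y imitates lead(y) by appending a missing.
y = [0.74, 0.64, 0.84, 0.88, 0.89]
next_y = [y[2:end]; missing]
d = next_y .- y                      # the last element is missing

# Exact equality fails because of rounding; approximate equality succeeds.
exact  = (0.89 - 0.88) == 0.01
approx = (0.89 - 0.88) ≈ 0.01

# Check for missing first, so the predicate always returns a Bool.
idx = findfirst(x -> !ismissing(x) && x ≈ 0.01, d)
```

Here idx is the position in the difference vector, so the matching time would be the entry of the time column at that same position.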
My apologies, I had just copied over the y = [...] bit from your question without the whole DataFrames construction, hence why my post shows diff(y). Indeed you should be using diff(df.y).
Unfortunately, when I adapt the code to my actual data I get
TypeError: non-boolean (Missing) used in boolean context
in eval at base\boot.jl:373
in top-level scope at untitled:35
in findfirst at base\array.jl:2002
in findnext at base\array.jl:1951
The code is
df_high_1.difference = lead(df_high_1.bioS1) .- df_high_1.bioS1 # this works
food_web_1_8_8 = findfirst(≈(1e-10), diff(df_high_1.difference)) # this causes the error
One way around this is to replace diff(df.y) with coalesce.(diff(df.y), false) (note that only coalesce is broadcast, not diff), which treats any difference involving a missing value as not being 0.01.
In general I agree with Peter, though: if you run this on different, unknown inputs, it will be useful to explicitly handle missing, as well as the case where no difference is 0.01, in which case findfirst returns nothing.
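A rough sketch of what that explicit handling could look like is below. The function name first_matching_time and the time vector t are hypothetical (the actual time column is not shown in this thread), so treat this as one possible shape rather than the definitive solution.

```julia
# Hypothetical helper: return the time at which the first consecutive
# difference in y is approximately `target`, or nothing if there is none.
function first_matching_time(t, y, target)
    d = diff(y)   # a missing in y propagates into the neighbouring differences
    idx = findfirst(x -> !ismissing(x) && isapprox(x, target), d)
    return idx === nothing ? nothing : t[idx]
end

# y from the question; t is made up for illustration only.
t = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [0.74, 0.64, 0.84, 0.88, 0.89]

found     = first_matching_time(t, y, 0.01)
not_found = first_matching_time(t, y, 0.5)
with_gap  = first_matching_time(t, [0.74, missing, 0.84, 0.88, 0.89], 0.01)
```

Returning nothing instead of indexing with it lets the caller decide what "no match" should mean, rather than hitting an ArgumentError later.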
When I run it, it just gives a checkmark, no index value. When I plug it in to find the index, I get ArgumentError: invalid index: nothing of type Nothing. I think this means the line of code ran successfully but did not find anything matching the criteria.
Looking at my difference values, I want to select the time step that is associated with -6.22e-10, but I don’t want to use that exact value as I need to repeat this selection process over other databases.
The purpose of finding this timestep is to approximate when a system becomes stable as the change in values of df.y becomes smaller and smaller.
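Since the goal is to detect when the changes become small rather than to hit one particular value, one option is to compare the absolute difference against a tolerance. Note that ≈(1e-10) with the default settings only matches values that agree with 1e-10 to roughly eight significant digits, so a difference such as -6.22e-10 would never match it. The sketch below is an assumption about what "stable" should mean, not something proposed earlier in the thread; the function name, the tolerance, and the sample data are all made up for illustration.

```julia
# Hypothetical helper: index of the first consecutive difference whose
# absolute value is below `tol`, or nothing if no difference is that small.
function first_stable_index(y; tol = 1e-9)
    d = diff(y)
    return findfirst(x -> !ismissing(x) && abs(x) < tol, d)
end

# Made-up series whose steps shrink over time.
y = [10.0, 4.0, 3.0, 2.9, 2.89, 2.889]

i_loose  = first_stable_index(y; tol = 0.05)   # step 2.9 -> 2.89 is the first one below 0.05
i_strict = first_stable_index(y; tol = 1e-6)   # no step is this small
```

This only finds the first time the change drops below the tolerance; if the series can become noisy again afterwards, you may want to additionally require that all later differences stay below tol.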
In fact it is not a nice piece of code … I'll try to explain what the various pieces do…
# the function partition(y, 2, 1) from IterTools generates an iterator of all pairs of consecutive elements of y
julia> using IterTools
julia> cp=collect(partition(df.y,2,1))
4-element Vector{Tuple{Float64, Float64}}:
(0.74, 0.64)
(0.64, 0.84)
(0.84, 0.88)
(0.88, 0.89)
# on each of these pairs I apply the function (-), but to do that I have to splat the tuple
julia> -(cp[1]...)
0.09999999999999998
# or
julia> Base.splat(-)(cp[1])
0.09999999999999998
# to manage any missing I apply coalesce(val, 0) to these values
The same result can be obtained without the IterTools dependency, in the following way: