Hello,
Python's pandas has an interesting method, cummin (cumulative minimum), and its counterpart cummax (cumulative maximum).
See the Stack Overflow question "Rolling min of a Pandas Series without window / cumulative minimum / expanding min".
In [1]: import pandas as pd
In [2]: s = pd.Series([10, 12, 14, 9, 10, 8, 16, 20])
In [3]: s.cummin()
Out[3]:
0 10
1 10
2 10
3 9
4 9
5 8
6 8
7 8
dtype: int64
The same result can be obtained with np.minimum.accumulate or with s.expanding().min() (see Series.expanding).
I'm looking for the Julia equivalents cummin, cummin! (and cummax, cummax!) that could be applied to the following vector s.
julia> s = [10, 12, 14, 9, 10, 8, 16, 20]
8-element Array{Int64,1}:
10
12
14
9
10
8
16
20
Any idea?
You're looking for accumulate or accumulate!, as in e.g. accumulate(min, s).
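As a minimal sketch of that suggestion, using only Base functions and the vector s from the question (the names cmin, cmax and out are just illustrative):

```julia
s = [10, 12, 14, 9, 10, 8, 16, 20]

# Cumulative minimum and maximum via the generic accumulate
cmin = accumulate(min, s)   # [10, 10, 10, 9, 9, 8, 8, 8]
cmax = accumulate(max, s)   # [10, 12, 14, 14, 14, 14, 16, 20]

# In-place variant: write the running minimum into a preallocated vector
out = similar(s)
accumulate!(min, out, s)
```

The result of accumulate(min, s) matches the pandas s.cummin() output shown in the question.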
Thanks @under-Peter for your answer.
In fact, my problem is a bit harder because I need to apply this in combination with a groupby in a Julia DataFrame.
In Python I'm (simply) doing
df_lines["BestLaptime"] = df_lines.groupby("Name")["Laptime"].cummin()
My initial idea was
accumulate(min, groupby(df, :Name)[:Laptime])
but groupby(df, :Name)[:Laptime] raises
ERROR: MethodError: no method matching getindex(::GroupedDataFrame{DataFrame}, ::Symbol)
My DataFrame df looks like
2141×3 DataFrame
│ Row  │ RaceTime │ Name   │ Laptime  │
│      │ Float64⍰ │ String │ Float64⍰ │
├──────┼──────────┼────────┼──────────┤
│ 3    │ 6.08653  │ NAME1  │ missing  │
│ 4    │ 6.08927  │ NAME2  │ missing  │
│ 5    │ 6.1035   │ NAME3  │ missing  │
⋮
│ 2138 │ 125.976  │ NAME10 │ 50.477   │
│ 2139 │ 126.01   │ NAME1  │ 50.114   │
│ 2140 │ 126.047  │ NAME2  │ 49.892   │
│ 2141 │ 126.065  │ NAME7  │ 50.213   │
Moreover, the Laptime column has missing values, which should be skipped.
So the first problem is to be able to calculate a cumulative min in the presence of missing values.
It's easier to help if you provide an example input, e.g.
julia> df = DataFrame(names = ["albert", "albert", "tim", "tim", "albert", "tim", "albert"], times = [3,2,missing,4,missing,2,1]);
With that, you can do (see the DataFrames docs)
julia> by(df, :names, cumtimes = :times => t -> accumulate(mymin,t))
7×2 DataFrame
│ Row │ names  │ cumtimes │
│     │ String │ Int64⍰   │
├─────┼────────┼──────────┤
│ 1   │ albert │ 3        │
│ 2   │ albert │ 2        │
│ 3   │ albert │ 2        │
│ 4   │ albert │ 1        │
│ 5   │ tim    │ missing  │
│ 6   │ tim    │ 4        │
│ 7   │ tim    │ 2        │
where mymin is a thin wrapper around min that ignores missing values. By default, min(missing, 2) returns missing, but in your case you'd possibly want:
julia> mymin(x, y) = min(x, y);  # fallback when neither argument is missing
julia> mymin(::Missing, ::Missing) = missing;
julia> mymin(::Missing, x) = x;
julia> mymin(x,::Missing) = x;
That should get you most of the way. You may have to sort your rows before calling accumulate, but you can simply try it and see whether it works.
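The by call above reflects the DataFrames.jl API at the time of this thread. As a hedged sketch, assuming a DataFrames.jl 1.x installation (where by has been replaced by groupby combined with combine or transform), the grouped cumulative minimum could be written as follows. The mymin methods are repeated, including a fallback for two non-missing values, so that the snippet is self-contained; the column name cumtimes is taken from the example above.

```julia
using DataFrames

# Missing-aware minimum: a missing value never replaces a known minimum
mymin(x, y) = min(x, y)
mymin(::Missing, ::Missing) = missing
mymin(::Missing, y) = y
mymin(x, ::Missing) = x

df = DataFrame(names = ["albert", "albert", "tim", "tim", "albert", "tim", "albert"],
               times = [3, 2, missing, 4, missing, 2, 1])

# transform on a grouped frame keeps the original row order and adds the
# per-group running minimum as a new column
result = transform(groupby(df, :names),
                   :times => (t -> accumulate(mymin, t)) => :cumtimes)
```

Unlike the by output above, transform keeps the rows in their original order, which is closer to what the pandas groupby(...).cummin() assignment does.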
Thanks again @under-Peter for this answer, but I think there is probably a way to use skip for skipping missing values, see https://github.com/JuliaData/Missings.jl/issues/97 (instead of defining a custom min function).
The question of adding cummin / cummax has already been considered in this GitHub issue:
https://github.com/JuliaLang/julia/issues/1649
The function should probably be skipmissing, which is in Base.
I assumed that you'd want the result of cummin of e.g. [2,1,missing,3] to be [2,1,1,1]. What skipmissing does is literally just skip the missing values, so accumulate(min, skipmissing([2,1,missing,3])) will return [2,1,1].
If that's what you want, use skipmissing; otherwise, if the length should be conserved, a custom wrapper like the one above is probably what you want.
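To make the difference concrete, here is a small sketch using only Base. The helper name skipmin is hypothetical (it plays the role of the custom wrapper), and the skipped values are collected into a vector first so that the snippet does not depend on how a given Julia version lets accumulate handle iterators.

```julia
v = [2, 1, missing, 3]

# Option 1: drop the missing values first; the result is shorter than the input
dropped = accumulate(min, collect(skipmissing(v)))   # [2, 1, 1]

# Option 2: keep the length, carrying the running minimum across missing entries
skipmin(x, y) = min(x, y)
skipmin(::Missing, ::Missing) = missing
skipmin(::Missing, y) = y
skipmin(x, ::Missing) = x

kept = accumulate(skipmin, v)                        # [2, 1, 1, 1]
```

Option 2 is the one that can be assigned back as a DataFrame column, since its length matches the input.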
Edit: note that the issue you commented on is 7 years old. A lot has changed since then, and imho using generic functions like reduce or accumulate is superior: less code and more generality.