Cumulative min / cumulative max


Python's pandas has an interesting method, cummin (cumulative minimum), and its counterpart cummax (cumulative maximum):


In [1]: import pandas as pd

In [2]: s = pd.Series([10, 12, 14, 9, 10, 8, 16, 20])

In [3]: s.cummin()
0    10
1    10
2    10
3     9
4     9
5     8
6     8
7     8
dtype: int64

The same values can also be obtained with np.minimum.accumulate(s) or s.expanding().min() (see Series.expanding).

I'm looking for the Julia equivalents, cummin and cummin! (and cummax and cummax!), which could be applied to the following vector s:

julia> s = [10, 12, 14, 9, 10, 8, 16, 20]
8-element Array{Int64,1}:
 10
 12
 14
  9
 10
  8
 16
 20

Any idea?

You're looking for accumulate or accumulate!, as in e.g. accumulate(min, s).
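A minimal sketch using only Base. The names cummin, cummax and cummin! below are hypothetical helpers defined here for illustration; they are not exported by Base, they just wrap accumulate and accumulate!:

```julia
# Hypothetical helpers: thin wrappers around Base's accumulate / accumulate!.
cummin(v) = accumulate(min, v)
cummax(v) = accumulate(max, v)

# In-place variant: writes the running minimum of `v` into the preallocated `out`.
cummin!(out, v) = accumulate!(min, out, v)

s = [10, 12, 14, 9, 10, 8, 16, 20]

println(cummin(s))   # [10, 10, 10, 9, 9, 8, 8, 8]
println(cummax(s))   # [10, 12, 14, 14, 14, 14, 16, 20]

out = similar(s)     # preallocated buffer with the same element type as s
cummin!(out, s)      # `out` now holds the same values as cummin(s)
println(out)
```

Any two-argument function can be plugged into accumulate the same way, which is why no dedicated cummin is needed.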


Thanks @under-Peter for your answer.

Well, in fact my problem is a bit harder, because I need to apply this in combination with a groupby on a Julia DataFrame.

In Python I'm (simply) doing

df_lines["BestLaptime"] = df_lines.groupby("Name")["Laptime"].cummin()

My initial idea was

accumulate(min, groupby(df, :Name)[:Laptime])

but groupby(df, :Name)[:Laptime] raises

ERROR: MethodError: no method matching getindex(::GroupedDataFrame{DataFrame}, ::Symbol)

My DataFrame df looks like

2141×3 DataFrame
│ Row  │ RaceTime  │ Name                    │ Laptime  │
│      │ Float64⍰  │ String                  │ Float64⍰ │
│ 3    │ 6.08653   │ NAME1                   │ missing  │
│ 4    │ 6.08927   │ NAME2                   │ missing  │
│ 5    │ 6.1035    │ NAME3                   │ missing  │
│ 2138 │ 125.976   │ NAME10                  │ 50.477   │
│ 2139 │ 126.01    │ NAME1                   │ 50.114   │
│ 2140 │ 126.047   │ NAME2                   │ 49.892   │
│ 2141 │ 126.065   │ NAME7                   │ 50.213   │

Moreover, the Laptime column has missing values, which should be skipped.

So the first problem is being able to compute a cumulative min when there are missing values.

It's easier to help if you provide an example input, e.g.

julia> df = DataFrame(names = ["albert", "albert", "tim", "tim", "albert", "tim", "albert"], times = [3,2,missing,4,missing,2,1]);

With that, you can do (see the DataFrames docs):

julia> by(df, :names, cumtimes = :times => t -> accumulate(mymin,t))
7×2 DataFrame
│ Row │ names  │ cumtimes │
│     │ String │ Int64⍰   │
│ 1   │ albert │ 3        │
│ 2   │ albert │ 2        │
│ 3   │ albert │ 2        │
│ 4   │ albert │ 1        │
│ 5   │ tim    │ missing  │
│ 6   │ tim    │ 4        │
│ 7   │ tim    │ 2        │

where mymin is a thin wrapper around min that ignores missing values. By default, min(missing, 2) returns missing, but in your case you probably want:

julia> mymin(x, y) = min(x, y);
julia> mymin(::Missing, ::Missing) = missing;
julia> mymin(::Missing, x) = x;
julia> mymin(x, ::Missing) = x;

That should get you most of the way. You may have to sort your rows first, since accumulate follows row order, but just try whether it works for your data.
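To make the grouped semantics concrete without depending on a particular DataFrames version, here is a Base-only sketch of a grouped cumulative minimum that, like pandas' groupby(...).cummin(), returns one value per row in the original row order. The helper name grouped_cummin is hypothetical, and the sketch assumes the rows are already in time order. (As far as I know, recent DataFrames releases no longer provide by; the rough equivalent there is transform(groupby(df, :names), :times => (t -> accumulate(mymin, t)) => :cumtimes), which also keeps the original row order.)

```julia
# Thin wrapper around min that ignores missing values, including the generic
# fallback for two non-missing arguments.
mymin(x, y) = min(x, y)
mymin(::Missing, ::Missing) = missing
mymin(::Missing, x) = x
mymin(x, ::Missing) = x

# Hypothetical helper: running minimum of `vals` within each group given by
# `groups`, returned in the original row order (rows assumed time-sorted).
function grouped_cummin(groups::AbstractVector, vals::AbstractVector)
    T = Union{Missing, eltype(vals)}
    best = Dict{eltype(groups), T}()        # running minimum per group so far
    out = Vector{T}(undef, length(vals))
    for i in eachindex(groups, vals)
        # A group seen for the first time starts from `missing`,
        # which mymin treats as "no value yet".
        cur = mymin(get(best, groups[i], missing), vals[i])
        best[groups[i]] = cur
        out[i] = cur
    end
    return out
end

runners = ["albert", "albert", "tim", "tim", "albert", "tim", "albert"]
times = [3, 2, missing, 4, missing, 2, 1]

result = grouped_cummin(runners, times)
println(result)   # values: 3, 2, missing, 4, 2, 2, 1
```

Per group this gives 3, 2, 2, 1 for albert and missing, 4, 2 for tim, the same values as the by output above, just placed back at their original rows.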


Thanks again @under-Peter for this answer, but I think there is probably a way to skip the missing values directly (instead of defining a custom min function).

Adding cummin / cummax was already considered in this GitHub issue.

The function should probably be skipmissing, which is in Base.

I assumed that you'd want cummin of e.g. [2,1,missing,3] to be [2,1,1,1]. What skipmissing does is literally skip the missing values, so accumulate(min, skipmissing([2,1,missing,3])) will return [2,1,1].
If that's what you want, use skipmissing; otherwise, if the length should be preserved, a custom wrapper like the one above is probably what you want.
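The difference can be checked on the vector from this post. The sketch below (Base only; mymin repeats the wrapper above with its generic fallback, and cummin_keep_missing is a hypothetical name) compares three behaviors: carrying the running minimum through missing entries (length kept), dropping them with skipmissing (length shrinks), and a third option that keeps missing in place while skipping it in the running minimum. As far as I know, that third option is what pandas' cummin does by default (skipna=True), leaving NaN at the missing positions.

```julia
mymin(x, y) = min(x, y)
mymin(::Missing, ::Missing) = missing
mymin(::Missing, x) = x
mymin(x, ::Missing) = x

v = [2, 1, missing, 3]

# 1. Length-preserving: missing entries repeat the running minimum.
carried = accumulate(mymin, v)                       # 2, 1, 1, 1

# 2. skipmissing: missing entries are dropped before accumulating.
skipped = accumulate(min, collect(skipmissing(v)))   # 2, 1, 1

# 3. Hypothetical helper: keep `missing` at its position, but skip it
#    when updating the running minimum.
function cummin_keep_missing(v)
    out = similar(v)       # same element type, so it can hold `missing`
    best = missing         # no value seen yet
    for i in eachindex(v)
        best = mymin(best, v[i])
        out[i] = ismissing(v[i]) ? missing : best
    end
    return out
end

kept = cummin_keep_missing(v)                        # 2, 1, missing, 1
```

collect is used on the skipmissing iterator so that accumulate receives a plain vector.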

Edit: note that the issue you linked is 7 years old. A lot has changed since then, and IMHO using generic functions like reduce or accumulate is superior: less code and more generality.