# Cumulative min / cumulative max

Hello,

Python's pandas has a handy method, `cummin` (cumulative minimum), and its counterpart `cummax` (cumulative maximum):

``````
In [1]: import pandas as pd

In [2]: s = pd.Series([10, 12, 14, 9, 10, 8, 16, 20])

In [3]: s.cummin()
Out[3]:
0    10
1    10
2    10
3     9
4     9
5     8
6     8
7     8
dtype: int64
``````

The same running minimum can also be computed with `np.minimum.accumulate(s)` or `s.expanding().min()` (see `Series.expanding`).

I'm looking for the Julia equivalents, `cummin` and `cummin!` (and `cummax`, `cummax!`), that could be applied to the following vector `s`.

``````
julia> s = [10, 12, 14, 9, 10, 8, 16, 20]
8-element Array{Int64,1}:
 10
 12
 14
  9
 10
  8
 16
 20
``````

Any idea?

You're looking for `accumulate` or `accumulate!`, as in e.g. `accumulate(min, s)`.
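For readers coming from the Python side, the same idea (fold a binary function over the sequence and keep every intermediate result) is available in the standard library as `itertools.accumulate`. A minimal sketch using the vector from the question; the helper names `cummin` and `cummax` are hypothetical and only serve this illustration:

```python
from itertools import accumulate


def cummin(values):
    """Running minimum: element i is the smallest of values[0..i]."""
    return list(accumulate(values, min))


def cummax(values):
    """Running maximum: element i is the largest of values[0..i]."""
    return list(accumulate(values, max))


s = [10, 12, 14, 9, 10, 8, 16, 20]
print(cummin(s))  # [10, 10, 10, 9, 9, 8, 8, 8]
print(cummax(s))  # [10, 12, 14, 14, 14, 14, 16, 20]
```

Swapping `min` for any other binary function gives the corresponding cumulative operation, which is the generality that `accumulate` offers over a dedicated `cummin`.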


Well, in fact my problem is a bit harder, because I need to apply this in combination with a `groupby` on a Julia DataFrame.

In Python I'm (simply) doing

``````
df_lines["BestLaptime"] = df_lines.groupby("Name")["Laptime"].cummin()
``````

My initial idea was

``````
accumulate(min, groupby(df, :Name)[:Laptime])
``````

but `groupby(df, :Name)[:Laptime]` raises

``````
ERROR: MethodError: no method matching getindex(::GroupedDataFrame{DataFrame}, ::Symbol)
``````

My DataFrame `df` looks like

``````
2141×3 DataFrame
│ Row  │ RaceTime  │ Name                    │ Laptime  │
│      │ Float64⍰  │ String                  │ Float64⍰ │
├──────┼───────────┼─────────────────────────┼──────────┤
│ 3    │ 6.08653   │ NAME1                   │ missing  │
│ 4    │ 6.08927   │ NAME2                   │ missing  │
│ 5    │ 6.1035    │ NAME3                   │ missing  │
⋮
│ 2138 │ 125.976   │ NAME10                  │ 50.477   │
│ 2139 │ 126.01    │ NAME1                   │ 50.114   │
│ 2140 │ 126.047   │ NAME2                   │ 49.892   │
│ 2141 │ 126.065   │ NAME7                   │ 50.213   │
``````

Moreover, the `Laptime` column has `missing` values, which should be skipped.

So the first problem is computing a cumulative minimum in the presence of missing values.

It's easier to help if you provide an example input, e.g.

``````
julia> df = DataFrame(names = ["albert", "albert", "tim", "tim", "albert", "tim", "albert"], times = [3,2,missing,4,missing,2,1]);
``````

With that, you can do (see dataframe-docs)

``````
julia> by(df, :names, cumtimes = :times => t -> accumulate(mymin,t))
7×2 DataFrame
│ Row │ names  │ cumtimes │
│     │ String │ Int64⍰   │
├─────┼────────┼──────────┤
│ 1   │ albert │ 3        │
│ 2   │ albert │ 2        │
│ 3   │ albert │ 2        │
│ 4   │ albert │ 1        │
│ 5   │ tim    │ missing  │
│ 6   │ tim    │ 4        │
│ 7   │ tim    │ 2        │
``````

where `mymin` is a thin wrapper around `min` that ignores `missing` values. By default `min(missing, 2)` returns `missing`, but in your case you'd probably want:

``````
julia> mymin(x, y) = min(x, y);  # fallback when neither argument is missing
julia> mymin(::Missing, ::Missing) = missing;
julia> mymin(::Missing, x) = x;
julia> mymin(x,::Missing) = x;
``````

That should get you most of the way. You may have to sort your rows before calling `accumulate`, but you can just try whether it works.
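To make the intended semantics concrete, here is a small standard-library Python sketch of a grouped, missing-aware running minimum. It is only an illustration under stated assumptions: `None` stands in for `missing`, and the names `min_skipping_none` and `grouped_cummin` are hypothetical. Unlike the `by` output above, it returns the results in the original row order rather than grouped by name.

```python
def min_skipping_none(a, b):
    """Like min, but treats None (standing in for missing) as 'no value yet'."""
    if a is None:
        return b
    if b is None:
        return a
    return min(a, b)


def grouped_cummin(keys, values):
    """Running minimum per key, returned in the original row order."""
    best = {}  # key -> smallest non-missing value seen so far for that key
    out = []
    for key, value in zip(keys, values):
        best[key] = min_skipping_none(best.get(key), value)
        out.append(best[key])
    return out


names = ["albert", "albert", "tim", "tim", "albert", "tim", "albert"]
times = [3, 2, None, 4, None, 2, 1]
print(grouped_cummin(names, times))
# [3, 2, None, 4, 2, 2, 1]
```

Reading off the entries per name gives `3, 2, 2, 1` for albert and `None, 4, 2` for tim, which matches the grouped table above.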


Thanks again, @under-Peter, for this answer, but I think there is probably a way to use `skip` to skip `missing` values instead of defining a custom `min` function; see https://github.com/JuliaData/Missings.jl/issues/97.

The question of adding `cummin` / `cummax` was already considered in this GitHub issue.

The function should probably be `skipmissing`, which is in Base.

I assumed that you'd want the result of `cummin` of e.g. `[2,1,missing,3]` to be `[2,1,1,1]`. What `skipmissing` does is literally just skip the missing values, so `accumulate(min, skipmissing([2,1,missing,3]))` will return `[2,1,1]`.
If that's what you want, use `skipmissing`; otherwise, if the length should be preserved, a custom wrapper like the one above is probably what you want.
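The difference between the two options can be sketched in plain Python, again with `None` standing in for `missing`; the helper name `min_skipping_none` is hypothetical and mirrors the `mymin` wrapper:

```python
from itertools import accumulate

values = [2, 1, None, 3]

# Option 1: drop the missing entries first (the skipmissing approach).
# The result is shorter than the input.
dropped = list(accumulate((v for v in values if v is not None), min))


# Option 2: keep the length by carrying the running minimum across
# missing entries (the custom-wrapper approach).
def min_skipping_none(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return min(a, b)


kept = list(accumulate(values, min_skipping_none))

print(dropped)  # [2, 1, 1]
print(kept)     # [2, 1, 1, 1]
```

Only the second form can be written back as a new column next to the original data, since it has one result per input row.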

Edit: note that the issue you commented on is 7 years old. A lot has changed since then, and imho using generic functions like `reduce` or `accumulate` is superior: less code and more generality.