Cumulative min / cumulative max

scelles · June 21, 2019, 8:01am

Hello,

Python’s pandas has an interesting method named cummin (cumulative minimum), along with its counterpart cummax (cumulative maximum).

See the Stack Overflow question “Rolling min of a Pandas Series without window / cumulative minimum / expanding min”.

In [1]: import pandas as pd

In [2]: s = pd.Series([10, 12, 14, 9, 10, 8, 16, 20])

In [3]: s.cummin()
Out[3]:
0    10
1    10
2    10
3     9
4     9
5     8
6     8
7     8
dtype: int64

The same result can also be obtained with np.minimum.accumulate(s) or s.expanding().min() (see Series.expanding).

I’m looking for Julia equivalents, cummin and cummin! (and cummax and cummax!), that could be applied to the following vector s.

julia> s = [10, 12, 14, 9, 10, 8, 16, 20]
8-element Array{Int64,1}:
 10
 12
 14
  9
 10
  8
 16
  20

Any idea?

under-Peter · June 21, 2019, 8:08am

You’re looking for accumulate or accumulate!, e.g. accumulate(min, s) (and accumulate(max, s) for the cumulative maximum).
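For readers coming from pandas, the same idea exists in Python's standard library. itertools.accumulate also takes a binary function, so passing min or max gives the running minimum or maximum, much like accumulate(min, s) and accumulate(max, s) in Julia. This is a Python analogy, not Julia code. The in-place helper cummin_inplace is a hypothetical name that only mirrors the spirit of the mutating accumulate!:

```python
from itertools import accumulate

s = [10, 12, 14, 9, 10, 8, 16, 20]

# Running minimum / maximum: each output is the min (or max) of everything seen so far.
cummin = list(accumulate(s, min))  # [10, 10, 10, 9, 9, 8, 8, 8]
cummax = list(accumulate(s, max))  # [10, 12, 14, 14, 14, 14, 16, 20]


def cummin_inplace(xs):
    """Hypothetical helper: overwrite xs with its running minimum (no new list)."""
    for i in range(1, len(xs)):
        xs[i] = min(xs[i - 1], xs[i])
    return xs


buf = list(s)
cummin_inplace(buf)  # buf now holds the same values as cummin
```

The cummin result matches the pandas s.cummin() output shown in the question.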

scelles · June 21, 2019, 8:10am

Thanks @under-Peter for your answer.

scelles · June 21, 2019, 9:07am

Well, in fact my problem is a bit harder, because I need to apply this in combination with a groupby on a Julia DataFrame.

In Python I’m (simply) doing

df_lines["BestLaptime"] = df_lines.groupby("Name")["Laptime"].cummin()
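The pandas line computes a running minimum that restarts for every Name, while the result stays aligned with the original row order. Below is a stdlib-only Python sketch of those semantics that ignores missing values for now. The names grouped_cummin, names and laptimes are hypothetical, and the lap times are made up for illustration. The sketch keeps one running minimum per key in a dict:

```python
def grouped_cummin(keys, values):
    """Running minimum of values, restarted per key, returned in the original row order."""
    best = {}  # key -> smallest value seen so far for that key
    out = []
    for k, v in zip(keys, values):
        best[k] = v if k not in best else min(best[k], v)
        out.append(best[k])
    return out


names = ["NAME1", "NAME2", "NAME1", "NAME2", "NAME1"]
laptimes = [51.2, 50.9, 50.1, 51.0, 50.4]
best_laptime = grouped_cummin(names, laptimes)
# [51.2, 50.9, 50.1, 50.9, 50.1]
```

Each row is handled in a single pass, so the output lines up row by row with the input, as the pandas result does with df_lines.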

My initial idea was

accumulate(min, groupby(df, :Name)[:Laptime])

but groupby(df, :Name)[:Laptime] raises

ERROR: MethodError: no method matching getindex(::GroupedDataFrame{DataFrame}, ::Symbol)

My DataFrame df looks like

2141×3 DataFrame
│ Row  │ RaceTime  │ Name                    │ Laptime  │
│      │ Float64⍰  │ String                  │ Float64⍰ │
├──────┼───────────┼─────────────────────────┼──────────┤
│ 3    │ 6.08653   │ NAME1                   │ missing  │
│ 4    │ 6.08927   │ NAME2                   │ missing  │
│ 5    │ 6.1035    │ NAME3                   │ missing  │
⋮
│ 2138 │ 125.976   │ NAME10                  │ 50.477   │
│ 2139 │ 126.01    │ NAME1                   │ 50.114   │
│ 2140 │ 126.047   │ NAME2                   │ 49.892   │
│ 2141 │ 126.065   │ NAME7                   │ 50.213   │

Moreover, the Laptime column has missing values, which should be skipped.

So the first problem is computing a cumulative min in the presence of missing values.

under-Peter · June 21, 2019, 10:06am

It’s easier to help if you provide an example input, e.g.

julia> df = DataFrame(names = ["albert", "albert", "tim", "tim", "albert", "tim", "albert"], times = [3,2,missing,4,missing,2,1]);

With that, you can do the following (see the DataFrames.jl docs):

julia> by(df, :names, cumtimes = :times => t -> accumulate(mymin,t))
7×2 DataFrame
│ Row │ names  │ cumtimes │
│     │ String │ Int64⍰   │
├─────┼────────┼──────────┤
│ 1   │ albert │ 3        │
│ 2   │ albert │ 2        │
│ 3   │ albert │ 2        │
│ 4   │ albert │ 1        │
│ 5   │ tim    │ missing  │
│ 6   │ tim    │ 4        │
│ 7   │ tim    │ 2        │

where mymin is a thin wrapper around min that ignores missing values. By default, min(missing, 2) returns missing, but in your case you probably want:

julia> mymin(::Missing, ::Missing) = missing;
julia> mymin(::Missing, x) = x;
julia> mymin(x, ::Missing) = x;
julia> mymin(x, y) = min(x, y);

That should get you most of the way. You may need to sort your rows before calling accumulate, but you can just try it and see whether it works.
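As a cross-check of the table above, here is the same missing-skipping wrapper written in Python. None stands in for missing, which is an assumption of this sketch. The names mymin_py, albert and tim are hypothetical, and the lists are the per-name times from the example DataFrame. The sketch reproduces the cumtimes column, including tim’s leading missing. That value stays missing because accumulate passes the first element through unchanged:

```python
from itertools import accumulate


def mymin_py(a, b):
    """min that ignores None (our stand-in for Julia's missing)."""
    if a is None:
        return b  # also covers (None, None) -> None
    if b is None:
        return a
    return min(a, b)


albert = [3, 2, None, 1]
tim = [None, 4, 2]

albert_cum = list(accumulate(albert, mymin_py))  # [3, 2, 2, 1]
tim_cum = list(accumulate(tim, mymin_py))        # [None, 4, 2]
```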

scelles · June 21, 2019, 10:11am

Thanks again @under-Peter for this answer, but I think there is probably something to be done with skip for skipping missing values (see https://github.com/JuliaData/Missings.jl/issues/97) instead of defining a custom min function.

scelles · June 21, 2019, 10:19am

Adding cummin / cummax was already considered in this GitHub issue:

https://github.com/JuliaLang/julia/issues/1649

under-Peter · June 21, 2019, 10:22am

The function is probably skipmissing, which is in Base.

I assumed that you’d want the cummin of e.g. [2,1,missing,3] to be [2,1,1,1]. What skipmissing does is literally skip the missing values, so accumulate(min, skipmissing([2,1,missing,3])) returns [2,1,1].
If that’s what you want, use skipmissing. Otherwise, if the length should be preserved, a custom wrapper like the one above is probably what you want.
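The length difference is what matters for a DataFrame column. This is a Python analogue, in which None stands in for missing and skipped is a hypothetical name. Dropping the missing values before accumulating yields three results for four rows, so the results can no longer be stored one per row without re-aligning them:

```python
from itertools import accumulate

xs = [2, 1, None, 3]  # None plays the role of missing

# Analogue of accumulate(min, skipmissing(xs)): missing values are dropped first.
skipped = list(accumulate((x for x in xs if x is not None), min))

print(skipped, len(skipped), len(xs))  # [2, 1, 1] 3 4
```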

Edit: note that the issue you commented on is 7 years old. A lot has changed since then, and IMHO using generic functions like reduce or accumulate is superior: less code and more generality.
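A Python analogy (not Julia code) makes the generality argument concrete. reduce keeps only the final combined value, while accumulate keeps every intermediate one. Swapping the binary function turns the same call into, for example, a cumulative sum. For any non-empty sequence, the last element of an accumulate equals the reduce with the same function:

```python
import operator
from functools import reduce
from itertools import accumulate

s = [10, 12, 14, 9, 10, 8, 16, 20]

overall_min = reduce(min, s)                     # 8: a single value
running_sum = list(accumulate(s, operator.add))  # cumulative sum, one value per element
total = reduce(operator.add, s)                  # 99

# The last accumulated value is always the reduced value.
assert running_sum[-1] == total
assert list(accumulate(s, min))[-1] == overall_min
```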
