Group by value in column

How do I get 3 groups here based on the column value? In this case when you find a word all in capitals.

df = DataFrame(a = ["HOLA", 2,3,"HELLO", 4,5,6, "HALLO", 7,8,9,10], b = rand(12), c=rand(12))

I want to get 3 groups here. The first one

Row │ a      b          c         
     │ Any    Float64    Float64   
─────┼─────────────────────────────
   1 │ HOLA   0.189917   0.300812
   2 │ 2      0.579802   0.464355
   3 │ 3      0.306022   0.236678

second:

4 │ HELLO  0.40227    0.931759
   5 │ 4      0.632268   0.673693
   6 │ 5      0.465315   0.0911626
   7 │ 6      0.0710601  0.231631

and third:

8 │ HALLO  0.199382   0.851106
   9 │ 7      0.114058   0.0537212
  10 │ 8      0.988661   0.842176
  11 │ 9      0.0238035  0.0757923
  12 │ 10     0.672239   0.335123

thanks for any tips. I don’t want to this by hand for hundreds of column values.

I would like to do it like this

julia> @chain df begin
           @rtransform :all_caps = :a isa AbstractString && all(isuppercase, :a)  
           @transform :num_all_caps = cumsum(:all_caps)
           groupby(:num_all_caps)
       end
GroupedDataFrame with 3 groups based on key: num_all_caps
First Group (3 rows): num_all_caps = 1
 Row │ a     b          c         all_caps  num_all_caps 
     │ Any   Float64    Float64   Bool      Int64        
─────┼───────────────────────────────────────────────────
   1 │ HOLA  0.808749   0.38041       true             1
   2 │ 2     0.783565   0.359544     false             1
   3 │ 3     0.0322656  0.628813     false             1
⋮
Last Group (5 rows): num_all_caps = 3
 Row │ a      b         c         all_caps  num_all_caps 
     │ Any    Float64   Float64   Bool      Int64        
─────┼───────────────────────────────────────────────────
   1 │ HALLO  0.39854   0.770258      true             3
   2 │ 7      0.448122  0.270023     false             3
   3 │ 8      0.156094  0.509108     false             3
   4 │ 9      0.802457  0.805941     false             3
   5 │ 10     0.898761  0.758413     false             3
1 Like

Thanks, although it looks overcomplicated. But it seems to do the job. Well, on second thought, maybe this is the shortest way.

There’s no cumsum(f::Function, x) method, so I think this is basically the shortest way (though you can always avoid making two @transform statements)

There is no cumsum(f::Function, x) but there is cumsum(itr).

@chain df begin
    transform(:a => (z -> cumsum(x isa AbstractString && all(isuppercase, x) for x in z)) => :num_all_caps)
    groupby(:num_all_caps)
end
1 Like

Another option:

ix = [findall(x -> (isa(x, String) && all(isuppercase, x)), df.a); nrow(df)+1]

[df[ix[i]:ix[i+1]-1,:] for i in 1:length(ix)-1]
1 Like

Another option:

import Base.Iterators as Itr

df.g .= Itr.map(let g = "" ; x -> ( x isa AbstractString && all(isuppercase, x) ? (g = x) : g ) ; end, df.a)

# optional:
# using CategoricalArrays
# df.g = categorical(df.g)

DataFrames.groupby(df, :g)
2 Likes

thanks everyone for your feedback. As expected there are wild ways to do it. Good to have them now here for other users :smile:

1 Like