How do I get 3 groups here based on the column value? In this case when you find a word all in capitals.
df = DataFrame(a = ["HOLA", 2,3,"HELLO", 4,5,6, "HALLO", 7,8,9,10], b = rand(12), c=rand(12))
I want to get 3 groups here. The first one
Row │ a b c
│ Any Float64 Float64
─────┼─────────────────────────────
1 │ HOLA 0.189917 0.300812
2 │ 2 0.579802 0.464355
3 │ 3 0.306022 0.236678
second:
4 │ HELLO 0.40227 0.931759
5 │ 4 0.632268 0.673693
6 │ 5 0.465315 0.0911626
7 │ 6 0.0710601 0.231631
and third:
8 │ HALLO 0.199382 0.851106
9 │ 7 0.114058 0.0537212
10 │ 8 0.988661 0.842176
11 │ 9 0.0238035 0.0757923
12 │ 10 0.672239 0.335123
thanks for any tips. I don’t want to this by hand for hundreds of column values.
I would like to do it like this
julia> @chain df begin
@rtransform :all_caps = :a isa AbstractString && all(isuppercase, :a)
@transform :num_all_caps = cumsum(:all_caps)
groupby(:num_all_caps)
end
GroupedDataFrame with 3 groups based on key: num_all_caps
First Group (3 rows): num_all_caps = 1
Row │ a b c all_caps num_all_caps
│ Any Float64 Float64 Bool Int64
─────┼───────────────────────────────────────────────────
1 │ HOLA 0.808749 0.38041 true 1
2 │ 2 0.783565 0.359544 false 1
3 │ 3 0.0322656 0.628813 false 1
⋮
Last Group (5 rows): num_all_caps = 3
Row │ a b c all_caps num_all_caps
│ Any Float64 Float64 Bool Int64
─────┼───────────────────────────────────────────────────
1 │ HALLO 0.39854 0.770258 true 3
2 │ 7 0.448122 0.270023 false 3
3 │ 8 0.156094 0.509108 false 3
4 │ 9 0.802457 0.805941 false 3
5 │ 10 0.898761 0.758413 false 3
1 Like
Thanks, although it looks overcomplicated. But it seems to do the job. Well, on second thought, maybe this is the shortest way.
There’s no cumsum(f::Function, x)
method, so I think this is basically the shortest way (though you can always avoid making two @transform
statements)
There is no cumsum(f::Function, x)
but there is cumsum(itr)
.
@chain df begin
transform(:a => (z -> cumsum(x isa AbstractString && all(isuppercase, x) for x in z)) => :num_all_caps)
groupby(:num_all_caps)
end
1 Like
Another option:
ix = [findall(x -> (isa(x, String) && all(isuppercase, x)), df.a); nrow(df)+1]
[df[ix[i]:ix[i+1]-1,:] for i in 1:length(ix)-1]
1 Like
Dan
7
Another option:
import Base.Iterators as Itr
df.g .= Itr.map(let g = "" ; x -> ( x isa AbstractString && all(isuppercase, x) ? (g = x) : g ) ; end, df.a)
# optional:
# using CategoricalArrays
# df.g = categorical(df.g)
DataFrames.groupby(df, :g)
2 Likes
thanks everyone for your feedback. As expected there are wild ways to do it. Good to have them now here for other users
1 Like