Groupby search not working as expected

I'm slowly working my way through relearning data frames (coming from pandas) and wanted to search a DataFrame for a value. This is a search I will be doing many times, so I thought the approach below might speed things up.

using DataFrames

df = DataFrame( test_col = ["a","B","a","D"])

search_test_col = groupby(df,:test_col)

test_value = "B"

 search_test_col[(test_value)]

When I run it I get:

julia> using DataFrames

julia> df = DataFrame( test_col = ["a","B","a","D"])
4×1 DataFrame
 Row │ test_col
     │ String
─────┼──────────
   1 │ a
   2 │ B
   3 │ a
   4 │ D

julia> search_test_col = groupby(df,:test_col)
GroupedDataFrame with 3 groups based on key: test_col
First Group (2 rows): test_col = "a"
 Row │ test_col
     │ String
─────┼──────────
   1 │ a
   2 │ a
⋮
Last Group (1 row): test_col = "D"
 Row │ test_col
     │ String
─────┼──────────
   1 │ D

julia> test_value = "B"
"B"

julia> search_test_col[(test_value)]
ERROR: ArgumentError: invalid index: "B" of type String
Stacktrace:
 [1] to_index(i::String)
   @ Base ./indices.jl:300
 [2] to_index(A::GroupedDataFrame{DataFrame}, i::String)
   @ Base ./indices.jl:277
 [3] to_indices
   @ ./indices.jl:333 [inlined]
 [4] to_indices
   @ ./indices.jl:325 [inlined]
 [5] getindex(gd::GroupedDataFrame{DataFrame}, idx::String)
   @ DataFrames ~/.julia/packages/DataFrames/zqFGs/src/groupeddataframe/groupeddataframe.jl:660
 [6] top-level scope
   @ REPL[4]:1

julia> typeof(df.test_col)
Vector{String} (alias for Array{String, 1})

julia> typeof(test_value)
String

julia> df.test_col == test_value
false

julia> df.test_col[2]  == test_value
true

julia> df.test_col[2]
"B"

julia> test_value
"B"

What am I doing wrong, please?

It’s useful to consult the documentation in these situations:

https://dataframes.juliadata.org/stable/lib/indexing/#Indexing-GroupedDataFrames

In particular:

A GroupedDataFrame can behave as either an AbstractVector or AbstractDict depending on the type of index used. Integers (or arrays of them) trigger vector-like indexing while Tupless and NamedTuples trigger dictionary-like indexing.

And:

  • gd[i::Integer] → Get the ith group.
  • gd[key::NamedTuple] → Get the group corresponding to the given values of the grouping columns. The fields of the NamedTuple must match the grouping columns passed to groupby (including order).
  • gd[key::Tuple] → Same as previous, but omitting the names on key.

So:

julia> search_test_col[(test_value,)]
1×1 SubDataFrame
 Row │ test_col
     │ String
─────┼──────────
   1 │ B

Note the comma after test_value. Writing (test_value) doesn’t do anything:

julia> (test_value)
"B"

i.e. it’s the same as writing test_value without parens. If you’re confused about constructing tuples, that’s more a matter for the base Julia docs than for DataFrames, though.
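
To make the difference concrete, here is a minimal sketch using only base Julia (no DataFrames needed). The variable names plain, single and named are just illustrative:

```julia
# Parentheses alone only group an expression; the trailing comma is what
# makes a tuple.
test_value = "B"

plain  = (test_value)               # still a String, same as test_value
single = (test_value,)              # a 1-element Tuple{String}
named  = (test_col = test_value,)   # a 1-field NamedTuple

println(typeof(plain))
println(typeof(single))
println(typeof(named))
```

Either single or named is the kind of key that triggers the dictionary-like lookup on a GroupedDataFrame; plain is just the string and falls through to the vector-like path, which is where the "invalid index" error comes from.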

Yes - this is a consequence of “dual indexing style”.

Related recent issues:

In particular, using AcceleratedArrays.jl (andyferris/AcceleratedArrays.jl on GitHub, “Arrays with acceleration indices”) seems a good general solution if you have a condition on a single column. The only problem is that AcceleratedArrays.jl does not seem to be actively maintained currently, so we need to decide what to do about it.
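
For repeated lookups on a single column, the general idea of a precomputed index can also be sketched with a plain Dict from base Julia. This is not the AcceleratedArrays.jl API, and build_index is a hypothetical helper written just for this illustration:

```julia
# Hypothetical helper (not part of DataFrames.jl or AcceleratedArrays.jl):
# map each distinct value of a column to the row numbers where it occurs.
function build_index(col::AbstractVector)
    index = Dict{eltype(col), Vector{Int}}()
    for (row, value) in pairs(col)
        # get! creates an empty row list the first time a value is seen
        push!(get!(() -> Int[], index, value), row)
    end
    return index
end

test_col = ["a", "B", "a", "D"]
index = build_index(test_col)

rows_B = get(index, "B", Int[])   # rows where test_col == "B"
rows_a = get(index, "a", Int[])   # rows where test_col == "a"
rows_z = get(index, "z", Int[])   # absent value: empty result, no error
```

The index is built once and each later lookup is a hash lookup. With a data frame, the returned row numbers could then be used as df[rows_B, :].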

First, thank you, as always, for the excellent answer. I had read the docs on groupby but didn’t look at the DataFrames groupby documentation.

help?> groupby
search: groupby groupcols groupindices GroupedDataFrame

that said I am still confused

A GroupedDataFrame can behave as either an AbstractVector or AbstractDict depending on the type of index used. Integers (or arrays of them) trigger vector-like indexing while Tupless and NamedTuples trigger dictionary-like indexing.

I was using a Symbol “:test_col” which would seem to be pretty apposite for this case. I suppose part of my issue is that I don’t know what a Tupless is.

thank you again

https://docs.julialang.org/en/v1/manual/functions/#Tuples

Consider e.g.:

julia> df = DataFrame(a=[3,3,2,2,2], b=1:5)
5×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     3      1
   2 │     3      2
   3 │     2      3
   4 │     2      4
   5 │     2      5

julia> gdf = groupby(df, :a)
GroupedDataFrame with 2 groups based on key: a
First Group (3 rows): a = 2
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     2      3
   2 │     2      4
   3 │     2      5
⋮
Last Group (2 rows): a = 3
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     3      1
   2 │     3      2

julia> gdf[2] # pick group number 2
2×2 SubDataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     3      1
   2 │     3      2

julia> gdf[(2,)] # pick group with grouping variable value equal to 2
3×2 SubDataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     2      3
   2 │     2      4
   3 │     2      5
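
The same example can be extended a little to show the NamedTuple form of the key, plus a way to look up a key that might not exist. This is a sketch, assuming a reasonably recent DataFrames.jl in which haskey and get accept Tuple keys on a GroupedDataFrame; the value 7 is just a made-up key that is absent from the data:

```julia
using DataFrames

df = DataFrame(a = [3, 3, 2, 2, 2], b = 1:5)
gdf = groupby(df, :a)

by_tuple = gdf[(2,)]       # Tuple key: the group where a == 2
by_named = gdf[(a = 2,)]   # NamedTuple key: same group, column named explicitly

has_2 = haskey(gdf, (2,))  # true, some rows have a == 2
has_7 = haskey(gdf, (7,))  # false, no rows have a == 7

# get returns the supplied default instead of throwing when the key is absent
missing_group = get(gdf, (7,), nothing)
```

Key-based lookup does not depend on the order in which the groups happen to be stored, so it is the safer choice when searching by value.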

I know what a tuple is but NOT what a tupless is.

A GroupedDataFrame can behave as either an AbstractVector or AbstractDict depending on the type of index used. Integers (or arrays of them) trigger vector-like indexing while **Tupless** and NamedTuples trigger dictionary-like indexing.

Oh, sorry, I didn’t clock that this was just a joke about the typo in the docstring!

thank you professor,

As always, a worked example is really helpful. I don’t have a lot of time to learn Julia, so it helps to have a reference point that I can store away and that gives me context to rtfm. Examples also allow people like me, who are example driven, to stick with learning the language properly.

I look forward to your new book.
theakson

thanks for the apology but your help is invaluable AND I have a pretty thick skin, and head :slight_smile: I learn from examples MORE than rtfm. Examples, like yours, raise questions which lead to me rtfm.
thanks again for everything you do.
theakson
