Groupby search not working as expected

Slowly working my way through RElearning DataFrames (coming from pandas), I wanted to search a DataFrame for a value. This is a search I will be doing many times, so I thought the approach below might speed things up.

using DataFrames

df = DataFrame( test_col = ["a","B","a","D"])

search_test_col = groupby(df,:test_col)

test_value = "B"

search_test_col[(test_value)]

When I run it, I get:

julia> using DataFrames

julia> df = DataFrame( test_col = ["a","B","a","D"])
4×1 DataFrame
 Row │ test_col
     │ String
─────┼──────────
   1 │ a
   2 │ B
   3 │ a
   4 │ D

julia> search_test_col = groupby(df,:test_col)
GroupedDataFrame with 3 groups based on key: test_col
First Group (2 rows): test_col = "a"
 Row │ test_col
     │ String
─────┼──────────
   1 │ a
   2 │ a
⋮
Last Group (1 row): test_col = "D"
 Row │ test_col
     │ String
─────┼──────────
   1 │ D

julia> test_value = "B"
"B"

julia> search_test_col[(test_value)]
ERROR: ArgumentError: invalid index: "B" of type String
Stacktrace:
 [1] to_index(i::String)
   @ Base ./indices.jl:300
 [2] to_index(A::GroupedDataFrame{DataFrame}, i::String)
   @ Base ./indices.jl:277
 [3] to_indices
   @ ./indices.jl:333 [inlined]
 [4] to_indices
   @ ./indices.jl:325 [inlined]
 [5] getindex(gd::GroupedDataFrame{DataFrame}, idx::String)
   @ DataFrames ~/.julia/packages/DataFrames/zqFGs/src/groupeddataframe/groupeddataframe.jl:660
 [6] top-level scope
   @ REPL[4]:1

julia> typeof(df.test_col)
Vector{String} (alias for Array{String, 1})

julia> typeof(test_value)
String

julia> df.test_col == test_value
false

julia> df.test_col[2]  == test_value
true

julia> df.test_col[2]
"B"

julia> test_value
"B"

What am I doing wrong, please?

It's useful to consult the documentation in these situations:

https://dataframes.juliadata.org/stable/lib/indexing/#Indexing-GroupedDataFrames

In particular:

A GroupedDataFrame can behave as either an AbstractVector or AbstractDict depending on the type of index used. Integers (or arrays of them) trigger vector-like indexing while Tupless and NamedTuples trigger dictionary-like indexing.

And:

  • gd[i::Integer] → Get the ith group.
  • gd[key::NamedTuple] → Get the group corresponding to the given values of the grouping columns. The fields of the NamedTuple must match the grouping columns passed to groupby (including order).
  • gd[key::Tuple] → Same as previous, but omitting the names on key.
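A minimal sketch of the three indexing forms quoted above, applied to the question's data. It assumes DataFrames.jl 1.x; `sort=true` is added only so that the integer position of each group is predictable, and `haskey` is used to check for a key before looking it up, since asking for a key that is not present raises an error.

```julia
using DataFrames

df = DataFrame(test_col = ["a", "B", "a", "D"])

# sort=true makes group order deterministic: "B" < "D" < "a" (uppercase sorts first)
gd = groupby(df, :test_col; sort=true)

# Vector-like indexing: an Integer selects the i-th group by position
g_first = gd[1]                    # the "B" group under sort=true

# Dictionary-like indexing: a NamedTuple whose field names match the grouping columns
g_named = gd[(test_col = "a",)]    # both "a" rows

# Dictionary-like indexing: a plain 1-element Tuple (note the trailing comma)
g_tuple = gd[("a",)]

# Check for a key first, because looking up a missing key is an error
has_z = haskey(gd, ("Z",))
```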

So:

julia> search_test_col[(test_value,)]
1×1 SubDataFrame
 Row │ test_col
     │ String
─────┼──────────
   1 │ B

Note the comma after test_value. Writing (test_value) doesn't do anything:

julia> (test_value)
"B"

i.e. it's the same as writing test_value without parentheses. If you're confused about constructing tuples, that's more a topic for the base Julia docs than for DataFrames, though.
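To make the parentheses point concrete in base Julia (no packages needed): only a trailing comma, `tuple(...)`, or `name = value` syntax builds a tuple. The last lines also touch on the `df.test_col == test_value` comparison from the question, which compares the whole vector with one string; broadcasting with `.==` compares element by element. The variable names are just for illustration.

```julia
test_value = "B"

paren = (test_value)              # parentheses only group: this is still a String
tup   = (test_value,)             # trailing comma: a 1-element Tuple{String}
tup2  = tuple(test_value)         # the same Tuple, built with the tuple function
nt    = (test_col = test_value,)  # a NamedTuple with a single field :test_col

col = ["a", "B", "a", "D"]
whole       = col == test_value   # false: one Vector compared with one String
elementwise = col .== test_value  # broadcast: true only where the element equals "B"
```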

Yes, this is a consequence of the “dual indexing style”.

Related recent issues:

In particular, using AcceleratedArrays.jl (GitHub: andyferris/AcceleratedArrays.jl, “Arrays with acceleration indices”) seems a good general solution if you have a condition on a single column. The only problem is that AcceleratedArrays.jl does not seem to be actively maintained currently, so we need to decide what to do about it.
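Since the original goal was a search repeated many times, here is a hand-rolled alternative for comparison. This is not AcceleratedArrays.jl's API: `build_index` is a hypothetical helper that maps each distinct value of one column to its row numbers with a plain `Dict`, built once, so each later search is a hash lookup plus a `view`.

```julia
using DataFrames

df = DataFrame(test_col = ["a", "B", "a", "D"])

# Hypothetical helper (not part of DataFrames.jl or AcceleratedArrays.jl):
# map each distinct value of a column to the row numbers where it occurs.
function build_index(col::AbstractVector{T}) where {T}
    idx = Dict{T, Vector{Int}}()
    for (i, v) in pairs(col)
        push!(get!(() -> Int[], idx, v), i)  # create the row list the first time v is seen
    end
    return idx
end

idx = build_index(df.test_col)  # one pass over the column

# Each search is a Dict lookup; a value that never occurs yields zero rows
rows_a = view(df, get(idx, "a", Int[]), :)
rows_z = view(df, get(idx, "Z", Int[]), :)
```

The index is a snapshot: if rows of `df` are added, removed, or edited, it has to be rebuilt.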

First, thank you, as always, for the excellent answer. I had read the docs on groupby but didn't look at the DataFrames groupby documentation.

help?> groupby
search: groupby groupcols groupindices GroupedDataFrame

That said, I am still confused:

A GroupedDataFrame can behave as either an AbstractVector or AbstractDict depending on the type of index used. Integers (or arrays of them) trigger vector-like indexing while Tupless and NamedTuples trigger dictionary-like indexing.

I was using a Symbol, “:test_col”, which would seem to be pretty apposite for this case. I suppose part of my issue is that I don't know what a Tupless is.

Thank you again.

https://docs.julialang.org/en/v1/manual/functions/#Tuples

Consider e.g.:

julia> df = DataFrame(a=[3,3,2,2,2], b=1:5)
5×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     3      1
   2 │     3      2
   3 │     2      3
   4 │     2      4
   5 │     2      5

julia> gdf = groupby(df, :a)
GroupedDataFrame with 2 groups based on key: a
First Group (3 rows): a = 2
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     2      3
   2 │     2      4
   3 │     2      5
⋮
Last Group (2 rows): a = 3
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     3      1
   2 │     3      2

julia> gdf[2] # pick group number 2
2×2 SubDataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     3      1
   2 │     3      2

julia> gdf[(2,)] # pick group with grouping variable value equal to 2
3×2 SubDataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     2      3
   2 │     2      4
   3 │     2      5
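Extending the example above (a sketch assuming DataFrames.jl 1.x): the NamedTuple form spells out the grouping column name, `keys(gdf)` returns keys that can be used as indices directly, and with several grouping columns the Tuple lists values in the same order as the columns passed to `groupby`. The column `c` and the table `df2` are made up for illustration.

```julia
using DataFrames

df = DataFrame(a = [3, 3, 2, 2, 2], b = 1:5)
gdf = groupby(df, :a)

# NamedTuple form of gdf[(2,)]: the field name must be the grouping column :a
g2 = gdf[(a = 2,)]

# keys(gdf) holds one key per group, in group order; a key is itself a valid index
k = keys(gdf)[1]
same = gdf[k]

# Hypothetical two-column grouping: Tuple values follow the order [:a, :c]
df2 = DataFrame(a = [3, 3, 2, 2, 2], c = ["x", "y", "x", "x", "y"])
gdf2 = groupby(df2, [:a, :c])
g2x = gdf2[(2, "x")]                # rows where a == 2 and c == "x"
g2x_named = gdf2[(a = 2, c = "x")]  # same group, with names (order must match)
```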

I know what a tuple is but NOT what a tupless is.

GroupedDataFrame can behave as either an AbstractVector or AbstractDict depending on the type of index used. Integers (or arrays of them) trigger vector-like indexing while **Tupless** and NamedTuples trigger dictionary-like indexing.

Oh, sorry, I didn't clock that this was just a joke about the typo in the docstring!

Thank you, professor,

As always, a worked example is really helpful. I don't have a lot of time to learn Julia, so having a reference point I can store away gives me the context to RTFM. Examples also help people like me, who are example-driven, stick with learning the language properly.

I look forward to your new book.
theakson

Thanks for the apology, but your help is invaluable, AND I have a pretty thick skin (and head) :slight_smile:. I learn from examples MORE than from RTFM; examples like yours raise questions, which lead me to RTFM.
Thanks again for everything you do.
theakson