Query.jl, referencing `_.key` in `@groupby |> @map`

leethargo · November 8, 2018, 7:13pm

I followed the recording of the Queryverse tutorial at JuliaCon by @davidanthoff and now get errors when I use similar queries. In particular, I want to use the _.key shortcut in this query (see full example):

data |>
    @groupby(_.symbol) |>
    @map({Symbol=_.key, Total=sum(_.price)})
--
type NamedTuple has no field key

I’m guessing the issue here is that I am using Julia 1.0 where NamedTuple is part of the language, while the tutorial was based on Julia 0.6 with NamedTuple from a 3rd party package?

In the Gist, I also ask how to work around this issue.

alejandromerchan · November 8, 2018, 9:50pm

Hi,

I noticed that you used @map({Symbol=_.key, ...}), but key doesn’t exist in the tuple, so just changing that to @map({Symbol=_.symbol, ...}) works.

As for a complete example, this does what you want:

data |>
    @groupby(_.symbol) |>  
    @map({Symbol = _.symbol[1], Total=sum(_.price)})

alejandromerchan · November 8, 2018, 11:16pm

This version also works.

x = data |>
    @groupby(_.symbol) |>  
    @map({ Symbol = key(_), Total=sum(_.price)})

leethargo · November 9, 2018, 7:44am

Yes, something like this is also what I came up with as a workaround.
It’s still strange that the example from the tutorial would not work.

leethargo · November 9, 2018, 7:44am

Thanks, I did not know about this syntax, which is also nicer than _.key.

davidanthoff · November 11, 2018, 8:43pm

This is one of the breaking changes in the Julia 1.0 version: you now generally have to access the key with the key function. That frees up the .foo syntax for accessing the columns of a group:

using Queryverse

df = DataFrame(A=[:a, :a, :b, :b], B=[1.,2.,3.,4.])

df |> @groupby(_.A) |> @map({A=key(_), B=sum(_.B)})

Note how I can now use _.B to access the whole B column of a given group _.
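To make the difference between the key and the group columns concrete, here is a minimal sketch built on the same df as above. It assumes DataFrames and Query are installed (both are loaded by Queryverse); the extra N column and the final DataFrame sink are additions for illustration only.

```julia
using DataFrames, Query

df = DataFrame(A=[:a, :a, :b, :b], B=[1.0, 2.0, 3.0, 4.0])

result = df |>
    @groupby(_.A) |>
    # key(_) is the value the group was formed on,
    # _.B is the vector of B values within that group,
    # length(_) is the number of rows in the group (illustrative extra column).
    @map({A=key(_), B=sum(_.B), N=length(_)}) |>
    DataFrame
```

Each row of result then holds one group: its key, the sum over its B column, and its row count.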

Soldalma · January 22, 2022, 10:30pm

I could not make this work when grouping by more than one variable, as in @groupby(_.A, _.B), or even with a third column C.

davidanthoff · January 22, 2022, 11:19pm

Here is how you do that:

df |>
    @groupby({_.A, _.B, _.C}) |>
    @map({key(_)..., D=sum(_.D), E=mean(_.E)})  # mean requires `using Statistics`

or something like that.

The trick is to construct a named tuple in the @groupby call and group by that value; key(_) is then itself a named tuple.
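A small self-contained sketch of the same idea, with made-up data and only two grouping columns. It assumes DataFrames and Query are installed. Rather than splatting, it reads the key fields explicitly as key(_).A and key(_).B, which only relies on the key being a named tuple.

```julia
using DataFrames, Query

# Hypothetical data, for illustration only.
df = DataFrame(A=[:a, :a, :b, :b], B=[1, 1, 1, 2], D=[1.0, 2.0, 3.0, 4.0])

result = df |>
    # Group by the named tuple (A=..., B=...) built from each row.
    @groupby({_.A, _.B}) |>
    # key(_) is that named tuple, so its fields can be read by name.
    @map({A=key(_).A, B=key(_).B, D=sum(_.D)}) |>
    DataFrame
```

With this data there are three distinct (A, B) combinations, so result should have three rows.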
