I’ve been beating my head against the wall for hours on end trying to figure out how to avoid hard-coding column names in the below example (specifically, in the array comprehension wgts = [sum(cols(Symbol("PWGTP$i"))) for i in 1:80]).
function test(state::Int64, occ::Int64)
@applychunked tbl begin
@where !ismissing(:OCCP) &&
:ST == state &&
(:ESR == 1 || :ESR == 2) &&
:OCCP == occ
@groupby _ :PUMA { total = sum(:PWGTP), wgts = [sum(cols(Symbol("PWGTP$i"))) for i in 1:80] }
end
end
I get an error when doing this: UndefVarError: i not defined. However, it doesn’t appear that you can select columns by their index (at least not in this context) so I don’t know how I can perform operations such as the above one without having to write an insanely long line of code with all 80 variables.
In the data, there are columns :PWGTP, :PWGTP1, ... ,:PWGTP80 and I need to be able to sum the values in columns :PWGTP1, ... , :PWGTP80, kind of like I’ve done with the total = sum(:PWGTP) piece, just without having to write sum() for all 80 columns…
I’m hoping someone familiar with JuliaDB/JuliaDBMeta can help.
One trick is to use _ inside the macro to access the whole table (or the whole row in a “row-based” macro). For example
function test(state::Int64, occ::Int64)
@applychunked tbl begin
@where !ismissing(:OCCP) &&
:ST == state &&
(:ESR == 1 || :ESR == 2) &&
:OCCP == occ
@groupby :PUMA { total = sum(:PWGTP), wgts = [sum(column(_, Symbol("PWGTP$i"))) for i in 1:80] }
end
end
cols tries to somehow also discover the column identities statically for extra optimizations so it doesn’t work inside a for loop, but at least the error message could be better. Would you mind opening an issue about this?
I think it would also be good to add this column() function to the (JuliaDBMeta?) documentation as I don’t see it anywhere. I found cols() obviously and I see that JuliaDB has a columns() function but I was unaware of column().