Expressiveness for queries

kumarbalachandran · June 6, 2022, 2:57pm

I have a table with some missing values, and I wanted to retrieve data from it. It turned out that the DataFrames implementation I came up with is much wordier than the equivalent pandas code. In Julia, I have:

using CSV, DataFrames, Query

# fr, scs_kHz and bw are defined earlier; bw names a bandwidth column
nrb_df = DataFrame(CSV.File("5GNR_BW_NRB.csv"))
nrb_select = @from i in nrb_df[:, [:FR, :SCS_kHz, Symbol(bw)]] begin
    @where i.FR == fr
    @where i.SCS_kHz == scs_kHz
    @select i
    @collect DataFrame
end
nrb = nrb_select[1, Symbol(bw)]
if ismissing(nrb)
    # error concatenates its arguments into one message
    error("Missing number of RBs for selected frequency ",
          "range $(fr), subcarrier spacing $(scs_kHz), and ",
          "bandwidth $(bw).")
end

while the equivalent pandas implementation is:

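import pandas as pd

nrb_df = pd.read_csv('5GNR_BW_NRB.csv')
nrb_df.fillna(-9999, inplace=True)  # sentinel for missing entries
nrb_df = nrb_df.astype(int)
# .iloc[0] takes the first match by position; [0] would look up the index label 0
nrb = nrb_df.query("FR == {} and SCS_kHz == {}".format(fr, scs_kHz))[bw].iloc[0]

assert nrb > 0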

pandas represents missing values as NaN, which forces any integer column containing them to float64; hence the sentinel fill and the cast back to int. This is annoying but OK.
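For comparison, Julia does not need the sentinel workaround: a vector (or DataFrame column) holding integers and missing values keeps an integer element type, Union{Missing, Int64}, instead of being converted to floats. A small sketch using only Base functions; the values are toy numbers, not taken from the CSV file:

```julia
# Toy column: integers with one gap. On a 64-bit machine the element type
# is Union{Missing, Int64}; the integers are not converted to floats.
nrb_column = [10, missing, 30]
println(eltype(nrb_column))

# Analogue of pandas' fillna(-9999): coalesce replaces each missing entry
# with the sentinel, and the broadcast result is a plain integer vector.
filled = coalesce.(nrb_column, -9999)
println(eltype(filled))

# Usually no sentinel is needed at all: test for missing directly.
println(ismissing(nrb_column[2]))
```

Checking with ismissing takes the place of asserting on a sentinel value.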

My gripe is with the way DataFramesMeta handles queries using an SQL-like syntax. I think it is unfortunate that Julia does not simplify this to a more MATLAB-compatible syntax, or to some other pattern such as:

qq=@query :FR == $(fr), :SCS_kHz == $(scs), Symbol(bw) == bw
nrb_select = nrb_df[:, qq]

I think some quirks in the design of DataFrames make it feel like a two-language solution when embedded in Julia code. Is there a plan to make the package's syntax more Julia/MATLAB-like?
/K

pdeffebach · June 6, 2022, 3:21pm

Note that you are not using DataFramesMeta.jl here but Query.jl. DataFramesMeta.jl works more like what you describe:

julia> using DataFramesMeta

julia> df = DataFrame(FR = [1, 2, 3], SCS_kHz = [4, 5, 6], bw = [7, 8, 9]);


julia> fr = 1; scs = 4; bw = 7;

julia> @rsubset(df, :FR == fr, :SCS_kHz == scs, :bw == bw)
1×3 DataFrame
 Row │ FR     SCS_kHz  bw    
     │ Int64  Int64    Int64 
─────┼───────────────────────
   1 │     1        4      7

But I think you are saying you want to store the query. This is a good feature request and is tracked in DataFramesMeta.jl here.

It would be great to get a PR together to work on this.
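Until storing a query is supported, one workaround is to wrap the @rsubset call in an ordinary function, so the conditions are written once and re-applied with different values. A minimal sketch; the function name nrb_rows is hypothetical, and the table is the toy one from the REPL session above:

```julia
using DataFramesMeta  # re-exports DataFrames, so DataFrame and nrow are available

# Hypothetical helper: the conditions are fixed, the values are arguments.
# fr and scs inside @rsubset refer to the function's own arguments.
nrb_rows(df, fr, scs) = @rsubset(df, :FR == fr, :SCS_kHz == scs)

df = DataFrame(FR = [1, 2, 3], SCS_kHz = [4, 5, 6], bw = [7, 8, 9])

nrb_rows(df, 1, 4)   # keeps the first row
nrb_rows(df, 2, 4)   # no row matches both conditions: an empty DataFrame
```

This is not a first-class stored query object, but it covers the common case of re-running the same filter with new values.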

bkamins · June 6, 2022, 3:25pm

The query can easily be stored with DataFrames.jl, but indeed it would be convenient to allow storing it in DataFramesMeta.jl as the syntax is simpler.

In DataFrames.jl, the “query” part is just a function that returns the filtering conditions:

query(fr, scs, bw) = [:FR => ByRow(isequal(fr)), :SCS_kHz => ByRow(isequal(scs)), :bw => ByRow(isequal(bw))]

and you can run it using:

subset(df, query(some_fr, some_scs, some_bw)...)
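Applied to the original problem, the same pattern can drive a lookup that also reports a missing entry. A minimal sketch under these assumptions: nrb_query, lookup_nrb, the BW_20 column and the toy values are all hypothetical; bw here names the column to read rather than a value to filter on; and the condition pairs are splatted into subset:

```julia
using DataFrames

# Hypothetical stored query: a vector of column => ByRow(predicate) pairs.
# subset keeps only the rows for which every condition is true.
nrb_query(fr, scs) = [:FR => ByRow(isequal(fr)), :SCS_kHz => ByRow(isequal(scs))]

# Hypothetical helper: read column bw for one (FR, SCS) combination and
# fail loudly when no unique row matches or the value is missing.
function lookup_nrb(df, fr, scs, bw)
    rows = subset(df, nrb_query(fr, scs)...)
    nrow(rows) == 1 ||
        error("Expected one row for FR $fr and SCS $scs kHz, found $(nrow(rows)).")
    nrb = rows[1, bw]
    if ismissing(nrb)
        error("Missing number of RBs for frequency range $fr, ",
              "subcarrier spacing $scs kHz, and bandwidth $bw.")
    end
    return nrb
end

# Toy table with one missing entry.
df = DataFrame(FR = [1, 1, 2], SCS_kHz = [15, 30, 30], BW_20 = [10, missing, 30])

lookup_nrb(df, 1, 15, :BW_20)      # returns 10
# lookup_nrb(df, 1, 30, :BW_20)    # would throw: that entry is missing
```

Using isequal rather than == means a missing value in a filter column simply fails to match instead of producing missing.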

kumarbalachandran · June 6, 2022, 6:58pm

Thanks, I guess I got confused. I believe this solves my gripe very well, although the proposed feature for storing queries would indeed be very welcome.

I was trying to do something very quickly and obviously did not bother to read up on all the functionality.

Kumar.
