Expressiveness for queries

I have a table with some missing values, and I wanted to retrieve a single value from it. It turns out that the DataFrames implementation I came up with is much wordier than the equivalent pandas one. In Julia, I have:

using CSV, DataFrames, Query

nrb_df = DataFrame(CSV.File("5GNR_BW_NRB.csv"));
nrb_select = @from i in nrb_df[:, [:FR, :SCS_kHz, Symbol(bw)]] begin
    @where i.FR==fr
    @where i.SCS_kHz == scs_kHz
    @select i
    @collect DataFrame
end;
nrb = nrb_select[:, Symbol(bw)][1];
if ismissing(nrb)
    # Build one message string; passing a vector to error() prints it as an array literal.
    error("Missing number of RBs for selected Frequency " *
          "Range $(fr), subcarrier spacing $(scs_kHz), and " *
          "bandwidth $(bw).")
end

while the equivalent Pandas implementation is

nrb_df = pd.read_csv('5GNR_BW_NRB.csv')
nrb_df.fillna(-9999, inplace=True)
nrb_df = nrb_df.astype(int)
# .iloc[0] takes the first matching row; [0] would look up the index label 0 instead
nrb = nrb_df.query("FR == {} and SCS_kHz == {}".format(fr, scs_kHz))[bw].iloc[0]

assert nrb > 0  # the -9999 fill value marks a missing entry

pandas handles NaN by upcasting the affected columns to float64, which is annoying but OK.

My observation is about the way DataFramesMeta handles queries, using an SQL-like syntax. I think it is unfortunate that Julia does not simplify this to a more MATLAB-like syntax, or some other pattern such as:

qq=@query :FR == $(fr), :SCS_kHz == $(scs), Symbol(bw) == bw
nrb_select = nrb_df[:, qq] 

I think there are some quirks with the way DataFrames is designed that make it look like a two-language solution when embedded in Julia code. Is there a plan to make the package more Julia/MATLAB-like in syntax?
/K

Note that you are not using DataFramesMeta.jl here, but rather Query.jl. DataFramesMeta.jl works more like what you describe:

julia> using DataFramesMeta

julia> df = DataFrame(FR = [1, 2, 3], SCS_kHz = [4, 5, 6], bw = [7, 8, 9]);


julia> fr = 1; scs = 4; bw = 7;

julia> @rsubset(df, :FR == fr, :SCS_kHz == scs, :bw == bw)
1×3 DataFrame
 Row │ FR     SCS_kHz  bw    
     │ Int64  Int64    Int64 
─────┼───────────────────────
   1 │     1        4      7

But I think you are saying you want to store the query. This is a good feature request and is tracked in DataFramesMeta.jl here.

It would be great to get a PR together to work on this.

The query can easily be stored with DataFrames.jl, but indeed it would be convenient to allow storing it in DataFramesMeta.jl as the syntax is simpler.

The "query" part in DataFrames.jl is a function:

query(fr, scs, bw) = [:FR => ByRow(isequal(fr)), :SCS_kHz => ByRow(isequal(scs)), :bw => ByRow(isequal(bw))]

and you can run it using:

subset(df, query(some_fr, some_scs, some_bw))
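
To make the stored-query idea concrete, here is a minimal sketch. It assumes the toy `df` from the DataFramesMeta.jl example above (the values are made up, not taken from the CSV file) and uses `subset`, which is the DataFrames.jl function that filters rows. The name `q` is only illustrative.

using DataFrames

# Toy data standing in for the real table (illustrative values only).
df = DataFrame(FR = [1, 2, 3], SCS_kHz = [4, 5, 6], bw = [7, 8, 9])

# The stored "query": a function returning a vector of column => predicate pairs.
query(fr, scs, bw) = [:FR => ByRow(isequal(fr)),
                      :SCS_kHz => ByRow(isequal(scs)),
                      :bw => ByRow(isequal(bw))]

# Build the conditions once; they can be reused on any data frame with these columns.
q = query(1, 4, 7)

# Splatting passes each pair to subset as a separate row condition;
# a row is kept only if all conditions are true.
result = subset(df, q...)

# A combination that matches no row gives an empty data frame rather than an error.
no_match = subset(df, query(1, 5, 7)...)

Since `q` is an ordinary vector, it can be stored in a variable, passed to other functions, or extended with further conditions before it is applied.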

Thanks. I guess I got confused. I believe this solves my gripe very well, although the feature improvement to query would indeed be very good.

I was trying to do something very quickly and obviously did not bother to try to read up on all the functionality.

Kumar.