Expressiveness for queries

I have a table with some missing values, and I wanted to retrieve data from it. The DataFrames implementation I came up with turned out to be much wordier than the equivalent Pandas. In Julia, I have:

nrb_df = DataFrame(CSV.File("5GNR_BW_NRB.csv"));
nrb_select = @from i in nrb_df[:, [:FR, :SCS_kHz, Symbol(bw)]] begin
    @where i.FR==fr
    @where i.SCS_kHz == scs_kHz
    @select i
    @collect DataFrame
end;
nrb = nrb_select[:, Symbol(bw)][1];
if ismissing(nrb)
    # error() joins its arguments into one message; a vector would be printed literally
    error("Missing number of RBs for selected Frequency ",
          "Range $(fr), subcarrier spacing $(scs_kHz), and ",
          "bandwidth $(bw).")
end

while the equivalent Pandas implementation is

nrb_df = pd.read_csv('5GNR_BW_NRB.csv')
nrb_df.fillna(-9999, inplace=True)
nrb_df = nrb_df.astype(int)
# .iloc[0] takes the first matching row; [0] would look up the index label 0 instead
nrb = nrb_df.query("FR == {} and SCS_kHz == {}".format(fr, scs_kHz))[bw].iloc[0]

assert(nrb > 0)

pandas represents missing entries as NaN, which forces the affected integer columns to float64; this is annoying but OK.
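For contrast on the Julia side, here is a minimal sketch of how a column with a missing entry keeps its integer element type. The table values are made up for illustration and are not taken from the CSV file above.

```julia
using DataFrames

# Made-up values; `missing` marks a combination with no defined entry.
df = DataFrame(FR = [1, 1], SCS_kHz = [15, 30], bw = [25, missing])

eltype(df.bw)          # Union{Missing, Int64} on a 64-bit machine
ismissing(df.bw[2])    # true
df.bw[1] isa Int       # true: present values stay integers
```

No sentinel such as -9999 is needed here, since `ismissing` can be used to test an entry directly.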

My observation is about the way DataFramesMeta handles queries, using an SQL-like syntax. I think it is unfortunate that Julia does not simplify this to a more MATLAB-like syntax, or some other pattern such as:

qq=@query :FR == $(fr), :SCS_kHz == $(scs), Symbol(bw) == bw
nrb_select = nrb_df[:, qq] 

I think there are some quirks with the way DataFrames is designed that make it look like a two-language solution when embedded in Julia code. Is there a plan to make the package more Julia/MATLAB-like in syntax?
/K

Note that you are not using DataFramesMeta.jl here, but rather Query.jl. DataFramesMeta.jl works more like what you describe:

julia> using DataFramesMeta

julia> df = DataFrame(FR = [1, 2, 3], SCS_kHz = [4, 5, 6], bw = [7, 8, 9]);


julia> fr = 1; scs = 4; bw = 7;

julia> @rsubset(df, :FR == fr, :SCS_kHz == scs, :bw == bw)
1×3 DataFrame
 Row │ FR     SCS_kHz  bw    
     │ Int64  Int64    Int64 
─────┼───────────────────────
   1 │     1        4      7

But I think you are saying you want to store the query. This is a good feature request and is tracked in DataFramesMeta.jl here.

It would be great to get a PR together to work on this.


The query can easily be stored with DataFrames.jl, but indeed it would be convenient to allow storing it in DataFramesMeta.jl as the syntax is simpler.

The "query" part in DataFrames.jl is a function:

query(fr, scs, bw) = [:FR => ByRow(isequal(fr)), :SCS_kHz => ByRow(isequal(scs)), :bw => ByRow(isequal(bw))]

and you can run it using:

subset(df, query(some_fr, some_scs, some_bw))
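
As a minimal sketch of how such a stored query could be applied end to end, the example below uses `subset`, which filters rows (whereas `select` would return the Boolean columns themselves). The table values are made up, and the `missing` entry is an assumption added to show how `isequal` treats it.

```julia
using DataFrames

# Toy table mirroring the columns used in this thread; the values are made up.
df = DataFrame(FR = [1, 2, 3], SCS_kHz = [4, 5, 6], bw = [7, missing, 9])

# The stored "query": a function returning column => row-wise predicate pairs.
query(fr, scs, bw) = [:FR => ByRow(isequal(fr)),
                      :SCS_kHz => ByRow(isequal(scs)),
                      :bw => ByRow(isequal(bw))]

q = query(1, 4, 7)       # build once, reuse on any table with these columns
hits = subset(df, q)     # keep only rows where every predicate holds
nrow(hits)               # 1

# isequal(missing, 7) is false rather than missing, so a row with a missing
# value is simply filtered out instead of raising an error.
no_hits = subset(df, query(2, 5, 7))
nrow(no_hits)            # 0
```

Using `isequal` rather than `==` is the design choice that matters here: `==` propagates `missing`, and `subset` rejects a `missing` condition unless `skipmissing=true` is passed.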

Thanks. I guess I got confused. I believe this solves my gripe very well, although the feature improvement to query would indeed be very good.

I was trying to do something very quickly and obviously did not bother to try to read up on all the functionality.

Kumar.