I have a table with some missing values, and I wanted to retrieve a single value from it. It turns out that the DataFrames implementation I came up with is much wordier than the equivalent Pandas one. In Julia, I have:
using CSV, DataFrames, Query
nrb_df = DataFrame(CSV.File("5GNR_BW_NRB.csv"));
nrb_select = @from i in nrb_df[:, [:FR, :SCS_kHz, Symbol(bw)]] begin
@where i.FR==fr
@where i.SCS_kHz == scs_kHz
@select i
@collect DataFrame
end;
nrb = nrb_select[:, Symbol(bw)][1];
if ismissing(nrb)
    # Build one message string; passing a vector to error() would print it as an array.
    error("Missing number of RBs for selected Frequency " *
          "Range $(fr), subcarrier spacing $(scs_kHz), and " *
          "bandwidth $(bw).")
end
while the equivalent Pandas implementation is
import pandas as pd
nrb_df = pd.read_csv('5GNR_BW_NRB.csv')
nrb_df.fillna(-9999, inplace=True)
nrb_df = nrb_df.astype(int)
# .iloc[0] takes the first matching row; [0] would look up the index label 0 instead
nrb = nrb_df.query("FR == @fr and SCS_kHz == @scs_kHz")[bw].iloc[0]
assert nrb > 0
The way pandas handles NaN is by forcing the affected integer columns to float64; this is annoying but OK.
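To make the NaN behaviour concrete, here is a minimal sketch. The inline CSV is a hypothetical stand-in for 5GNR_BW_NRB.csv (the column names 5MHz and 10MHz and the values are assumptions, not the real file), and it only illustrates the dtype promotion and the fill-then-cast workaround used above.

```python
import io
import pandas as pd

# Hypothetical stand-in for 5GNR_BW_NRB.csv; the real file's layout may differ.
csv_text = """FR,SCS_kHz,5MHz,10MHz
1,15,25,52
1,30,11,24
1,60,,11
"""

nrb_df = pd.read_csv(io.StringIO(csv_text))

# Record the dtypes as read: only the column with an empty cell becomes float64,
# because NaN is a floating-point value.
dtypes_before = nrb_df.dtypes.astype(str).to_dict()

# Workaround from the post: replace NaN with a sentinel, then cast back to int.
filled = nrb_df.fillna(-9999).astype(int)

fr, scs_kHz, bw = 1, 30, "5MHz"
row = filled.query("FR == @fr and SCS_kHz == @scs_kHz")

# The matching row keeps its original index label (1 here), so use a
# positional lookup rather than the label 0.
nrb = row[bw].iloc[0]
```

In this toy table only the 5MHz column is promoted to float64 on load, and the sentinel -9999 is what makes the later nrb > 0 check catch a missing entry.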
My observation is about the way queries are written, using an SQL-like syntax (the @from ... @collect macros above come from Query.jl rather than DataFramesMeta). I think it is unfortunate that Julia does not simplify this to a more MATLAB-like syntax or some other pattern such as:
qq=@query :FR == $(fr), :SCS_kHz == $(scs), Symbol(bw) == bw
nrb_select = nrb_df[:, qq]
I think there are some quirks with the way DataFrames is designed that make it look like a two-language solution when embedded in Julia code. Is there a plan to make the package more Julia/MATLAB-like in syntax?
/K