I started learning Julia recently. I am working with a dataset where I want to filter all the rows for a single ID in a data frame.
Can somebody help me learn how to do this?
There is no “easy” way to do that right now in DataFrames.
If your IDs are unique, you can do
df[findfirst(==("idnum"), df.ID), :] |> first
- findfirst(==("idnum"), df.ID) finds the first occurrence of "idnum" in df.ID
- The indexing will return a DataFrame with one row. You call first to get a DataFrameRow, which is a nicer object for this purpose.
Thank you
Trying to understand the question and how different it is from the following MWE:
using DataFrames
team = DataFrame(ID=[1,2,3,4,2,4],name=["John","Jane","Jim","Joe","Jay","Julia"])
team[team.ID .== 4, :]
which for the input:
│ Row │ ID    │ name   │
│     │ Int64 │ String │
├─────┼───────┼────────┤
│ 1   │ 1     │ John   │
│ 2   │ 2     │ Jane   │
│ 3   │ 3     │ Jim    │
│ 4   │ 4     │ Joe    │
│ 5   │ 2     │ Jay    │
│ 6   │ 4     │ Julia  │
# produces the filtered result by team ID number 4:
│ Row │ ID    │ name   │
│     │ Int64 │ String │
├─────┼───────┼────────┤
│ 1   │ 4     │ Joe    │
│ 2   │ 4     │ Julia  │
Or should the output sought consist of only the teams with one member each (i.e., 1 and 3):
using StatsBase
dic = countmap(team.ID)
ix = [dic[x]==1 for x in team.ID]
team[ix,:]
│ Row │ ID    │ name   │
│     │ Int64 │ String │
├─────┼───────┼────────┤
│ 1   │ 1     │ John   │
│ 2   │ 3     │ Jim    │
Thanks.
I think it's assumed that IDs uniquely identify observations here.
The indexing with findfirst shown above will already return a DataFrameRow,
and to make sure you have only one row just write:
only(team[team.ID .== 4, :])
or
only(filter(:ID => ==(4), team))
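To make the difference between these forms concrete, here is a minimal sketch that reuses the team data frame from the MWE above. It assumes a DataFrames.jl version in which only is defined for data frames (as the calls above imply); the variable names row_a, row_b and row_c are made up for illustration.

```julia
using DataFrames

# Same toy data as the MWE earlier in the thread.
team = DataFrame(ID=[1, 2, 3, 4, 2, 4],
                 name=["John", "Jane", "Jim", "Joe", "Jay", "Julia"])

# An integer row index plus `:` gives a DataFrameRow directly.
i = findfirst(==(3), team.ID)   # position of the first match, or nothing if absent
row_a = team[i, :]

# A Boolean mask gives a DataFrame; `only` turns a 1-row result into a DataFrameRow.
row_b = only(team[team.ID .== 3, :])

# `filter` with a column => predicate pair builds the same 1-row DataFrame.
row_c = only(filter(:ID => ==(3), team))

# Fields of a DataFrameRow can be read by column name.
println(row_a.name)
```

Note that findfirst returns nothing when the ID is absent, so real code would probably want to check i before indexing.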
@bkamins, thanks for the feedback. That command throws an error when run for the non-unique ID-rows (2 and 4):
ERROR: ArgumentError: data frame must contain exactly 1 row
Why not return an empty data frame instead (if such a creature exists)?
NB: the MWE above using countmap() outputs only the ID rows with multiplicity=1. Can it be written more simply within the DataFrames framework?
I used only to show that you can use it to check that you get a result that has 1 row (as this was suggested by @pdeffebach above).
An empty data frame, DataFrame(), exists, but this is not how only is defined in Julia Base (you can check its docstring to find the contract it guarantees).
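As a rough illustration of that contract, the sketch below checks what only does for one, several, and zero matching rows. The helper only_throws is hypothetical (it is not part of Base or DataFrames) and exists only to record whether an ArgumentError was raised; the ID value 9 is simply one that does not occur in the data.

```julia
using DataFrames

team = DataFrame(ID=[1, 2, 3, 4, 2, 4],
                 name=["John", "Jane", "Jim", "Joe", "Jay", "Julia"])

# Hypothetical helper: reports whether `only` throws an ArgumentError for `x`.
function only_throws(x)
    try
        only(x)
        return false
    catch err
        return err isa ArgumentError
    end
end

single     = only(team[team.ID .== 1, :])         # exactly one row: a DataFrameRow
many       = only_throws(team[team.ID .== 4, :])  # two rows
none       = only_throws(team[team.ID .== 9, :])  # zero rows
base_empty = only_throws(Int[])                   # Base `only` on an empty vector

# Dropping `only` keeps the result a DataFrame, which may simply be empty.
empty_df = team[team.ID .== 9, :]
println(nrow(empty_df))
```

So if an empty result is acceptable, the plain indexing without only already gives an empty data frame with the same columns.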
NB: the MWE above using countmap() outputs only the ID rows with multiplicity=1. Can it be written more simply within the DataFrames framework?
julia> DataFrame(filter(sdf -> nrow(sdf)==1, groupby(team, :ID)))
2×2 DataFrame
 Row │ ID     name
     │ Int64  String
─────┼───────────────
   1 │     1  John
   2 │     3  Jim
or
julia> combine(groupby(team, :ID), sdf -> nrow(sdf) == 1 ? sdf : DataFrame())
2×2 DataFrame
 Row │ ID     name
     │ Int64  String
─────┼───────────────
   1 │     1  John
   2 │     3  Jim