How to filter single subject from dataframe

I started learning Julia recently. I am working with a dataset where I want to filter all the rows for a single ID in a data frame.
Can somebody help me learn how to do this?

There is no “easy” way to do that right now in DataFrames.

If your IDs are unique, you can do

df[findfirst(==("idnum"), df.ID), :] |> first
  • findfirst(==("idnum"), df.ID) finds the first occurrence of "idnum" in df.ID
  • The indexing will return a DataFrame with one row. You call first to get a DataFrameRow which is a nicer object for this purpose.

Thank you

Trying to understand the question and how different it is from the following MWE:

using DataFrames

team = DataFrame(ID=[1,2,3,4,2,4],name=["John","Jane","Jim","Joe","Jay","Julia"])

team[team.ID .== 4, :]

which for the input:

│ Row │ ID    │ name   │
│     │ Int64 │ String │
├─────┼───────┼────────┤
│ 1   │ 1     │ John   │
│ 2   │ 2     │ Jane   │
│ 3   │ 3     │ Jim    │
│ 4   │ 4     │ Joe    │
│ 5   │ 2     │ Jay    │
│ 6   │ 4     │ Julia  │

# produces the filtered result by team ID number 4:
│ Row │ ID    │ name   │
│     │ Int64 │ String │
├─────┼───────┼────────┤
│ 1   │ 4     │ Joe    │
│ 2   │ 4     │ Julia  │

Or should the output sought consist of only the teams with one member each (i.e., 1 and 3):

using StatsBase

dic = countmap(team.ID)
ix = [dic[x]==1 for x in team.ID]
team[ix,:] 

│ Row │ ID    │ name   │
│     │ Int64 │ String │
├─────┼───────┼────────┤
│ 1   │ 1     │ John   │
│ 2   │ 3     │ Jim    │

Thanks.


I think it's assumed that IDs uniquely identify observations here.

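Indexing with a single integer row, as in df[findfirst(==("idnum"), df.ID), :], already returns a DataFrameRow, so the call to first is not needed.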
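
A minimal sketch of that point, using a made-up data frame (the ID and score values are illustrative, not from the thread). Indexing with the integer from findfirst gives a DataFrameRow directly. Since findfirst returns nothing when no ID matches, that case is worth guarding:

```julia
using DataFrames

# Illustrative data; the IDs and scores are made up for this sketch.
df = DataFrame(ID = ["a1", "b2", "c3"], score = [10, 20, 30])

idx = findfirst(==("b2"), df.ID)   # integer position of the first match
row = df[idx, :]                   # integer row index plus colon gives a DataFrameRow
row.score                          # values are accessed by column name

# findfirst returns nothing when the ID is absent, so guard before indexing.
missing_idx = findfirst(==("zz"), df.ID)
missing_row = missing_idx === nothing ? nothing : df[missing_idx, :]
```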


and to make sure you have only one row just write:

only(team[team.ID .== 4, :])

or

only(filter(:ID => ==(4), team))
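
For comparison, here is a sketch of three ways to select the rows for one ID, run against the team data frame from the MWE above. The subset call is not used elsewhere in this thread and assumes DataFrames 1.0 or later:

```julia
using DataFrames

team = DataFrame(ID=[1,2,3,4,2,4], name=["John","Jane","Jim","Joe","Jay","Julia"])

by_mask   = team[team.ID .== 4, :]              # Boolean mask indexing
by_filter = filter(:ID => ==(4), team)          # predicate applied to each value of :ID
by_subset = subset(team, :ID => ByRow(==(4)))   # assumes DataFrames 1.0+

# only succeeds when exactly one row matches, for example ID 3 here.
jim = only(filter(:ID => ==(3), team))          # a DataFrameRow
```

All three return a data frame, so they work whether the ID matches zero, one, or many rows; only is the step that enforces uniqueness.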

@bkamins, thanks for the feedback. That command throws an error when run for the non-unique ID-rows (2 and 4):

ERROR: ArgumentError: data frame must contain exactly 1 row

Why not return an empty data frame instead (if such a creature exists)?

NB: the MWE above using countmap() outputs only the rows whose ID has multiplicity 1. Can it be written more simply within the DataFrames framework?

I used only to show that you can use it to check that the result has exactly 1 row (as this was suggested by @pdeffebach above).

An empty data frame, DataFrame(), does exist, but returning one is not how only is defined in Julia Base (you can check its docstring to see the contract it guarantees).
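
If an error is not wanted when the ID matches zero or several rows, one option is to check the row count before calling only. The helper name single_row below is hypothetical, and returning nothing is just one possible convention for this sketch:

```julia
using DataFrames

team = DataFrame(ID=[1,2,3,4,2,4], name=["John","Jane","Jim","Joe","Jay","Julia"])

# Hypothetical helper: the row for `id` if exactly one row matches, else nothing.
function single_row(df, id)
    sub = filter(:ID => ==(id), df)   # zero rows if nothing matches, no error
    return nrow(sub) == 1 ? only(sub) : nothing
end

single_row(team, 3)    # DataFrameRow for Jim
single_row(team, 4)    # nothing, because ID 4 appears twice
single_row(team, 99)   # nothing, because ID 99 does not appear

# filter itself does not throw on a non-match; it returns a data frame with 0 rows.
nrow(filter(:ID => ==(99), team))
```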

As for writing the countmap() MWE more simply within DataFrames:

julia> DataFrame(filter(sdf -> nrow(sdf)==1, groupby(team, :ID)))
2×2 DataFrame
 Row β”‚ ID     name   
     β”‚ Int64  String 
─────┼───────────────
   1 β”‚     1  John
   2 β”‚     3  Jim

or

julia> combine(groupby(team, :ID), sdf -> nrow(sdf) == 1 ? sdf : DataFrame())
2×2 DataFrame
 Row β”‚ ID     name   
     β”‚ Int64  String 
─────┼───────────────
   1 β”‚     1  John
   2 β”‚     3  Jim
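
Another possible sketch, not taken from the replies above: add a per-group row count with transform and nrow, then keep the rows whose count is 1. The column name :n is an arbitrary choice for this example:

```julia
using DataFrames

team = DataFrame(ID=[1,2,3,4,2,4], name=["John","Jane","Jim","Joe","Jay","Julia"])

# :n holds the size of the ID group that each row belongs to.
counted = transform(groupby(team, :ID), nrow => :n)

# Keep rows whose ID occurs once, then drop the helper column.
singles = select(filter(:n => ==(1), counted), Not(:n))
```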