How to filter single subject from dataframe

sai_matcha · April 22, 2021, 11:02am

I started learning Julia recently. I am working with a dataset where I want to filter all the rows for a single ID in a data frame.
Can somebody help me learn how to do this?

pdeffebach · April 22, 2021, 11:27am

There is no “easy” way to do that right now in DataFrames.

If your IDs are unique, you can do

df[findfirst(==("idnum"), df.ID), :] |> first

findfirst(==("idnum"), df.ID) finds the first occurrence of "idnum" in df.ID.
The indexing will return a DataFrame with one row. You call first to get a DataFrameRow, which is a nicer object for this purpose.
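
As a small illustration of what findfirst gives back, here is a sketch on a made-up three-row table (the table, its column names, and its values are assumptions for illustration, not the poster's data). Since findfirst returns nothing when no element matches, it is probably worth checking the result before indexing.

```julia
using DataFrames

# Made-up example data, for illustration only
df = DataFrame(ID=["a1", "b2", "c3"], score=[10, 20, 30])

i = findfirst(==("b2"), df.ID)   # index of the first matching element
j = findfirst(==("zz"), df.ID)   # no match, so this is `nothing`

# Guard against a missing ID before indexing into the data frame
row = i === nothing ? nothing : df[i, :]
```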

sai_matcha · April 22, 2021, 4:18pm

Thank you

rafael.guerra · April 23, 2021, 8:53am

Trying to understand the question and how different it is from the following MWE:

using DataFrames

team = DataFrame(ID=[1,2,3,4,2,4],name=["John","Jane","Jim","Joe","Jay","Julia"])

team[team.ID .== 4, :]

which for the input:

│ Row │ ID    │ name   │
│     │ Int64 │ String │
├─────┼───────┼────────┤
│ 1   │ 1     │ John   │
│ 2   │ 2     │ Jane   │
│ 3   │ 3     │ Jim    │
│ 4   │ 4     │ Joe    │
│ 5   │ 2     │ Jay    │
│ 6   │ 4     │ Julia  │

# produces the filtered result by team ID number 4:
│ Row │ ID    │ name   │
│     │ Int64 │ String │
├─────┼───────┼────────┤
│ 1   │ 4     │ Joe    │
│ 2   │ 4     │ Julia  │

Or should the output sought consist of only the teams with one member each (i.e., 1 and 3):

using StatsBase

dic = countmap(team.ID)
ix = [dic[x]==1 for x in team.ID]
team[ix,:] 

│ Row │ ID    │ name   │
│     │ Int64 │ String │
├─────┼───────┼────────┤
│ 1   │ 1     │ John   │
│ 2   │ 3     │ Jim    │

Thanks.

pdeffebach · April 23, 2021, 8:03pm

I think it’s assumed that IDs uniquely identify observations here.

bkamins · April 23, 2021, 8:17pm

df[findfirst(==("idnum"), df.ID), :] will already return a DataFrameRow.

bkamins · April 23, 2021, 8:18pm

and to make sure you have only one row just write:

only(team[team.ID .== 4, :])

or

only(filter(:ID => ==(4), team))

rafael.guerra · April 23, 2021, 9:20pm

@bkamins, thanks for the feedback. That command throws an error when run for the non-unique ID-rows (2 and 4):

ERROR: ArgumentError: data frame must contain exactly 1 row

Why not return an empty data frame instead (if such a creature exists)?

NB: the MWE above using countmap() outputs only the rows whose ID has multiplicity 1. Can it be written more simply within the DataFrames framework?

bkamins · April 24, 2021, 12:00am

I used only to show that you can use it to check you get a result that has 1 row (as this was suggested by @pdeffebach above).

An empty data frame, DataFrame(), exists, but this is not how only is defined in Julia Base (you can check its docstring to find the contract it guarantees).
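
If an error is not wanted when an ID is missing or duplicated, one possible workaround is to check the row count before calling only. The following is a minimal sketch, assuming the team table from the MWE earlier in the thread; the helper name single_row_or_nothing is hypothetical and not part of DataFrames.

```julia
using DataFrames

team = DataFrame(ID=[1, 2, 3, 4, 2, 4],
                 name=["John", "Jane", "Jim", "Joe", "Jay", "Julia"])

# Hypothetical helper (not part of DataFrames): return the single row for `id`,
# or `nothing` when the ID is absent or appears more than once.
function single_row_or_nothing(df, id)
    matches = filter(:ID => ==(id), df)
    return nrow(matches) == 1 ? only(matches) : nothing
end

row = single_row_or_nothing(team, 3)   # the one row with ID 3
dup = single_row_or_nothing(team, 4)   # ID 4 appears twice, so `nothing`
```

Returning nothing rather than an empty data frame is just one design choice; it keeps the "exactly one row" case and the failure case easy to tell apart.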

NB: the MWE above using countmap() outputs only the rows whose ID has multiplicity 1. Can it be written more simply within the DataFrames framework?

julia> DataFrame(filter(sdf -> nrow(sdf)==1, groupby(team, :ID)))
2×2 DataFrame
 Row │ ID     name   
     │ Int64  String 
─────┼───────────────
   1 │     1  John
   2 │     3  Jim

or

julia> combine(groupby(team, :ID), sdf -> nrow(sdf) == 1 ? sdf : DataFrame())
2×2 DataFrame
 Row │ ID     name   
     │ Int64  String 
─────┼───────────────
   1 │     1  John
   2 │     3  Jim
