How to select half the rows from a DataFrame column and return an array of values from another column

yold6 · September 11, 2020, 11:41pm

Hi all,

I would like to know how one might partition the unique string values in one column of a DataFrame (C), as described below.

using Random, DataFrames
C = DataFrame(Team = ["Packers", "Knights", "Bills", "Falcons", "Ravens", "Chiefs", "Titans", "Rams", "Bengals", "Colts", "49ers", "Giants", "Lions", "Steelers", "Jaguars", "Dolphins", "Vikings", "Eagles", "Bears", "Jets", "Cardinals", "Patriots", "Bucs", "Cowboys", "Panthers", "Chargers", "Seahawks", "Browns"], Opponent = ["Bears", "Eagles", "Jets", "Vikings", "Dolphins", "Jaguars", "Browns", "Panthers", "Seahawks", "Chargers", "Bucs", "Cowboys", "Cardinals", "Patriots", "Chiefs", "Ravens", "Falcons", "Knights", "Packers", "Bills", "Lions", "Steelers", "49ers", "Giants", "Rams", "Colts", "Bengals", "Titans"])

I would like to divide C into two 1-dimensional arrays A and B such that A contains half of the entries from the column “Team”, and B contains the “Opponent” entry from the same row (in order) as A. So A could be:

A = "Bills", "Ravens", "Giants", ...

to which the corresponding B would be:

B = "Jets", "Dolphins", "Cowboys", ...

The critical points are that all the entries in A and B are unique and correspond to the same row from the original dataframe C. The final order of entries in A and B does not matter.

I’ve tried:

teams = C[!, :Team]

A = sample(1:(length(teams)/2), Int(length(teams)/2), replace = false)

A = teams[A]

B = setdiff(1:(length(teams)/2), A)

B = teams[B, :]

But this fails to maintain the correct pairing between the rows from the “Team” and “Opponent” columns of C. Any insightful input into my dilemma would be much appreciated. Thank you for your time!

kmundnic · September 12, 2020, 2:11am

You’re almost there, right?

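using StatsBase  # provides sample

# Draw half of the row indices without replacement, then index both columns with them
selected = sample(1:size(C,1), size(C,1) ÷ 2, replace=false)
A = C[selected, :Team]      # teams from the sampled rows
B = C[selected, :Opponent]  # opponents from the same rows, in the same order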
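
For anyone who wants to check the pairing, here is a minimal sketch of the same idea using only `Random` (via `randperm`) instead of `StatsBase`. The four-row `C`, the seed `42`, and the names `rng`, `half`, and `pairs_in_C` are assumptions for illustration, not part of the original question.

```julia
using DataFrames, Random

# Small stand-in for C (the first four rows from the question).
C = DataFrame(Team = ["Packers", "Knights", "Bills", "Falcons"],
              Opponent = ["Bears", "Eagles", "Jets", "Vikings"])

rng = MersenneTwister(42)                   # arbitrary seed, only for reproducibility
half = nrow(C) ÷ 2
selected = randperm(rng, nrow(C))[1:half]   # `half` distinct row indices

A = C[selected, :Team]
B = C[selected, :Opponent]

# Each (A[i], B[i]) pair should be an actual row of C.
pairs_in_C = Set(zip(C.Team, C.Opponent))
@assert all(p in pairs_in_C for p in zip(A, B))

# No value is repeated within A or within B.
@assert allunique(A) && allunique(B)
```

These checks only cover uniqueness within A and within B. In the full C from the question each matchup appears twice (for example Packers/Bears and Bears/Packers), so a team could still show up in both A and B; if that matters, some extra filtering would be needed.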

yold6 · September 16, 2020, 9:25pm

Thanks kmundnic!
That was just the syntax I was looking for. I appreciate it!
