How to select half the rows from a DataFrame column and return an array of values from another column

Hi all,

I would like to know how to randomly pick half of the unique string values in one column of the DataFrame C while keeping each one paired with the value from another column in the same row.

```julia
using Random, DataFrames
C = DataFrame(
    Team = ["Packers", "Knights", "Bills", "Falcons", "Ravens", "Chiefs", "Titans",
            "Rams", "Bengals", "Colts", "49ers", "Giants", "Lions", "Steelers",
            "Jaguars", "Dolphins", "Vikings", "Eagles", "Bears", "Jets",
            "Cardinals", "Patriots", "Bucs", "Cowboys", "Panthers", "Chargers",
            "Seahawks", "Browns"],
    Opponent = ["Bears", "Eagles", "Jets", "Vikings", "Dolphins", "Jaguars", "Browns",
                "Panthers", "Seahawks", "Chargers", "Bucs", "Cowboys", "Cardinals",
                "Patriots", "Chiefs", "Ravens", "Falcons", "Knights", "Packers",
                "Bills", "Lions", "Steelers", "49ers", "Giants", "Rams", "Colts",
                "Bengals", "Titans"])
```

I would like to divide C into two 1-dimensional arrays A and B such that A contains half of the entries from the column “Team”, and B contains the “Opponent” entry from the same row (in order) as A. So A could be:

A = "Bills", "Ravens", "Giants", ...

to which the corresponding B would be:

B = "Jets", "Dolphins", "Cowboys", ...

The critical points are that all the entries in A and B are unique and that each entry of B comes from the same row of the original DataFrame C as the corresponding entry of A. The final order of entries in A and B does not matter.

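I’ve tried:

```julia
teams = C[!, :Team]

# `/` returns a Float64, so this draws from the values 1.0, 2.0, ..., 14.0
# (only the first half of the 28 row positions), not from Int indices.
A = sample(1:(length(teams)/2), Int(length(teams)/2), replace = false)
A = teams[A]

# A now holds team names rather than indices, and B is taken from the
# Team column again instead of Opponent, so A and B cannot line up by row.
B = setdiff(1:(length(teams)/2), A)
B = teams[B, :]
```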

But this fails to maintain the correct pairing between the rows from the “Team” and “Opponent” columns of C. Any insightful input into my dilemma would be much appreciated. Thank you for your time!

You’re almost there, right?

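```julia
using StatsBase  # provides `sample`

# Draw half of the row indices of C at random, without replacement
# (`÷` is integer division, so the count is an Int).
selected = sample(1:size(C, 1), size(C, 1) ÷ 2, replace=false)

# Index both columns with the same rows so each Team stays paired with its Opponent.
A = C[selected, :Team]
B = C[selected, :Opponent]
```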
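A self-contained sketch of the same idea that avoids the StatsBase dependency: it swaps in `randperm` from the `Random` standard library to pick half of the row indices without replacement. The six-row excerpt of C and the fixed seed are assumptions for illustration only. The last line checks that every `(A[i], B[i])` pair really is a row of C.

```julia
using DataFrames, Random

# Small excerpt of the question's data (each matchup listed once from each side).
C = DataFrame(Team     = ["Packers", "Bills", "Ravens", "Bears", "Jets", "Dolphins"],
              Opponent = ["Bears", "Jets", "Dolphins", "Packers", "Bills", "Ravens"])

# The first half of a random permutation of all row indices is a random
# subset of nrow(C) ÷ 2 distinct rows (no replacement).
rng = MersenneTwister(42)  # seeded only so the result is reproducible
selected = randperm(rng, nrow(C))[1:nrow(C) ÷ 2]

# Using one index vector for both columns keeps each pair on the same row.
A = C[selected, :Team]
B = C[selected, :Opponent]

# Sanity check: every (A[i], B[i]) pair appears as a row of C.
rowpairs = Set(zip(C.Team, C.Opponent))
@assert all(p -> p in rowpairs, zip(A, B))
```

Either `sample(...; replace=false)` or `randperm` works here; what preserves the pairing is using a single set of row indices for both columns.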

Thanks kmundnic!
That was just the syntax I was looking for. I appreciate it!
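Sampling rows at random keeps A unique and B unique, since each name occurs once per column. However, every matchup in C is listed twice, once from each side, so if both the (Packers, Bears) and (Bears, Packers) rows are drawn, Packers and Bears appear in both A and B. If "unique" is meant across A and B together, one possible approach is to visit the rows in random order and keep a row only when neither of its teams has been seen yet. The helper `one_row_per_matchup` below is hypothetical, and it assumes, as in the sample data, that every team plays exactly one game listed in both directions.

```julia
using DataFrames, Random

# Small excerpt of the question's data (each matchup listed once from each side).
C = DataFrame(Team     = ["Packers", "Bills", "Ravens", "Bears", "Jets", "Dolphins"],
              Opponent = ["Bears", "Jets", "Dolphins", "Packers", "Bills", "Ravens"])

# Hypothetical helper: keeps the first row (in random order) of each matchup.
# Assumes each team plays exactly one game, listed twice (once from each side),
# so exactly half the rows are kept and no name appears in both results.
function one_row_per_matchup(df::DataFrame, rng::AbstractRNG)
    seen = Set{String}()
    keep = Int[]
    for i in randperm(rng, nrow(df))
        t, o = df.Team[i], df.Opponent[i]
        if !(t in seen) && !(o in seen)
            push!(keep, i)
            push!(seen, t)
            push!(seen, o)
        end
    end
    return df[keep, :Team], df[keep, :Opponent]
end

A, B = one_row_per_matchup(C, MersenneTwister(1))
```

With the full 28-row C this keeps 14 rows. If a team could appear in more than one game, this greedy pass might keep fewer than half the rows, so the assumption above matters.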