Generate new array using conditional selections from existing arrays

yold6 · September 4, 2020, 10:33pm

Hello,

I am attempting to generate an new array C by using conditional selections from two existing DataFrames (A and B).

The first pre-existing DataFrame (A) is 18 x 18 and contains Float64 values that I would like to assign to specific entries to populate the new array. Their position in the new array depends on values from a different DataFrame.

A=DataFrame(
  xDST=[-0.18,-0.16,-0.19,-0.06,-0.23,-0.1,-0.46,-0.5,-0.33,0,-0.04,0.09,0.08,0.09,0.02,0.09,0.35,1],
  xK=[0.02,-0.08,-0.01,-0.01,-0.08,-0.16,-0.15,-0.41,-0.5,0.1,0.06,0.18,0.02,0.21,0.19,0.35,1,0.35],
  xQB=[0.1,0.09,-0.11,-0.12,-0.05,0,0,-0.15,-0.46,0.35,0.39,0.34,0.16,0.21,0.09,1,0.35,0.09],
  xRB1=[0.05,0.03,0,0.06,0.04,-0.31,0,-0.16,-0.1,-0.14,-0.05,-0.07,-0.12,-0.02,1,0.09,0.19,0.02],
  xRB2=[0.04,-0.08,-0.05,-0.03,0.03,0.04,-0.05,-0.08,-0.23,0.08,-0.08,0.08,-0.08,1,-0.02,0.21,0.21,0.09],
  xTE=[-0.08,0.11,-0.09,-0.1,-0.03,0.06,-0.12,-0.01,-0.06,-0.12,-0.12,-0.08,1,-0.08,-0.12,0.16,0.02,0.08],
  xWR1=[-0.01,-0.08,-0.03,-0.09,-0.05,0,-0.11,-0.01,-0.19,-0.02,0.13,1,-0.08,0.08,-0.07,0.34,0.18,0.09],
  xWR2=[0.09,-0.06,-0.08,0.11,-0.08,0.03,0.09,-0.08,-0.16,-0.01,1,0.13,-0.12,-0.08,-0.05,0.39,0.06,-0.04],
  xWR3=[0.12,0.09,-0.01,-0.08,0.04,0.05,0.1,0.02,-0.18,1,-0.01,-0.02,-0.12,0.08,-0.14,0.35,0.1,0],
  yDST=[0,-0.04,0.09,0.08,0.09,0.02,0.09,0.35,1,-0.18,-0.16,-0.19,-0.06,-0.23,-0.1,-0.46,-0.5,-0.33],
  yK=[0.1,0.06,0.18,0.02,0.21,0.19,0.35,1,0.35,0.02,-0.08,-0.01,-0.01,-0.08,-0.16,-0.15,-0.41,-0.5],
  yQB=[0.35,0.39,0.34,0.16,0.21,0.09,1,0.35,0.09,0.1,0.09,-0.11,-0.12,-0.05,0,0,-0.15,-0.46],
  yRB1=[-0.14,-0.05,-0.07,-0.12,-0.02,1,0.09,0.19,0.02,0.05,0.03,0,0.06,0.04,-0.31,0,-0.16,-0.1],
  yRB2=[0.08,-0.08,0.08,-0.08,1,-0.02,0.21,0.21,0.09,0.04,-0.08,-0.05,-0.03,0.03,0.04,-0.05,-0.08,-0.23],
  yTE=[-0.12,-0.12,-0.08,1,-0.08,-0.12,0.16,0.02,0.08,-0.08,0.11,-0.09,-0.1,-0.03,0.06,-0.12,-0.01,-0.06],
  yWR1=[-0.02,0.13,1,-0.08,0.08,-0.07,0.34,0.18,0.09,-0.01,-0.08,-0.03,-0.09,-0.05,0,-0.11,-0.01,-0.19],
  yWR2=[-0.01,1,0.13,-0.12,-0.08,-0.05,0.39,0.06,-0.04,0.09,-0.06,-0.08,0.11,-0.08,0.03,0.09,-0.08,-0.16],
  yWR3=[1,-0.01,-0.02,-0.12,0.08,-0.14,0.35,0.1,0,0.12,0.09,-0.01,-0.08,0.04,0.05,0.1,0.02,-0.18],
  row=["yWR3","yWR2","yWR1","yTE","yRB2","yRB1","yQB","yK","yDST","xWR3","xWR2","xWR1","xTE","xRB2","xRB1","xQB","xK","xDST"])

The second DataFrame is 28 x 7 (B) and contains five “string” type columns (A first (:First) and last (:Last) name column that serve as identifiers, a categorical string column (:Position), a categorical string column (:Team), and a categorical string column (:Opponent), and two Float64 columns (:Projection, and :StdDev).

B = DataFrame(
First = ["Jerry", "Jake", "Timmy", "Jeff", "Jim", "Daniel", "Steven", "Paul", "Ron", "Peter", "Michael", "Ben", "Andy", "Tom", "Jeremy", "Adam", "Blake", "Greg", "William", "Ken", "Charles", "Ed", "Sandy", "Ezekiel", "Jarrod", "Drew", "Alvin", "Shakell"],
Last  = ["Green", "Blue", "Tan", "Black", "Little", "Yellow", "Purple", "Walker", "Spears", "Ryan", "Collins", "Smith", "Fitzpatrick", "Byers", "Foster", "Valentine", "Castro", "Hull", "Croft", "Perry", "Franks", "Philips", "Crowe", "Petty", "Pitt", "Kennedy", "Woods", "Barclay"],
Position = ["QB", "QB", "QB", "QB", "RB", "RB", "RB", "WR", "RB", "WR", "WR", "WR", "WR", "WR", "RB", "WR", "RB", "WR", "WR", "RB", "RB", "WR", "RB", "WR", "WR", "WR", "WR", "WR"],
Team   = ["Bears", "Lions", "Tigers", "Wolves", "Bears", "Tigers", "Lions", "Tigers", "Wolves", "Lions", "Bears", "Wolves", "Tigers", "Lions", "Wolves", "Bears", "Tigers", "Tigers", "Lions", "Lions", "Bears" , "Tigers", "Wolves", "Wolves", "Bears", "Lions", "Wolves", "Lions"],
Opponent   = ["Lions", "Bears", "Wolves", "Tigers", "Lions", "Wolves", "Bears", "Wolves", "Tigers", "Bears", "Lions", "Tigers", "Wolves", "Bears", "Tigers", "Lions", "Wolves", "Wolves", "Bears", "Bears", "Lions", "Wolves", "Tigers", "Tigers", "Lions", "Bears", "Tigers", "Bears"],
Projection = [20.2, 19.7, 18.9, 17.5, 15.3, 15.0, 14.8, 14.75, 13.4, 12.4, 12.14, 11.4, 10.5, 10.2, 9.7, 9.25, 8.95, 8.45, 7.5, 6.2, 5.9, 5.2, 4.6, 3.2, 2.5, 2.3, 1.9, 1.37],
Variance = (rand(Float64, 28) .* [20.2, 19.7, 18.9, 17.5, 15.3, 15.0, 14.8, 14.75, 13.4, 12.4, 12.14, 11.4, 10.5, 10.2, 9.7, 9.25, 8.95, 8.45, 7.5, 6.2, 5.9, 5.2, 4.6, 3.2, 2.5, 2.3, 1.9, 1.37]/2)
)

For all rows with entries from the same :Team, for each position (QB, RB, WR), I want to populate an array C by assigning to the same row index in B to the corresponding positional “x” value from A ( xQB, xRB1, etc ) for their team and opponent. For example, in row 1, Jerry Green is the QB for the Bears (whose opponent is the Lions), so in C[1,1] I want to assign the value for xQB (A column 3) by xQB (A row 16), = 1).

Next, in C[1,2], the next player (Jake Blue) is the QB for the Lions (whose opponent is the Bears), so in C[1,2], I want to assign the value for yQB (A column 12) by xQB (row 16), = 0).

For C[5,1] , we have Jim Little, an RB for the Bears with the highest projected score at 15.3. Because Jim’s projection is the highest for all RBs on his team, he would be the RB1. The next highest projection for the Bears (Charles Franks - projection 5.9) would be their RB2. A similar assignment follows for WRs on the same team. So in C[5,1], I want to assign the value for xRB1 (A column 4) by xQB (A row 16), = 0.09).

C should start to look like this:

8×8 Array{Float64,2}:
 1.0   0.0   0.0    0.0    0.09   0.0    0.0    0.0
 0.0   1.0   0.0    0.0    0.0    0.0    0.09   0.0
 0.0   0.0   1.0    0.0    0.0    0.09   0.0    0.34
 0.0   0.0   0.0    1.0    0.0    0.0    0.0   -0.11
 0.09  0.0   0.0    0.0    1.0    0.0   -0.31   0.0
 0.0   0.0   0.09   0.0    0.0    1.0    0.0   -0.07
 0.0   0.09  0.0    0.0   -0.31   0.0    1.0    0.0
 0.0   0.0   0.34  -0.11   0.0   -0.07   0.0    1.0

but end up as a 28 x 28. The other wrinkle is that a :Team in B might have more players for a position like RB or WR than entries that correspond in A. For example, in B the Wolves have three RBs, but there are only entries for RB1 and RB2 in A. I would like the additional players be assigned the A entry for the lowest :Projection. So the third highest projected RB (RB3 for the Wolves, Sandy Crowe) would be assigned the RB2 value from A. Similarly for WRs, any additional WRs on a team beyond three would be assigned WR3 value.

Ultimately, I want to use C in the function

cor2cov(C, s)

with s the :StdDev vector to compute a covariance matrix.

What I have tried is:

teams = unique(B[!, :Team])
opponents = B[!, :Opponent]  
pair_info = zeros(Float64, size(B))
  for num=1:size(B)[1]
      if B[!, :Team][num] == teams[1]
          if B[!, :Position][num] == "QB"
              pair_info[num] = A[3,16]
          elseif B[!, :Position][num] == "WR"
              WRval = findall(( B[!, :Position] .== "WR") .& (B[!, :Team] .== teams[num] ))
              WR_index = [A[12,3], A[11,3], A[10,3], A[10,3], A[10,3]]
              for i in eachindex(WRval)
                  pair_info[WRval[i]] = WR_index[i]
              end
          elseif B[!, :Position][num] == "RB"
              RBval = findall(( B[!, :Position] .== "RB") .& (B[!, :Team] .== teams[num] ))
              RB_index = [A[15,3], A[14,3], A[14,3]]
              for i in eachindex(RBval)
                  pair_info[RBval[i]] = RB_index[i]
              end
          end
      end
 end
 team_pairs = hcat(pair_info)

But I get a method error:

MethodError: no method matching zeros(::Type{Float64}, ::DataFrame)
Closest candidates are:
  zeros(::Type{T}, !Matched::Union{Integer, AbstractUnitRange}...) where T at array.jl:500
  zeros(::Type{T}, !Matched::Tuple{}) where T at array.jl:509
  zeros(::Type{T}, !Matched::Tuple{Vararg{Integer,N}}) where {T, N} at array.jl:504
  ...

Stacktrace:
 [1] top-level scope at In[258]:1

Essentially, I am not quite sure how to structure the conditional if, elseif statements correctly with the findall index code to achieve the results I want. Could someone kindly point me in the right direction? Thanks very much!

jling · September 4, 2020, 10:37pm

well the method error really happened on the first line didn’t it…

zeros(Float64, size(B))

would work

yold6 · September 4, 2020, 10:43pm

Nice catch jling!

Making that adjustment, results in the following array of zeros.

28×7 Array{Float64,2}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 ⋮                        ⋮    
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0

It appears an array is now being generated, but not being populated.

Topic		Replies	Views
DataFrame array subset General Usage array	18	559	August 11, 2020
Picking elements of an array New to Julia array	7	1555	November 22, 2022
Creating sub-arrays by comparing multiple arrays DataFrames General Usage array , dataframes	4	601	April 26, 2019
How can I compare every values in an array and put the new values in new arry New to Julia question	12	2276	December 7, 2020
Usage of if inside arrays New to Julia	9	822	December 4, 2019

Generate new array using conditional selections from existing arrays

Related topics