Randomized Hypothesis Test (row-level analysis): DimensionMismatch ERROR

Hello Everyone,

Using Pluto 0.17.*

using Combinatorics, DataFrames, StatsBase

I am working with the following DataFrame

P1 = DataFrame(Col1 = rand(1:5:100, 10), Col2 = rand(1:3:150, 10), Col3 = rand(1:4:250, 10))

and another DataFrame whose rows
represent the mean of P1.

P2 = DataFrame(Mean = [100,120,125,115,120,110,100,115,130,120]...)

I imported and stored my control and series groups

control = P2.Mean
Col1 = collect(P1[1,:])
Col2 = collect(P1[2,:])
Col3 = collect(P1[3,:])

I then concatenated and combined all observations into one array

subGroups = collect(combinations([control;Col1; Col2;Col3],3))

3 represents the number of observations (across the columns)

I calculated the mean for all my records

meanCol1 = mean(Col1)
meanCol2 = mean(Col2)
meanCol3 = mean(Col3)

I encounter an error when I attempt to
find the p-value for each record using

pVAL1 = sum([mean(i) >= meanCol1 for i in subGroups]) / length(subGroups)
pVAL2 = sum([mean(i) >= meanCol2 for i in subGroups]) / length(subGroups)
pVAL3 = sum([mean(i) >= meanCol3 for i in subGroups]) / length(subGroups)

The error reads:

DimensionMismatch("dimensions must match: a has dims (Base.OneTo(10),), b has dims (Base.OneTo(3),), mismatch at 1")

If you need the Stacktrace let me know.
Any suggestions?


help?> combinations
search:

Couldn't find combinations
Perhaps you meant combine
  No documentation found.

  Binding combinations does not exist.

Where does combinations come from? Hard to understand issue without an MWE.

MethodError: no method matching +(::Float64, ::DataFrames.DataFrameRow{DataFrames.DataFrame, DataFrames.Index})

This is intuitive if you are familiar with the fact that 1 + [1, 2, 3] errors. In Julia, you can’t add scalars and vectors. Instead, you have to broadcast the operation: 1 .+ [1, 2, 3]. The same applies to DataFrameRows.
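A minimal sketch of that scalar/vector point, using only Base Julia. The variable names are made up for illustration and are not from the original post.

```julia
v = [1, 2, 3]

# `1 + v` throws a MethodError: `+` has no method for a scalar and a vector.
plain_add_fails = try
    1 + v
    false
catch err
    err isa MethodError
end

# The dotted operator broadcasts the scalar over every element.
shifted = 1 .+ v   # [2, 3, 4]
```

The dot form works for any operator or function call, which is why the fix is a broadcast rather than a change of container type.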

I did not include:

using Combinatorics, StatsBase

Also, would you suggest converting
my DF into a matrix to avoid
broadcasting?

No, that will not fix your problem, because you need broadcasting for array operations as well, as I stated above.

julia> subGroups = collect(combinations([control;Col1, Col2, Col3],6))
       
ERROR: syntax: unexpected comma in matrix expression
Stacktrace:
 [1] top-level scope
   @ none:1

This code errors as well. It’s important you provide a working MWE, preferably as a single block of code, to get the best help.

I provided the exact implementation of the code above.
Is this not an MWE?

Your code errors in a way that is different than the error you posted in the first post. So it is not an MWE because it doesn’t reproduce the error you would like help with.

Understood.

I rewrote my original post and added
P2 values to represent the mean of
each record in P1.

Is this better?

No. Read my message above. The problem is with the line

julia> subGroups = collect(combinations([control;Col1, Col2, Col3],6))

Did you run this code before you posted it?

Oh okay!

The semi-colon needs to be changed to a (,).
I will make this change above now.

I believe you have your indexing wrong in creating your Col1, Col2, and Col3 objects.

To get a column you need to do

Col1 = P1[:,1]

not

Col1 = P1[1,:]

which is what you are currently doing. Your current indexing selects the first row, not the first column.
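A small sketch of the difference, assuming the DataFrames package is installed. The toy frame `df` is hypothetical and not the poster's data.

```julia
using DataFrames

# Hypothetical 3x3 frame with easily recognisable values.
df = DataFrame(Col1 = [1, 2, 3], Col2 = [4, 5, 6], Col3 = [7, 8, 9])

# All rows, first column: a plain vector holding the column values.
first_column = df[:, 1]   # [1, 2, 3]

# First row, all columns: a DataFrameRow, not a vector.
first_row = df[1, :]      # Col1 = 1, Col2 = 4, Col3 = 7
```

Which of the two is wanted depends on whether the analysis runs down the columns or across the rows.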

Yes – I am analyzing across the rows in this
example. (I will add to the example)

I need to convert the DataFrameRow type into
a vector.

I found this thread

Just use collect.

julia> Col3 = P1[3,:]
DataFrameRow
 Row │ Col1   Col2   Col3  
     │ Int64  Int64  Int64 
─────┼─────────────────────
   3 │    16     64      5

julia> collect(Col3)
3-element Vector{Int64}:
 16
 64
  5

@pdeffebach

The issue I am encountering now stems from
the control = P2.Mean step. When I apply
collect(), I get the same error. I also
attempted to broadcast with a dot (.), but I am
still seeing the DimensionMismatch error.

Any suggestions?

I cannot debug your program line-by-line for you, unfortunately.

I suggest you work through more Julia tutorials to learn how to better navigate these errors.

@pdeffebach

Thanks for your time and sharing the resource.

I did not want you to debug the code. But
the solutions you presented created NEW
challenges that no longer were of my
construction, but rather yours.

Solution:

  1. Reconstructed the DF from the source so that
    it was a single-attribute DF rather than a cross-tab.
  2. Used semicolons in the combinations()
    call to concatenate all array elements.