Building sets of matching entries in a dataset to explore interaction effects

BryanFrost25 · November 8, 2023, 12:30pm

Here’s the relevant code from a toy problem I’m working with:
```
# Inputs
data = [1 1;
        1 1;
        0 1;
        0 1;
        0 0;
        1 0;
        0 0;
        0 0]
C = ["A", "B"];
Interactions = [[1,2],[5,7,8]]
```

I’m trying to build sets of indices corresponding to duplicate entries in data. Because rows 1 and 2 of data are all ones, row 1 of Interactions needs to be [1,2], and row 2 needs to be [5,7,8] because rows 5, 7, and 8 are all zeros.
I’m trying to write the code so that it scales up to more columns and higher-order interactions. For instance, in a dataset with 3 columns it should build a set of indices for entries with ones in the same two columns, zeros in the same two columns, ones in all three columns, and zeros in all three columns. Any help is appreciated!
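
One way to read the 3-column requirement is: for every subset of two or more columns, collect the rows whose entries in those columns are all ones, and separately the rows whose entries are all zeros. Below is a minimal Base-only sketch under that reading. The matrix `data3` and the helper names `column_subsets` and `interaction_sets` are made up for illustration and do not come from the thread.

```julia
# Hypothetical 3-column dataset (not from the thread), one observation per row.
data3 = [1 1 1;
         1 1 0;
         0 0 0;
         0 1 1;
         1 0 0;
         0 0 1]

# All subsets of the columns 1:n with at least `minsize` elements,
# enumerated by treating each integer 1:(2^n - 1) as a bitmask.
function column_subsets(n::Integer; minsize::Integer = 2)
    subsets = Vector{Vector{Int}}()
    for mask in 1:(2^n - 1)
        cols = [j for j in 1:n if (mask >> (j - 1)) & 1 == 1]
        length(cols) >= minsize && push!(subsets, cols)
    end
    return subsets
end

# Map (columns, value) => indices of the rows whose entries in `columns`
# all equal `value`. Empty sets are skipped.
function interaction_sets(data::AbstractMatrix; levels = (0, 1))
    sets = Dict{Tuple{Vector{Int},Int},Vector{Int}}()
    for cols in column_subsets(size(data, 2)), val in levels
        rows = [i for i in axes(data, 1) if all(==(val), view(data, i, cols))]
        isempty(rows) || (sets[(cols, val)] = rows)
    end
    return sets
end

sets = interaction_sets(data3)
sets[([1, 2], 1)]     # rows with ones in columns 1 and 2
sets[([1, 2, 3], 0)]  # rows with zeros in all three columns
```

This sketch keeps sets that contain a single row; if only true duplicates are wanted, dropping entries with `length(rows) < 2` would be one option. The number of column subsets grows as 2^n, so for wide datasets it is probably worth capping the interaction order.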

bkamins · November 8, 2023, 10:36pm

I am not fully clear on what you want. Why do you not consider rows 3 and 4 as duplicates?

BryanFrost25 · November 9, 2023, 2:32am

I am writing an optimization problem with lots of constraints, and I figured that only finding duplicates among the all-ones and all-zeros rows would be less computationally demanding.

That being said, finding rows 3 and 4 as duplicates should still produce the same result while the problem is still small. Perhaps narrowing it down to the ones and zeros is a later step to take.

bkamins · November 9, 2023, 7:42am

julia> using SplitApplyCombine

julia> groupfind(eachrow(data))
4-element Dictionaries.Dictionary{SubArray{Int64, 1, Matrix{Int64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}, Vector{Int64}}
 [1, 1] │ [1, 2]
 [0, 1] │ [3, 4]
 [0, 0] │ [5, 7, 8]
 [1, 0] │ [6]

rocco_sprmnt21 · November 10, 2023, 1:33pm

julia> function indmap(v)
           dd=Dict{Array{Int64}, Array{Int64}}()
           for  i in eachindex(v)        
               if v[i][1] == v[i][2]     
                   push!(get!(()->Int[],dd,v[i]),i)
               end
           end
           dd
       end
indmap (generic function with 1 method)  

julia> indmap(eachrow(data))
Dict{Array{Int64}, Array{Int64}} with 2 entries:
  [0, 0] => [5, 7, 8]
  [1, 1] => [1, 2]

An ambitious attempt to generalize


function indmap(v, pred=(_)->true)
    dd=Dict{Array{Int64}, Array{Int64}}()
    for  i in eachindex(v)
        if pred(v[i])
            push!(get!(()->Int[],dd,v[i]),i)
        end
    end
    dd
end

indmap(eachrow(data))
indmap(eachrow(data), r->allequal(r))
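
To make the two calls above concrete, here is a small self-contained check on the thread's `data`. It repeats the generalized `indmap` so the snippet runs on its own, and it assumes Julia 1.8 or newer for `allequal`. The rows are collected into a vector first so that `v[i]` indexing also works on Julia versions where `eachrow` returns a generator; the variable names `rows`, `all_groups`, `constant`, and `ones_only` are only for illustration.

```julia
# Generalized indmap from the post above, repeated so this runs standalone.
function indmap(v, pred = (_) -> true)
    dd = Dict{Array{Int64},Array{Int64}}()
    for i in eachindex(v)
        if pred(v[i])
            push!(get!(() -> Int[], dd, v[i]), i)
        end
    end
    dd
end

data = [1 1; 1 1; 0 1; 0 1; 0 0; 1 0; 0 0; 0 0]
rows = collect(eachrow(data))  # a plain vector of row views

all_groups = indmap(rows)                       # every distinct row pattern
constant   = indmap(rows, allequal)             # only all-ones / all-zeros rows
ones_only  = indmap(rows, r -> all(==(1), r))   # only all-ones rows

all_groups[[0, 1]]   # [3, 4]
constant[[0, 0]]     # [5, 7, 8]
ones_only[[1, 1]]    # [1, 2]
```

With the default predicate the result has the same four groups as the `groupfind` output earlier in the thread; passing `allequal` narrows it to the two groups the original question asked for.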
