# Building sets of matching entries in a dataset to explore interaction effects

Here’s the relevant code from a toy problem I’m working with:
```
# Inputs
data = [1 1;
        1 1;
        0 1;
        0 1;
        0 0;
        1 0;
        0 0;
        0 0]
C = ["A", "B"];
Interactions = [[1,2],[5,7,8]]
```

I’m trying to build sets of indices corresponding to duplicate entries in data. Because rows 1 and 2 of data are all ones, row 1 of Interactions needs to be [1,2], and row 2 needs to be [5,7,8] because rows 5, 7, and 8 are all zeros.
I’m trying to write the code so that it scales up to more columns and higher-order interactions. For instance, in a dataset with 3 columns it should build a set of indices for entries with ones in the same two columns, zeros in the same two columns, ones in all three columns, and zeros in all three columns. Any help is appreciated!

I am not fully clear on what you want. Why do you not consider rows 3 and 4 duplicates?

I am writing an optimization problem with lots of constraints, and I figured that only finding duplicates of 1s and 0s would be less computationally demanding.

That being said, finding rows 3 and 4 as duplicates should still produce the same result while the problem is still small. Perhaps narrowing it down to the ones and zeros is a later step to take.

```
julia> using SplitApplyCombine

julia> groupfind(eachrow(data))
4-element Dictionaries.Dictionary{SubArray{Int64, 1, Matrix{Int64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}, Vector{Int64}}
 [1, 1] │ [1, 2]
 [0, 1] │ [3, 4]
 [0, 0] │ [5, 7, 8]
 [1, 0] │ [6]
```
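If only the all-ones and all-zeros groups are needed, the full grouping can be narrowed afterwards by filtering on the keys. Here is a minimal Base-only sketch of that idea; `group_rows` is a hypothetical helper written for illustration (it is not part of SplitApplyCombine), and `allequal` assumes Julia 1.8 or newer.

```julia
# Hypothetical Base-only helper (not from SplitApplyCombine):
# map each distinct row to the indices at which it occurs.
function group_rows(data::AbstractMatrix)
    groups = Dict{Vector{eltype(data)},Vector{Int}}()
    for (i, row) in enumerate(eachrow(data))
        # collect(row) copies the row view into a plain Vector used as the key
        push!(get!(() -> Int[], groups, collect(row)), i)
    end
    return groups
end

data = [1 1; 1 1; 0 1; 0 1; 0 0; 1 0; 0 0; 0 0]

groups = group_rows(data)                      # all four distinct rows
# Keep only groups whose key is constant (all ones or all zeros).
constant_groups = filter(kv -> allequal(first(kv)), groups)
# constant_groups[[1, 1]] == [1, 2]
# constant_groups[[0, 0]] == [5, 7, 8]
```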
```
julia> function indmap(v)
           # Map each row whose two entries are equal to the indices where it occurs
           dd = Dict{Array{Int64}, Array{Int64}}()
           for i in eachindex(v)
               if v[i][1] == v[i][2]
                   # get! creates an empty index list the first time a row is seen
                   push!(get!(() -> Int[], dd, v[i]), i)
               end
           end
           dd
       end
indmap (generic function with 1 method)

julia> indmap(eachrow(data))
Dict{Array{Int64}, Array{Int64}} with 2 entries:
  [0, 0] => [5, 7, 8]
  [1, 1] => [1, 2]
```

An ambitious attempt to generalize

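```
# pred decides which rows are kept; by default every row is grouped
function indmap(v, pred = (_) -> true)
    dd = Dict{Array{Int64}, Array{Int64}}()
    for i in eachindex(v)
        if pred(v[i])
            push!(get!(() -> Int[], dd, v[i]), i)
        end
    end
    dd
end

indmap(eachrow(data))                     # group all rows
indmap(eachrow(data), r -> allequal(r))   # only rows whose entries are all equal
```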
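For the higher-order case in the original question, one possible direction is to loop over every subset of at least two columns and collect the rows that are all ones or all zeros within that subset. The following is only a sketch under that reading of the question: `interaction_sets` and its `minorder` keyword are hypothetical names, and the 3-column `data3` matrix is made-up illustration data.

```julia
# Hypothetical helper: for every subset of at least `minorder` columns, collect
# the row indices where all selected columns are 1 and where all are 0.
# Keys are (columns, value) tuples; empty sets are skipped.
function interaction_sets(data::AbstractMatrix{<:Integer}; minorder::Int = 2)
    nrows, ncols = size(data)
    result = Dict{Tuple{Vector{Int},Int},Vector{Int}}()
    for mask in 1:(2^ncols - 1)
        # Bit j-1 of mask decides whether column j belongs to the subset
        cols = [j for j in 1:ncols if (mask >> (j - 1)) & 1 == 1]
        length(cols) >= minorder || continue
        for value in (0, 1)
            rows = [i for i in 1:nrows if all(==(value), view(data, i, cols))]
            isempty(rows) || (result[(cols, value)] = rows)
        end
    end
    return result
end

# The 2-column toy data from the question
data = [1 1; 1 1; 0 1; 0 1; 0 0; 1 0; 0 0; 0 0]
sets2 = interaction_sets(data)
# sets2[([1, 2], 1)] == [1, 2]
# sets2[([1, 2], 0)] == [5, 7, 8]

# Made-up 3-column data for illustration
data3 = [1 1 1;
         1 1 0;
         0 0 0;
         0 0 1;
         1 0 1;
         0 0 0]
sets3 = interaction_sets(data3)
# sets3[([1, 2], 1)]    == [1, 2]
# sets3[([1, 2], 0)]    == [3, 4, 6]
# sets3[([1, 2, 3], 0)] == [3, 6]
```

For the 2-column toy data this gives the same two sets as Interactions. The number of column subsets grows as 2^n, so this brute-force loop is only reasonable for a small number of columns.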