Finding sub-arrays in an array?

sophibug · April 13, 2026, 4:50pm

I’m putting this here because this is a chemistry usage but it’s not really a chemistry-specific case.

I have an array that has a bunch of states formatted like this:

1.0 1.0 0.0 1.0 0.0 1.0 6380.35
1.0 1.0 1.0 1.0 0.0 1.0 6316.91
2.0 1.0 1.0 2.0 0.0 2.0 6444.27
2.0 1.0 2.0 1.0 0.0 1.0 11086.5

For any given line, columns 1:3 describe one state, columns 4:6 describe a second state, and column 7 is the frequency of the difference.

I have created a list of all the states (which is all the values 1:3 and 4:6 listed in an nx3 array) and of all the unique states, and I need to identify which states only appear once in the list. However,

findall(isequal(unst[1,:]), states[1:3,:])

is not working: it returns an empty array of Cartesian indices

CartesianIndex{2}

Preferably, I would be able to find the index of every unique state in the overall nx7 list. Does anyone know how to do this?

Thanks!

mikmoore · April 13, 2026, 5:07pm

The issue is that findall applies isequal to each individual element of the matrix, rather than columnwise like you seem to want. Since no single element equals the vector unst[1,:], every comparison is false and the result is empty.

Does

findall(isequal(unst[1,:]), eachcol(@view(states[1:3, :]))) # @view is optional

do what you want? This should slice the matrix into each 3-tall column and compare each of those to unst[1, :].

For example,

julia> findall(isequal([1;2;3]), eachcol([1;1;1;; 1;2;3;; 3;2;1;; 2;2;2;;]))
1-element Vector{Int64}:
 2
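To see the difference on a small case, here is a sketch with a made-up 3×3 matrix `A` (not your data; the variable names are only illustrative). Searching the matrix itself finds nothing, because none of its entries equals the whole vector, while searching its columns finds the match.

```julia
# Hypothetical 3x3 matrix; its columns are [1,1,1], [1,2,3] and [3,2,1].
A = [1 1 3
     1 2 2
     1 3 1]
target = [1, 2, 3]

# Searching the matrix directly: no entry of A equals the vector `target`.
elementwise_hits = findall(isequal(target), A)

# Searching column by column: each column is compared to `target` as a whole.
column_hits = findall(isequal(target), eachcol(A))

println(isempty(elementwise_hits))  # true
println(column_hits)                # [2]
```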

sophibug · April 13, 2026, 7:58pm

I’m actually trying to match rows, not columns. States is formatted like

and unst is formatted like

mikmoore · April 13, 2026, 8:25pm

The rest of my earlier comment is likely still relevant except that you might need to use eachrow(states[:, 1:3]) instead of eachcol(states[1:3, :]).
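As a sketch on the four sample lines from the first post, assuming `states` is the n×7 matrix with one transition per row (`target` and the result names are just illustrative):

```julia
# The four sample lines from the question, one transition per row.
states = [1.0 1.0 0.0 1.0 0.0 1.0 6380.35
          1.0 1.0 1.0 1.0 0.0 1.0 6316.91
          2.0 1.0 1.0 2.0 0.0 2.0 6444.27
          2.0 1.0 2.0 1.0 0.0 1.0 11086.5]

# The state to look for, written as a 3-element vector.
target = [1.0, 0.0, 1.0]

# Rows where `target` is the first state (columns 1:3).
rows_as_first = findall(isequal(target), eachrow(states[:, 1:3]))

# Rows where `target` is the second state (columns 4:6).
rows_as_second = findall(isequal(target), eachrow(states[:, 4:6]))

println(rows_as_first)   # empty: (1, 0, 1) never appears in columns 1:3
println(rows_as_second)  # [1, 2, 4]
```

The indices returned are row numbers in `states`, so they can be used directly to pull out the full 7-column lines, e.g. `states[rows_as_second, :]`.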

In general, the NumPy/MATLAB/etc pattern of smooshing all your data into a mega-array and then slicing-and-dicing it during processing is not necessary in Julia (or other languages where non-arrays are performant). Where relevant, you might consider using types to organize your data a little more carefully. For example:

julia> struct StateDescriptor
           state1::NTuple{3, Int} # maybe even make a special type for these
           state2::NTuple{3, Int}
           freq_diff::Float64
       end

julia> bunch_of_states = [StateDescriptor((1,1,0), (1,0,1), 6380.35), StateDescriptor((1,1,1), (1,0,1), 6316.91), StateDescriptor((2,1,1), (2,0,2), 6444.27), StateDescriptor((2,1,2), (1,0,1), 11086.5)]
4-element Vector{StateDescriptor}:
 StateDescriptor((1, 1, 0), (1, 0, 1), 6380.35)
 StateDescriptor((1, 1, 1), (1, 0, 1), 6316.91)
 StateDescriptor((2, 1, 1), (2, 0, 2), 6444.27)
 StateDescriptor((2, 1, 2), (1, 0, 1), 11086.5)

julia> all_states = vcat([x.state1 for x in bunch_of_states], [x.state2 for x in bunch_of_states])
8-element Vector{Tuple{Int64, Int64, Int64}}:
 (1, 1, 0)
 (1, 1, 1)
 (2, 1, 1)
 (2, 1, 2)
 (1, 0, 1)
 (1, 0, 1)
 (2, 0, 2)
 (1, 0, 1)

julia> unique_states = unique(all_states)
6-element Vector{Tuple{Int64, Int64, Int64}}:
 (1, 1, 0)
 (1, 1, 1)
 (2, 1, 1)
 (2, 1, 2)
 (1, 0, 1)
 (2, 0, 2)

julia> findall(isequal(unique_states[1]), x.state1 for x in bunch_of_states) # (1, 1, 0) in state1
1-element Vector{Int64}:
 1

julia> findall(isequal(unique_states[5]), x.state2 for x in bunch_of_states) # (1, 0, 1) in state2
3-element Vector{Int64}:
 1
 2
 4

Done this way, there isn’t even a question about rows versus columns. I’m sure you’d want to make some adjustments to what I’ve suggested based on your full use case, but in general I find data/code like this much easier to reason about.
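If the data already lives in an n×7 matrix, one possible way to get from there to a vector of structs is sketched below. The helper name `to_descriptor` is made up for illustration, and the sketch assumes the six state labels are always whole numbers stored as floats.

```julia
struct StateDescriptor
    state1::NTuple{3, Int}
    state2::NTuple{3, Int}
    freq_diff::Float64
end

# The four sample lines from the question, one transition per row.
states = [1.0 1.0 0.0 1.0 0.0 1.0 6380.35
          1.0 1.0 1.0 1.0 0.0 1.0 6316.91
          2.0 1.0 1.0 2.0 0.0 2.0 6444.27
          2.0 1.0 2.0 1.0 0.0 1.0 11086.5]

# Hypothetical helper: turn one 7-entry row into a StateDescriptor.
# Int(x) throws an InexactError if a state label is not a whole number.
function to_descriptor(row)
    s1 = ntuple(i -> Int(row[i]), 3)      # columns 1:3
    s2 = ntuple(i -> Int(row[i + 3]), 3)  # columns 4:6
    return StateDescriptor(s1, s2, row[7])
end

bunch_of_states = [to_descriptor(row) for row in eachrow(states)]
```

After this, the `findall` calls above work on `bunch_of_states` unchanged.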

sophibug · April 13, 2026, 9:23pm

Interesting! I have it in an array like this because it’s read in from a file, but I could do some data manipulation like that before I got to this point.

Thanks for your help; that worked.

Dan · April 14, 2026, 12:05am

If the intention was to view the list as a list of possible transitions between states, and furthermore to find the states which appear once and only once in the transition list, then maybe the following can help:

# run the code in mikmoore's post...
using StatsBase

cm = countmap(all_states)
# Dict{Tuple{Int64, Int64, Int64}, Int64} with 6 entries:
#   (1, 1, 1) => 1
#   (1, 0, 1) => 3
# ...

only_once_state_indices =  # indices inside `all_states`
  [i for (i, s) in enumerate(all_states) if cm[s] == 1]
# 5-element Vector{Int64}:
#  1
#  2
# ...

only_once_state_transitions =  # indices in transition list
  [i for (i, t) in enumerate(bunch_of_states)
    if cm[t.state1] == 1 || cm[t.state2] == 1]
# 4-element Vector{Int64}:
#  1
#  2
#  3
#  4
# apparently, each transition has one state which appears only once

Also, it would be better to rename bunch_of_states to bunch_of_transitions.
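If you also want to know where each state occurs, a plain `Dict` can record the transition indices per state using only Base. This is a minimal sketch, not a replacement for `countmap`: the transitions are written out as bare tuples so the block runs on its own, and `occurrences` and `once_only` are illustrative names.

```julia
# Each entry is (state1, state2) for one transition, taken from the sample data.
transitions = [((1, 1, 0), (1, 0, 1)),
               ((1, 1, 1), (1, 0, 1)),
               ((2, 1, 1), (2, 0, 2)),
               ((2, 1, 2), (1, 0, 1))]

# Map every state to the list of transition indices in which it appears.
occurrences = Dict{NTuple{3, Int}, Vector{Int}}()
for (i, (s1, s2)) in enumerate(transitions)
    push!(get!(() -> Int[], occurrences, s1), i)
    push!(get!(() -> Int[], occurrences, s2), i)
end

# States that appear in exactly one place, sorted for a stable order.
once_only = sort([s for (s, rows) in occurrences if length(rows) == 1])

println(occurrences[(1, 0, 1)])  # [1, 2, 4]
println(once_only)
```

Note that a transition whose two states are identical would be recorded twice for that state, so it would not count as "appearing once".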

kapple · April 14, 2026, 5:56am

As an alternative to using structs you might find it useful to load data to a DimStack of DimArrays from DimensionalData.jl, depending on what other things you will be doing with your data.

aplavin · April 14, 2026, 3:50pm

Assuming the file is in any of the common formats, try reading it with the corresponding Julia package; e.g., both delimited and fixed-width text formats are well supported in Julia.

Then, you will immediately have a 1d array of namedtuples instead of a 2d array, and these are very convenient and efficient to manipulate in Julia. Defining custom structs or using fancy array wrappers can be useful, but those are generally further potential steps: you already get great usability and performance from just namedtuples all the way.
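For a rough idea of what the namedtuple layout looks like, here is a sketch that parses the four sample lines by hand with Base only. The hand-rolled `parse_transition` is a made-up stand-in for whatever reader package fits your file format, and the field names are borrowed from the struct suggested earlier in the thread.

```julia
# The sample lines from the question, standing in for the contents of the file.
raw = """
1.0 1.0 0.0 1.0 0.0 1.0 6380.35
1.0 1.0 1.0 1.0 0.0 1.0 6316.91
2.0 1.0 1.0 2.0 0.0 2.0 6444.27
2.0 1.0 2.0 1.0 0.0 1.0 11086.5
"""

# Hypothetical helper: one whitespace-separated line -> one namedtuple.
function parse_transition(line)
    v = parse.(Float64, split(line))
    return (state1 = ntuple(i -> Int(v[i]), 3),
            state2 = ntuple(i -> Int(v[i + 3]), 3),
            freq_diff = v[7])
end

# A 1d array of namedtuples, one per transition.
transitions = [parse_transition(line) for line in eachline(IOBuffer(raw))]

# Fields are accessed by name, so there is no row/column bookkeeping.
matches = findall(t -> t.state2 == (1, 0, 1), transitions)
println(matches)  # [1, 2, 4]
```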
