How to keep only the unique rows of an array

These are two columns of a string matrix. How can I keep only the unique rows among these 11 rows?

Kalinka surname
Kalinka surname
Kalinka surname
Kalinka geographical_name
Kalinka geographical_name
Kalinka geographical_name
Kalinka geographical_name
Kalinko surname
Kalinko surname
Kalinka name
Kalinka name

See the thread "Unique (indices) method similar to MATLAB". For example, if you have a matrix a and you want the rows in which the combination of columns 1 and 2 is unique, you could do:

julia> a = [ "Kalinka"  "surname"
        "Kalinka"  "surname"
        "Kalinka"  "surname"
        "Kalinka"  "geographical_name"
        "Kalinka"  "geographical_name"
        "Kalinka"  "geographical_name"
        "Kalinka"  "geographical_name"
        "Kalinko"  "surname"
        "Kalinko"  "surname"
        "Kalinka"  "name"
        "Kalinka"  "name"];

julia> idx = unique(i -> (a[i,1], a[i,2]), axes(a, 1))
4-element Vector{Int64}:
  1
  4
  8
 10

julia> a[idx,:] # rows where (col1, col2) are unique
4×2 Matrix{String}:
 "Kalinka"  "surname"
 "Kalinka"  "geographical_name"
 "Kalinko"  "surname"
 "Kalinka"  "name"
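
The same idea can be wrapped in a small helper that takes any set of columns. This is only a sketch: unique_row_indices is a hypothetical name, not a Base function, and it assumes the values in the chosen columns can be hashed and compared (true for strings). It relies on unique(f, itr) keeping the first element of itr for each distinct value of f.

```julia
# Hypothetical helper: indices of the first row for each distinct
# combination of values in the columns `cols`.
unique_row_indices(a, cols) = unique(i -> Tuple(a[i, c] for c in cols), axes(a, 1))

a = ["Kalinka" "surname"
     "Kalinka" "surname"
     "Kalinka" "geographical_name"
     "Kalinko" "surname"
     "Kalinka" "geographical_name"]

idx = unique_row_indices(a, 1:2)   # row numbers of the first occurrences
unique_rows = a[idx, :]            # the corresponding full rows
```

Because the result is a vector of row numbers, indexing with a[idx, :] keeps every column of the matrix, not only the ones used for the comparison.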

Also, you can consider using DataFrames:

using DataFrames
using CSV

s = """name cname
Kalinka surname
Kalinka surname
Kalinka surname
Kalinka geographical_name
Kalinka geographical_name
Kalinka geographical_name
Kalinka geographical_name
Kalinko surname
Kalinko surname
Kalinka name
Kalinka name"""

f = IOBuffer(s)

df = CSV.read(f, DataFrame)

Now unique drops the repeated rows:

julia> unique(df)
4×2 DataFrame
 Row │ name     cname             
     │ String7  String31          
─────┼────────────────────────────
   1 │ Kalinka  surname
   2 │ Kalinka  geographical_name
   3 │ Kalinko  surname
   4 │ Kalinka  name

Big thanks! But apart from the columns I showed, I have other columns that I cannot lose. How can I select the entire rows of a 3-column matrix that are unique in columns 2 and 3? Alternatively, how can I get the row numbers of the unique rows of mat[:,2:3]?

julia> mat
20×3 Matrix{Any}:
 "aa"          "aa"        ""
 "Aachen"      "Aachen"    "nazwa_geograficzna"
 "AAP"         "AAP"       "nazwa_instytucji"
 "Aarona"      "Aaron"     "imię"
 "Aaron"       "Aaron"     "imię"
 "Aaronach"    "Aaron"     "imię"
 "Aaronami"    "Aaron"     "imię"
 "Aaronem"     "Aaron"     "imię"
 "Aaroni"      "Aaron"     "imię"
 "Aaronie"     "Aaron"     "imię"
 "Aaronom"     "Aaron"     "imię"
 "Aaronowi"    "Aaron"     "imię"
 "Aaronowie"   "Aaron"     "imię"
 "Aaronów"     "Aaron"     "imię"
 "Aarony"      "Aaron"     "imię"
 "aaronowa"    "aaronowy"  ""
 "aaronową"    "aaronowy"  ""
 "aaronowe"    "aaronowy"  ""
 "aaronowego"  "aaronowy"  ""
 "aaronowej"   "aaronowy"  ""

I really recommend you use DataFrames.jl for this task. Is there a reason you want to keep it a matrix?

stack(unique(e -> (e[2], e[3]), eachrow(m)), dims=1)
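
To unpack the one-liner above, here is a step-by-step sketch on a small made-up matrix (the four rows are sample data, not the full mat from the question). As far as I know, stack is only available in Base from Julia 1.9 on, so the last line shows a possible fallback for older versions.

```julia
m = Any["aa"     "aa"    ""
        "Aarona" "Aaron" "imię"
        "Aaron"  "Aaron" "imię"
        "AAP"    "AAP"   "nazwa_instytucji"]

# eachrow yields one view per row; unique keeps the first row
# seen for each (column 2, column 3) key.
rows = unique(e -> (e[2], e[3]), eachrow(m))

# stack with dims=1 lays each kept row out as a row of a new matrix.
u1 = stack(rows, dims=1)

# Fallback without stack: concatenate the rows as columns, then
# swap the dimensions (permutedims is non-recursive, so strings are fine).
u2 = permutedims(reduce(hcat, rows))
```

Both variants return the rows themselves, with all three columns, but not their row numbers in m.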
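
function runique(m, cols)
    # Each row is compared only with the last row kept, so this drops
    # consecutive duplicates: m must already be sorted or grouped by cols.
    idx = Vector{Int}(undef, size(m, 1))   # preallocated indices of kept rows
    u = NaN                                # sentinel: compares unequal to any row
    ui = 1                                 # next free slot in idx
    for r in axes(m, 1)
        if u != @view m[r, cols]
            idx[ui] = r
            ui += 1
            u = @view m[r, cols]
        end
    end
    @views m[idx[1:ui-1], :]               # view of the kept rows, all columns
end
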
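
Note that runique compares each row only with the previously kept row, so it relies on equal rows being adjacent, as they are in the sample data. If that cannot be assumed, a variant along the following lines might help. It is a sketch: runique_any is a hypothetical name, and it returns the row numbers together with the rows, since the question also asked for them.

```julia
# Hypothetical variant of runique: keeps the first occurrence of every
# distinct combination of the selected columns, adjacent or not.
function runique_any(m, cols)
    seen = Set{Any}()   # keys encountered so far
    idx = Int[]         # row numbers of the rows we keep
    for r in axes(m, 1)
        key = Tuple(m[r, c] for c in cols)
        if !(key in seen)
            push!(seen, key)
            push!(idx, r)
        end
    end
    return idx, m[idx, :]
end

m = ["a" "x" "p"
     "b" "y" "q"
     "c" "x" "p"    # same as row 1 in columns 2:3, but not adjacent to it
     "d" "y" "r"]

idx, rows = runique_any(m, 2:3)
```

On this made-up matrix the third row is dropped, whereas runique would keep all four rows because no two adjacent rows agree in columns 2 and 3.
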
using DataFrames

df = DataFrame(m, :auto)   # columns are auto-named x1, x2, x3

combine(first, groupby(df, [:x2, :x3]))   # first row of each (x2, x3) group

But the best option, as many suggest, is:

unique(df, [:x2, :x3])