# How to keep only the unique rows of an array

These are two columns of a string matrix. How can I keep only the unique rows of this array?

``````
Kalinka surname
Kalinka surname
Kalinka surname
Kalinka geographical_name
Kalinka geographical_name
Kalinka geographical_name
Kalinka geographical_name
Kalinko surname
Kalinko surname
Kalinka name
Kalinka name
``````

See "Unique (indices) method similar to MATLAB". For example, if you have a matrix `a` and want to keep the first row for each distinct combination of columns 1 and 2, you could do:

``````
julia> a = [ "Kalinka"  "surname"
             "Kalinka"  "surname"
             "Kalinka"  "surname"
             "Kalinka"  "geographical_name"
             "Kalinka"  "geographical_name"
             "Kalinka"  "geographical_name"
             "Kalinka"  "geographical_name"
             "Kalinko"  "surname"
             "Kalinko"  "surname"
             "Kalinka"  "name"
             "Kalinka"  "name"];

julia> idx = unique(i -> (a[i,1], a[i,2]), axes(a, 1))
4-element Vector{Int64}:
1
4
8
10

julia> a[idx,:] # rows where (col1, col2) are unique
4×2 Matrix{String}:
"Kalinka"  "surname"
"Kalinka"  "geographical_name"
"Kalinko"  "surname"
"Kalinka"  "name"
``````

Also, you can consider using DataFrames:

``````
using DataFrames
using CSV

s = """name cname
Kalinka surname
Kalinka surname
Kalinka surname
Kalinka geographical_name
Kalinka geographical_name
Kalinka geographical_name
Kalinka geographical_name
Kalinko surname
Kalinko surname
Kalinka name
Kalinka name"""

f = IOBuffer(s)

df = CSV.read(f, DataFrame)
``````

Now `unique` drops the duplicate rows:

``````
julia> unique(df)
4×2 DataFrame
 Row │ name     cname
     │ String7  String31
─────┼────────────────────────────
   1 │ Kalinka  surname
   2 │ Kalinka  geographical_name
   3 │ Kalinko  surname
   4 │ Kalinka  name
``````

Many thanks. But apart from the columns I showed, I have other columns that I cannot lose. How can I select the entire rows that are unique in columns 2 and 3 of a 3-column matrix? Put another way, how do I get the row numbers of the unique rows of `mat[:, 2:3]`?

    julia> mat
    20×3 Matrix{Any}:
     "aa"          "aa"        ""
     "Aachen"      "Aachen"    "nazwa_geograficzna"
     "AAP"         "AAP"       "nazwa_instytucji"
     "Aarona"      "Aaron"     "imię"
     "Aaron"       "Aaron"     "imię"
     "Aaronach"    "Aaron"     "imię"
     "Aaronami"    "Aaron"     "imię"
     "Aaronem"     "Aaron"     "imię"
     "Aaroni"      "Aaron"     "imię"
     "Aaronie"     "Aaron"     "imię"
     "Aaronom"     "Aaron"     "imię"
     "Aaronowi"    "Aaron"     "imię"
     "Aaronowie"   "Aaron"     "imię"
     "Aaronów"     "Aaron"     "imię"
     "Aarony"      "Aaron"     "imię"
     "aaronowa"    "aaronowy"  ""
     "aaronową"    "aaronowy"  ""
     "aaronowe"    "aaronowy"  ""
     "aaronowego"  "aaronowy"  ""
     "aaronowej"   "aaronowy"  ""

I really recommend you use DataFrames.jl for this task. Is there a reason you want to keep it a matrix?

``````
stack(unique(e -> (e[2], e[3]), eachrow(m)), dims=1)
``````
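If the row numbers themselves are needed (the second form of the question), one possible sketch is to run `unique` over the row indices and key each index by the values in the chosen columns. The helper name `unique_row_indices` and the small matrix `m` below are made up for illustration; they are not from the thread.

```julia
# Hypothetical helper: indices of the first occurrence of each distinct
# combination of values in the columns `cols` of matrix `m`.
function unique_row_indices(m::AbstractMatrix, cols)
    # Key each row index by a tuple of its values in `cols`;
    # `unique(f, itr)` keeps the first element of `itr` for each distinct key.
    unique(i -> Tuple(m[i, c] for c in cols), axes(m, 1))
end

# Illustrative data: column 1 differs in every row, columns 2 and 3 repeat.
m = ["Aarona"   "Aaron"    "imię"
     "Aaron"    "Aaron"    "imię"
     "aaronowa" "aaronowy" ""
     "aaronowe" "aaronowy" ""
     "Aachen"   "Aachen"   "nazwa_geograficzna"]

idx = unique_row_indices(m, 2:3)   # for this data: [1, 3, 5]
kept = m[idx, :]                   # full rows, all three columns kept
```

Because the result is a vector of row indices, it can also be used to index any other array that is aligned row by row with `m`.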
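``````
# Keeps the first row of each run of consecutive rows that agree in `cols`.
# Only adjacent duplicates are dropped, so `m` should be sorted or grouped by `cols`.
function runique(m, cols)
    idx = Vector{Int}(undef, size(m, 1))  # indices of the rows to keep
    u = nothing                           # key of the most recently kept row
    ui = 1                                # next free slot in idx
    for r in axes(m, 1)
        key = @view m[r, cols]
        if u != key
            idx[ui] = r
            ui += 1
            u = key
        end
    end
    @views m[idx[1:ui-1], :]
end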
``````
``````
using DataFrames

df = DataFrame(m, :auto)

combine(first, groupby(df, [:x2, :x3]))
``````

But the best option, as several people suggested, is:

``````
unique(df, [:x2, :x3])
``````
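
Since the question starts from a matrix, here is a minimal sketch of the full round trip: wrap the matrix in a `DataFrame`, deduplicate on the two columns, and convert back with `Matrix`. It assumes DataFrames.jl is installed; the matrix `m` is shortened sample data chosen for illustration, and `:x2`, `:x3` are the column names that `:auto` generates.

```julia
using DataFrames

# Shortened sample data (an assumption, not the full matrix from the question).
m = ["Aarona"   "Aaron"    "imię"
     "Aaron"    "Aaron"    "imię"
     "aaronowa" "aaronowy" ""
     "aaronowe" "aaronowy" ""]

df = DataFrame(m, :auto)        # columns are named x1, x2, x3
u = unique(df, [:x2, :x3])      # first row for each distinct (x2, x3) pair
result = Matrix(u)              # back to a plain matrix, all columns kept
```

The first column is carried along untouched, so no data outside the deduplication key is lost for the rows that remain.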