Unique row indices in an array

julia> baza2
45805×9 Array{String,2}:
 "0"      "000001"  "1"   "1"
 "1"      "000002"  ""    "0"
 "2"      "000002"  ""    "0"

julia> ia = unique(i ->  baza2[:,6][i], 1:length( baza2[:,6]))
40378-element Array{Int64,1}:
     1
     2
     4

This works. But what if I need unique rows across several columns?

julia> ia = unique(i -> baza2[:,5:8][i], 1:size(baza2[:,5:8],1),dims=1)
ERROR: MethodError: no method matching unique(::var"#19#20", ::UnitRange{Int64}; dims=1)
Closest candidates are:

Any ideas?
Paweł

baza2 = ["0" "000001" "1" "1";
          "1" "000002" "" "0";
          "2" "000002" "" "0";
          "0" "000001" "1" "1"]

julia> ia = unique(baza2, dims=1)
3×4 Array{String,2}:
 "0"  "000001"  "1"  "1"
 "1"  "000002"  ""   "0"
 "2"  "000002"  ""   "0"

FYI, this is documented: at the REPL, type ? to enter help mode, then unique, and you’ll get:

 unique(A::AbstractArray; dims::Int)

  Return unique regions of A along dimension dims.
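As a small sketch on the four-row example above, dims=1 also works on a column subset. Because that sample has only four columns, columns 2:4 stand in here for the 5:8 used on the real 45805×9 array (an illustrative choice, not from the thread). Note that this form returns the unique rows themselves, not their row numbers:

```julia
# Four-row sample from the thread
baza2 = ["0" "000001" "1" "1";
         "1" "000002" ""  "0";
         "2" "000002" ""  "0";
         "0" "000001" "1" "1"]

# Unique rows of columns 2:4 only (stand-in for 5:8 on the real data).
# The result is a matrix of rows in first-occurrence order, not row indices.
rows = unique(baza2[:, 2:4], dims=1)
```

Rows 3 and 4 repeat rows 2 and 1 over these columns, so only two rows survive.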

If you need the unique values, @mthelm85’s solution is best. From your example, it looks like you were looking for the indices of the unique values, in which case you might want a function like

function unique_inds(v)
    unq = Set{eltype(v)}()   # values seen so far
    inds = Int[]             # index of the first occurrence of each distinct value
    for (i, x) in pairs(v)
        if x ∉ unq
            push!(inds, i)
            push!(unq, x)
        end
    end
    inds
end
julia> baza2 = ["0" "000001" "1" "1";
                 "1" "000002" "" "0";
                 "2" "000002" "" "0";
                 "0" "000001" "1" "1"];

julia> unique_inds(baza2[:, 1])
3-element Array{Int64,1}:
 1
 2
 3

julia> unique_inds.(eachrow(baza2))
4-element Array{Array{Int64,1},1}:
 [1, 2, 3]
 [1, 2, 3, 4]
 [1, 2, 3, 4]
 [1, 2, 3]
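If what you want is row numbers rather than one result per row, one possible approach (a sketch, not the poster's code) is to hand unique_inds whole rows, so each row is compared as a single value. Columns 2:4 are again an illustrative stand-in for 5:8, since the sample has only four columns:

```julia
# unique_inds as defined above: index of the first occurrence of each distinct value
function unique_inds(v)
    unq = Set{eltype(v)}()
    inds = Int[]
    for (i, x) in pairs(v)
        if x ∉ unq
            push!(inds, i)
            push!(unq, x)
        end
    end
    inds
end

baza2 = ["0" "000001" "1" "1";
         "1" "000002" ""  "0";
         "2" "000002" ""  "0";
         "0" "000001" "1" "1"]

# Copy columns 2:4 of each row into its own Vector{String}, so whole rows
# are hashed and compared by value (2:4 stands in for 5:8 on the real data)
rows = [baza2[i, 2:4] for i in axes(baza2, 1)]

ia = unique_inds(rows)   # row numbers of the first occurrence of each distinct row
```

Here ia holds rows 1 and 2, because rows 3 and 4 repeat them over these columns.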

Thanks, but I need a list of rows … ia as row numbers.
Paul

Here (baza2[:,6][i]) you make a slice and then pick out an element. That’s really wasteful, performance-wise. Just write

baza2[i, 6] 

Here (length(baza2[:,6])) you create another slice. You should write

size(baza2, 1)

instead.
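Putting both suggestions together for the single-column case gives something like the sketch below. It uses column 2 of the four-row sample because that sample has no column 6 (an illustrative substitution). Indexing baza2[i, 2] reads one element directly, while baza2[:, 2][i] first copies the whole column:

```julia
baza2 = ["0" "000001" "1" "1";
         "1" "000002" ""  "0";
         "2" "000002" ""  "0";
         "0" "000001" "1" "1"]

# Same result as unique(i -> baza2[:, 2][i], 1:length(baza2[:, 2])),
# but without allocating a copy of the column for every i
ia = unique(i -> baza2[i, 2], 1:size(baza2, 1))
```

unique(f, itr) keeps the first i for each distinct value of f(i), so ia holds the row numbers 1 and 2.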


Any reason you aren’t using a DataFrame for this data? Looks like a table-style data structure.

No DataFrame, only an Array{String,2}.

Yes, size(baza2, 1). But it still doesn’t work.

No, I wasn’t telling you how to fix the error. I was saying: don’t write baza2[:, 5:8][i], it is very wasteful. You should write baza2[i, 5:8]. And don’t write size(baza2[:,5:8],1); write size(baza2,1) instead.

It will still give an error in your particular example. I’m just telling you how to access elements and how to calculate the size of arrays.


Yes, but the problem is elsewhere:


julia>  ia = unique(i -> baza2[:,5][i], 1:size(baza2,1))
8839-element Array{Int64,1}:
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10

julia>  ia = unique(i -> baza2[:,5:8][i], 1:size(baza2[:,5:8],1),dims=1)
ERROR: MethodError: no method matching unique(::var"#29#30", ::UnitRange{Int64}; dims=1)
Closest candidates are:
  unique(::Any, ::Any) at set.jl:178 got unsupported keyword argument "dims"

Do you insist on indexing this way? 😆

The error says you are passing a keyword argument, dims, that this method of unique does not support. The call you wrote has the signature

unique(::Function, ::Vector; dims = 1)

which does not exist. You can either do

unique(::Function, ::Vector)

or

unique(::Array; dims = 1)

At this stage, though, it might help to read through the Julia manual to get a sense of what you’re missing.
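Applied to the question, the first form (a function plus an index range, with no dims keyword) already returns row numbers. A minimal sketch on the four-row sample, with columns 2:4 standing in for the 5:8 used on the real 45805×9 array (illustrative only):

```julia
baza2 = ["0" "000001" "1" "1";
         "1" "000002" ""  "0";
         "2" "000002" ""  "0";
         "0" "000001" "1" "1"]

# Form 1: unique(f, itr), no dims keyword.
# Keeps the first row index i for each distinct slice baza2[i, 2:4]
# (use 5:8 on the real data).
ia = unique(i -> baza2[i, 2:4], 1:size(baza2, 1))

# Form 2: unique(A; dims = 1) returns the unique rows themselves;
# indexing with ia reproduces that matrix
rows = unique(baza2[:, 2:4], dims = 1)
```

The first form is the one that answers “ia as row numbers”; the second is handy when only the distinct rows matter.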


Yes, it is elsewhere. I’m just saying: please stop writing baza2[:,5][i], it hurts my eyes. Write baza2[i, 5] instead. It will not fix your problem, but it is still better.


My understanding is that @programista’s first language is not English, so the language barrier may make this harder.

Yes, we have interacted before. I’m just trying to be clear.

(@programista has also been around for several years, and created many posts and topics, so isn’t a newbie.)


Big thanks! Everything is clear now!