For a vector v, the R function duplicated works as follows. The vector duplicated(v) has the same length as v, and its i-th element is false if and only if v[i] is the first occurrence of v[i] in v. For example, duplicated([1, 2, 1, 3, 2]) = [false, false, true, false, true]. I implemented it as follows in Julia:
    function duplicated(x)
        out = fill(false, length(x))
        for i in 1:(length(x)-1)
            if !out[i]
                out[i .+ findall(x[i] .== x[(i+1):length(x)])] .= true
            end
        end
        return out
    end
To remove the duplicates of a vector you can do v[.!duplicated(v)] (note the broadcasted .! since duplicated returns a vector of Bool). Well, in this case this is equivalent to unique(v), but the same mask can be applied to another vector: x[.!duplicated(v)] (useful, for example, with v = score.(x) for some function score).
Didn’t know that, thanks. But I use it for removing the rows of a matrix which have the same “score”: x[.!duplicated([score(row) for row in eachrow(x)]), :].
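To make the row-filtering use case concrete, here is a minimal sketch. The function duplicated is the one from the first post, repeated only so the snippet runs on its own; the score function (a row sum) and the matrix x are made up purely for illustration.

```julia
# Same definition as in the first post, repeated so the snippet is self-contained.
function duplicated(x)
    out = fill(false, length(x))
    for i in 1:(length(x)-1)
        if !out[i]
            out[i .+ findall(x[i] .== x[(i+1):length(x)])] .= true
        end
    end
    return out
end

# Hypothetical score, for illustration only: the sum of a row.
score(row) = sum(row)

x = [1 2; 2 1; 3 4; 0 3]
scores = [score(row) for row in eachrow(x)]  # [3, 3, 7, 3]
keep = .!duplicated(scores)                  # [true, false, true, false]
x[keep, :]                                   # keeps rows 1 and 3: [1 2; 3 4]
```

Only the first row for each distinct score survives, which is what sets this apart from calling unique on the rows themselves.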
Wouldn’t it rather be $\mathcal{O}(n\cdot k)$, with $n$ the length of the input and $k$ the number of its unique elements? Which is $n^2$ if all elements are unique.
You have n iterations to loop over the values. For each value you do a set lookup and a possible set insertion (and setting the boolean value in out, which is clearly constant time). I mentioned log n thinking that a set might be a binary tree. If instead a set is implemented as a hash table (likely), then insertion and lookup could be constant time, so it’s possibly (probably) just O(n). I’m not at a computer, so it’s not convenient to look up the set implementation details.
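For reference, the set-based approach described above could look roughly like the sketch below. The name duplicated_set is made up here to avoid clashing with the original function. Julia's Base Set is backed by a Dict (a hash table), so lookups and insertions take constant time on average, giving O(n) expected time overall, assuming hashing the elements is itself cheap.

```julia
# Sketch of a Set-based variant; `duplicated_set` is a hypothetical name.
function duplicated_set(x)
    seen = Set{eltype(x)}()        # values encountered so far
    out = fill(false, length(x))
    for (i, v) in enumerate(x)
        if v in seen
            out[i] = true          # v appeared earlier, so mark it as a duplicate
        else
            push!(seen, v)         # first occurrence: remember it
        end
    end
    return out
end

duplicated_set([1, 2, 1, 3, 2])    # == [false, false, true, false, true]
```

Unlike the original version, this does a single pass and never rescans the tail of the vector, at the cost of the extra memory for the set.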