# Indexing arrays of strings (finding duplicates)

Hello all,

I’m looking to find duplicates within an array (an exercise from the Think Julia text).
I’ve got it working for numerical values, but I’m not sure how to generalize it to strings as well. Parse? Find?

``````
function hasduplicates(arr)
    arr2 = arr[:]

    if typeof(arr2[1]) == String
        for c in arr2
            arr2[c] = typeof(parse(Int64, arr2[c]))
        end
    end

    sort(arr2)

    for c in arr2
        if c < length(arr2) && arr2[c] ≡ arr2[c+1]
            return true
        end
    end
    false
end

hasduplicates(["hello", "goodbye"])
``````

`nextind` might be suitable to search more elements in strings or arrays.

``````
function hasdup(xs::AbstractVector)
    found = Set{Int}()
    for x in xs
        x = toint(x)
        x in found && return true
        push!(found, x)
    end
    false
end

toint(x::Int) = x
toint(x::String) = parse(Int, x)
``````

I’m getting errors to do with my use of `<` in my if statement. I might also be running past the end of the array, but I’m not sure how to handle the Boolean expression first.

What do you mean by `typeof(arr2[1]) == String`?
It seems that you didn’t properly consider the exercise. The local variable `c` here is likely to be a `String` (at least when `arr2[1]` is one), so you are indexing an array with a string, which will always raise an `ArgumentError` (invalid index).
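To make the distinction concrete, here is a small sketch showing that iterating an array directly yields its elements, while `eachindex` yields positions you can index with. The array `words` and the variable names are made up for illustration.

```julia
words = ["hello", "goodbye", "hello"]  # hypothetical sample data

# Iterating the array directly yields its elements, not positions.
elements = [c for c in words]

# eachindex yields the valid indices, which can be used for lookups.
indices = collect(eachindex(words))

# Using an element (a String) as an index throws an error.
failed = try
    words["hello"]
    false
catch
    true
end
```

So inside `for c in arr2`, `arr2[c]` only makes sense if the elements happen to be valid integer indices.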

``````
function hasduplicates(arr::AbstractVector{Int64})
    # your existing implementation for integers goes here
end

function hasduplicates(arr::AbstractVector{String})
    hasduplicates(parse.(Int64, arr))
end
``````

Yeah, I guess I’d add an if statement to determine whether the elements are strings, then parse them so that the equality comparison works.

Thank you, makes much more sense!

Edit: the typeof(parse()) doesn’t make any sense for sure, just frustratingly copied some rando code I found!

If you have an array of strings and all you care about is finding duplicates, then you can do the following:

``````
julia> arr = ["apple","banana","cherry","banana"]
4-element Array{String,1}:
"apple"
"banana"
"cherry"
"banana"

julia> sorted_arr = sort(arr)
4-element Array{String,1}:
"apple"
"banana"
"banana"
"cherry"

julia> for k = 2:length(sorted_arr)
           if sorted_arr[k-1] == sorted_arr[k]
               println("Found a duplicate string at $(k-1)")
           end
       end
Found a duplicate string at 2
``````

Why not just `hasduplicates(xs) = length(xs) != length(Set(xs))`?


Cuz I dumb

``````
    sort(arr2)

    for c in arr2
        :
``````

Note also that `arr2` will not be sorted: `sort(arr2)` returns a sorted copy, which is thrown away here, and leaves `arr2` unchanged. Use either `sort!(arr2)` or `arr3 = sort(arr2)`.
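For what it’s worth, a sort-based version that keeps the sorted copy and loops over positions instead of elements might look like the sketch below. The name `hasduplicates_sorted` is hypothetical, and it assumes an ordinary 1-based vector whose elements can be ordered with `<` (numbers or strings, but not a mix of both).

```julia
# Hypothetical name, chosen to avoid clashing with the functions above.
function hasduplicates_sorted(arr::AbstractVector)
    sorted = sort(arr)            # sorted copy; `arr` itself is unchanged
    for i in 2:length(sorted)     # loop over positions, not elements
        # after sorting, equal elements sit next to each other
        sorted[i-1] == sorted[i] && return true
    end
    return false
end

hasduplicates_sorted(["hello", "goodbye"])   # false
hasduplicates_sorted(["b", "a", "b"])        # true
hasduplicates_sorted([3, 1, 2, 3])           # true
```

Starting the loop at 2 and comparing with the previous element means there is no need for a separate bounds check.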

Well, you could always do

``````
hasduplicates(xs) = !allunique(xs)
``````

This should be faster than `length(xs) != length(Set(xs))` too, since it would bail out early.
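As a quick sanity check, here is a sketch that puts the two one-liners side by side. The names `hasduplicates_set` and `hasduplicates_unique` are hypothetical, used only so both variants can coexist. Neither needs any special handling for strings.

```julia
# Hypothetical names so the two variants can sit side by side.
hasduplicates_set(xs) = length(xs) != length(Set(xs))
hasduplicates_unique(xs) = !allunique(xs)

fruits = ["apple", "banana", "cherry", "banana"]

hasduplicates_set(fruits)          # true
hasduplicates_unique(fruits)       # true
hasduplicates_unique([1, 2, 3])    # false
```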

I don’t understand much of your code, though. It looks like it wouldn’t work for any strings except numerical strings, since you are doing `parse(Int, ...)`. Also, in the loop `for c in arr2`, `c` is an element of `arr2`, but then you use it as an index?
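To illustrate the point about parsing, here is a minimal sketch using `tryparse`, which returns `nothing` instead of throwing when the string is not a number. The variable names are made up for the example.

```julia
parsed_number = parse(Int, "42")        # works: the string is numeric
parsed_word = tryparse(Int, "hello")    # `nothing`: "hello" is not a number

# Strings can be compared directly, so no conversion is needed.
same = "hello" == "hello"
ordered = "apple" < "banana"            # lexicographic ordering
```

Since `==` and `<` already work on strings, the parsing step can simply be dropped.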
