Indexing arrays of strings (finding duplicates)

Hello all,

I’m looking to find duplicates within an array (an exercise from the Think Julia text).
I’ve got it working for numerical values, but not sure how to generalize to strings as well. Parse? Find?

function hasduplicates(arr)
    arr2 = arr[:]

    if typeof(arr2[1]) == String
        for c in arr2
            arr2[c] = typeof(parse(Int64, arr2[c]))

        end
    end

    sort(arr2)

    for c in arr2
        if c < length(arr2) && arr2[c] ≡ arr2[c+1]
            return true
        end
    end
    false
end

hasduplicates(["hello", "goodbye"])

nextind might be suitable to search more elements in strings or arrays.

function hasdup(xs::AbstractVector)
  found = Set{Int}()
  for x in xs
    x = toint(x)
    x in found && return true
    push!(found, x)
  end
  false
end

toint(x::Int) = x
toint(x::String) = parse(Int, x)

I’m getting errors to do with my use of < in my if statement. I might also be falling off the edge, but not sure how to handle the boolean expression first.

What do you mean by typeof(arr2[1]) == String?
It seems that you didn’t fully think the exercise through. The local variable c here is likely to be a String (at least when arr2[1] is one), and you then index the array with that string, which will always throw an error (an ArgumentError for an invalid index).
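
To illustrate the element-versus-index point, here is a minimal sketch. The array `words` and the variable names are made up for the example; `eachindex` and `pairs` are standard Base functions.

```julia
words = ["hello", "goodbye"]

# Iterating the array directly yields its elements, not their positions.
elements = [w for w in words]          # same contents as words

# eachindex yields the valid positions, which is what indexing expects.
positions = collect(eachindex(words))  # [1, 2]

# pairs gives the position and the element together.
for (i, w) in pairs(words)
    println(i, " => ", w)
end
```

So inside `for c in arr2`, `c` is already the element; writing `arr2[c]` only makes sense when looping over `eachindex(arr2)`.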

What about the following approach?

function hasduplicates(arr::AbstractVector{Int64})
    # your original implementation
end

function hasduplicates(arr::AbstractVector{String})
    hasduplicates(parse.(Int64, arr))
end

Yeah, I guess I’d add an if statement to determine if the elements are strings then parse them to allow the equivalency statement to work.

Thank you, makes much more sense!

Edit: the typeof(parse()) doesn’t make any sense for sure, just frustratingly copied some rando code I found!

If you have an array of strings and all you care about is finding duplicates, then you can do the following:

julia> arr = ["apple","banana","cherry","banana"]
4-element Array{String,1}:
 "apple"
 "banana"
 "cherry"
 "banana"

julia> sorted_arr = sort(arr)
4-element Array{String,1}:
 "apple"
 "banana"
 "banana"
 "cherry"

julia> for k = 2:length(sorted_arr)
         if sorted_arr[k-1] == sorted_arr[k]
             println("Found a duplicate string at $(k-1)")
         end
       end
Found a duplicate string at 2

Why not just hasduplicates(xs) = length(xs) != length(Set(xs))?
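
A quick sketch of that one-liner in use, with sample inputs taken from earlier in the thread. Since a `Set` keeps only distinct elements, no parsing is needed and the same definition should work for strings and numbers alike.

```julia
# The one-liner from the post above: a Set drops repeated elements,
# so the lengths differ exactly when something occurs more than once.
hasduplicates(xs) = length(xs) != length(Set(xs))

println(hasduplicates(["hello", "goodbye"]))                      # false
println(hasduplicates(["apple", "banana", "cherry", "banana"]))   # true
println(hasduplicates([1, 2, 3, 1]))                              # true
```

Note that this always builds the full `Set`, even if the first two elements are already equal.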


Cuz I dumb

    sort(arr2)

    for c in arr2
     :

Note also that arr2 will not be sorted. Use either sort!(arr2) or arr3 = sort(arr2).
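
For what it's worth, here is a rough sketch of the sort-then-compare idea with both fixes applied: the sorted copy is actually kept, and the loop runs over indices rather than elements. The name `hasduplicates_sorted` and the `fruits` array are made up for this example; this is not the original poster's code.

```julia
# Hypothetical name, chosen to avoid clashing with the thread's hasduplicates.
function hasduplicates_sorted(arr)
    sorted = sort(arr)   # sort returns a new sorted array; arr is left untouched
    # Compare each element with its right-hand neighbour, by index.
    for i in firstindex(sorted):lastindex(sorted)-1
        sorted[i] == sorted[i+1] && return true
    end
    return false
end

fruits = ["cherry", "apple", "banana", "apple"]

sort(fruits)                                          # result discarded, fruits unchanged
println(fruits[1])                                    # cherry
println(hasduplicates_sorted(fruits))                 # true
println(hasduplicates_sorted(["hello", "goodbye"]))   # false

sort!(fruits)                                         # sorts in place
println(fruits[1])                                    # apple
```

Because strings can be ordered and compared with `==` directly, nothing has to be parsed into integers first.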

Well, you could always do

hasduplicates(xs) = !allunique(xs)

This should be faster than length(xs) != length(Set(xs)) too, since it would bail out early.
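
To make the "bail out early" idea concrete, here is a hand-written sketch of the same strategy, generalizing the hasdup loop from earlier in the thread so it needs no parsing. The name `hasduplicates_early` is hypothetical, and this is only an illustration of the idea, not how Base implements `allunique`.

```julia
# Hypothetical helper: remember what has been seen, stop at the first repeat.
function hasduplicates_early(xs)
    seen = Set{eltype(xs)}()
    for x in xs
        x in seen && return true   # first repeat found, no need to look further
        push!(seen, x)
    end
    return false
end

# The built-in route suggested above.
hasduplicates(xs) = !allunique(xs)

for xs in (["apple", "banana", "cherry", "banana"], ["hello", "goodbye"], [1, 2, 3, 1])
    println(hasduplicates_early(xs), " ", hasduplicates(xs))
end
```

Both versions should agree on every input; in practice `!allunique(xs)` is the simpler choice.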

I don’t understand much of your code, though. It looks like it wouldn’t work for any strings except numerical strings, since you are doing parse(Int, ...). Also, in your for c in arr2 loop, c is an element of arr2, but then you use it as an index?
