I am trying to process some string data - each element of the array should be split up into individual words and then combined. At the end, I only want the unique words in the array. In R, I would just run a loop and use c() to add to the array at each iteration. My code is:
mystrs = ["SWT1;SWT1;LPT", "ABC;ABC| LPT; NYP", "ABCD ;ABC|PT; NYP"]
function fsep(tstr, splits = (';', '|', ','))
    tstr2 = split(tstr, splits)
    tstr3 = sort(unique(map(strip, tstr2)))
    return tstr3
end
myset = []
for i = 1:3
    tgenes = fsep(mystrs[i])
    push!(myset, tgenes)
end
uset = unique(myset)
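For reference, the loop above uses push!, which adds each result vector to myset as a single element, so myset ends up as an array of arrays rather than a flat array of words. A minimal sketch of the R-style "grow the array with c()" pattern, assuming a flat, sorted vector of unique words is the goal, could use append! instead (the name words is only illustrative):

```julia
mystrs = ["SWT1;SWT1;LPT", "ABC;ABC| LPT; NYP", "ABCD ;ABC|PT; NYP"]

# Split on any of the delimiters and trim whitespace around each piece.
fsep(tstr, splits = (';', '|', ',')) = map(strip, split(tstr, splits))

words = String[]                 # illustrative name for the flat accumulator
for s in mystrs
    # append! adds the individual elements; push! would add the whole vector as one element
    append!(words, fsep(s))
end

# Deduplicate once at the end, then sort.
uset = sort(unique(words))
```

Deduplicating once after the loop avoids calling unique on every element of the input.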
I do have a very large collection with significant overlaps, so I am keen to implement the second solution too. I can do the first part and get a set, but it complains that map cannot be used on sets (I need it to strip whitespace):
res3 = mapreduce(x -> Set(split(x, (';','|'))),union!, mystrs)
res4 = sort(map(strip,res3))
julia> res4 = sort(map(strip,res3))
ERROR: map is not defined on sets
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] map(::Function, ::Set{SubString{String}}) at ./abstractarray.jl:2101
[3] top-level scope at none:0
I tried to use Parse but got an error too:
julia> res3x = Parse(string,res3)
ERROR: UndefVarError: Parse not defined
Stacktrace:
[1] top-level scope at none:0
How do I convert the result from a Set back to Strings?
Note that since I was also splitting on the ' ' (space) character, there is no need to call strip (which removes whitespace) at the end; you will simply end up with one member of the set that is an empty string "".
That said, you can apply strip to the elements of a set without map, for example with a comprehension or a generator.
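As a rough sketch of that idea, using the same mystrs and delimiters as in the question: a generator applies strip to each member of the set, unique removes the duplicates that appear once whitespace is gone, and sort returns an ordinary sorted vector. The join call at the end is only an assumption about what "back to String" might mean, in case a single delimited string is wanted.

```julia
mystrs = ["SWT1;SWT1;LPT", "ABC;ABC| LPT; NYP", "ABCD ;ABC|PT; NYP"]

# Build one set of raw pieces, as in the question.
res3 = mapreduce(x -> Set(split(x, (';', '|'))), union!, mystrs)

# A generator iterates over the set directly, so map is not needed.
# unique is required because " LPT" and "LPT" become equal after stripping.
res4 = sort(unique(strip(w) for w in res3))

# Optional: collapse the vector into a single delimited string.
joined = join(res4, ";")
```

If a Set is preferred over a vector, Set(strip(w) for w in res3) builds one the same way, and collect turns any set into a vector.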