Hi everyone,
I have the following problem:
I have a list of long strings and a list of short strings. My goal is to find which short strings occur within which long strings. For this I need to go through all short string-long string pairings. I also need to store some information about these pairings, such as which short strings were found in a long string and the coordinates of those substrings.
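To make the goal concrete, here is a minimal sketch with made-up toy data (the variable names and sequences are invented for this example only). It relies on `findall` with two string arguments, which as far as I know needs Julia 1.3 or later and returns non-overlapping matches by default.

```julia
# Toy data, invented for illustration only
long_strings  = ["ACGTACGT", "TTTTACGA"]
short_strings = ["ACG", "TTT", "GGG"]

# For every pairing with at least one hit, keep:
# (short string, index of the long string, ranges of the matches)
hits = [(s, i, findall(s, l))
        for s in short_strings
        for (i, l) in enumerate(long_strings)
        if occursin(s, l)]

for (s, i, ranges) in hits
    println(s, " found in long string ", i, " at ", ranges)
end
```

The ranges are the "coordinates" I mean: the start and end index of each occurrence inside the long string.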
My approach:
1.) create a mutable struct long_string_object
    mutable struct long_string_object
        long_string_identifier::SubString{String}
        long_string_sequence::SubString{String}
        found_short_strings::Vector{String}
        found_short_strings_coordinates::Vector{UnitRange{Int64}}
        short_string_coverage
    end
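For context, a sketch of how one of these objects might be set up before the search. The helper name `new_long_string_object` is made up for this example, and the N×1 `Int64` matrix used for `short_string_coverage` is an assumption based on how that field is indexed in step 2.

```julia
# Same fields as the struct above, repeated so the snippet runs on its own
mutable struct long_string_object
    long_string_identifier::SubString{String}
    long_string_sequence::SubString{String}
    found_short_strings::Vector{String}
    found_short_strings_coordinates::Vector{UnitRange{Int64}}
    short_string_coverage
end

# Hypothetical helper: build an object with empty results and zero coverage.
# Assumption: sequences are ASCII, so length(sequence) equals the last index.
function new_long_string_object(identifier::String, sequence::String)
    long_string_object(
        SubString(identifier, 1, lastindex(identifier)),
        SubString(sequence, 1, lastindex(sequence)),
        String[],                          # no short strings found yet
        UnitRange{Int64}[],                # no coordinates yet
        zeros(Int64, length(sequence), 1)  # one counter per position
    )
end

obj = new_long_string_object("seq1", "ACGTACGT")
```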
2.) write a function to test a short string-long string pairing
    function map_short_string_to_long_string(short_string, long_string_object)
        if occursin(short_string, long_string_object.long_string_sequence)
            # ranges of every (non-overlapping) occurrence of short_string
            finder = findall(short_string, long_string_object.long_string_sequence)
            for i = 1:length(finder)
                # add 1 to every position covered by the ranges in finder
                long_string_object.short_string_coverage[finder[i],:] += ones(Int64, length(finder[i]), 1)
            end
            # record the hit, using the field names defined in the struct
            push!(long_string_object.found_short_strings, short_string)
            append!(long_string_object.found_short_strings_coordinates, finder)
        end
        nothing
    end
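The coverage update can be checked on its own with a toy sequence. This sketch assumes the coverage is an N×1 `Int64` matrix (the post does not fix its type) and compares the `+= ones(...)` form with an in-place broadcast, `.+= 1`, which should give the same counts without building a temporary `ones` array for every match.

```julia
# Toy sequence, invented for illustration only
sequence = "ACGTACGT"
ranges = findall("ACG", sequence)   # ranges of the non-overlapping matches

# Variant A: the update as written in the function above
coverage_a = zeros(Int64, length(sequence), 1)
for r in ranges
    coverage_a[r, :] += ones(Int64, length(r), 1)
end

# Variant B: in-place broadcast, adds 1 to every covered position
coverage_b = zeros(Int64, length(sequence), 1)
for r in ranges
    coverage_b[r, :] .+= 1
end

println(vec(coverage_a))
println(vec(coverage_b))
```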
3.) loop through every short string-long string pairing
    function map_array_of_short_strings_to_array_of_long_string_objects(short_string_array, long_string_object_array)
        # visit every short string-long string pairing
        for c = 1:size(short_string_array, 1)
            for d = 1:size(long_string_object_array, 1)
                map_short_string_to_long_string(short_string_array[c], long_string_object_array[d,1])
            end
        end
        nothing
    end
I also tried removing one for loop and replacing it with broadcasting, f.(long_string_object_array) (which is the same as using map, if I understand correctly), but that didn't work.
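To show what I mean by the broadcasting version, here is a small sketch with a toy stand-in function (`count_hits` is hypothetical and only counts matches, it is not my real function). `Ref(...)` marks the short string as a fixed argument while the call is broadcast over the long strings.

```julia
# Hypothetical stand-in for the real per-pairing function
count_hits(short_string, long_string) = length(findall(short_string, long_string))

# Toy data, invented for illustration only
long_strings = ["ACGTACGT", "TTTTACGA", "GGGG"]

# Broadcasting over the long strings with one fixed short string
via_broadcast = count_hits.(Ref("ACG"), long_strings)

# The same computation written with map
via_map = map(l -> count_hits("ACG", l), long_strings)

println(via_broadcast)
println(via_map)
```

Both forms still visit every pairing, so as far as I understand they change how the loop is written rather than how much work is done.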
The question: is there a more efficient way of doing this? I'm not happy with the performance, especially because
a) a friend managed to do it in half the time using Python (by putting all short strings into one string, separated by a "|", then using the function iterfind, which I think is re.finditer, in Python 3)
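As far as I understand it, the Julia counterpart of my friend's approach would look roughly like the sketch below: join the short strings into one regex alternation and scan each long string once with `eachmatch`. The toy data is invented, and the sketch assumes the short strings contain no regex metacharacters (otherwise they would need escaping).

```julia
# Toy data, invented for illustration only
short_strings = ["AC", "GT"]
long_string   = "ACGTAC"

# Combine all short strings into one pattern, here "AC|GT"
# Assumption: no regex metacharacters in the short strings
pattern = Regex(join(short_strings, "|"))

# One pass over the long string; each match gives the short string that hit
# and the range it covers (offset is the start index of the match)
matches = [(String(m.match), m.offset:(m.offset + ncodeunits(m.match) - 1))
           for m in eachmatch(pattern, long_string)]

for (s, r) in matches
    println(s, " at ", r)
end
```

One caveat: an alternation reports non-overlapping matches and, at any given position, only the first alternative that matches. If one short string is a prefix of another, or two hits overlap, this can return fewer hits than testing every pairing separately.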
b) everyone says I shouldn't be shy with for loops in Julia, but I was hoping for more…
Also, do you have general criticism for my code?