I want to run a simple for loop that writes certain records from a very large file to a new file, based on an `if` check against precomputed criteria.
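```julia
using FASTX, CodecZlib  # FASTA.Reader/Writer, GzipDecompressorStream
using Base.Threads: @threads

# `sample` and `passContig` are defined earlier (not shown)
reader = FASTA.Reader(GzipDecompressorStream(open("/home/people/robmur/ku_00014/data/termite_metagenome/pre-post_fungus/sequences/final_assemblies/megahit_final_assembly_500bp_filterd.fasta.gz")))
writer = open(FASTA.Writer, "/home/people/robmur/ku_00014/data/termite_metagenome/pre-post_fungus/sequences/final_assemblies/500bp_filter_samples/" * sample)
println("writing to file")
@threads for record in reader
    # keep only contigs that passed the precomputed filter
    if FASTA.identifier(record) in passContig
        write(writer, record)
    end
end
close(reader)
close(writer)  # flush and close the output file
```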
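For a filter like this, a plain serial loop is probably the right baseline. Decompressing a gzip stream is inherently sequential, the work per record is a single set lookup, and `@threads` partitions an indexable collection by index (it needs `length` and indexing), which a streaming reader does not normally provide; several threads writing to one `writer` would also race. The sketch below shows the streaming filter using only Base, so it is a stand-in for the FASTX version rather than a drop-in replacement. It assumes headers start with `>` and that the identifier is the header text up to the first whitespace; `filter_fasta!` is a hypothetical helper and `keep_ids` stands in for `passContig` converted to a `Set`.

```julia
"""
    filter_fasta!(dst::IO, src::IO, keep_ids::AbstractSet{<:AbstractString}) -> Int

Hypothetical helper: stream FASTA records from `src` to `dst`, copying only
records whose identifier (header text after '>' up to the first whitespace)
is in `keep_ids`. Returns the number of records kept.
"""
function filter_fasta!(dst::IO, src::IO, keep_ids::AbstractSet{<:AbstractString})
    keeping = false  # is the current record being copied?
    kept = 0
    for line in eachline(src)
        if startswith(line, ">")
            fields = split(SubString(line, 2))  # header fields, split on whitespace
            id = isempty(fields) ? "" : fields[1]
            keeping = id in keep_ids            # hash lookup, O(1) on average
            kept += keeping
        end
        keeping && println(dst, line)           # copy header and sequence lines
    end
    return kept
end

# Example on an in-memory FASTA. For the real data, `src` would be
# GzipDecompressorStream(open(path)) and `dst` a file opened with "w".
input = IOBuffer(">contig_1 len=600\nACGT\nACGT\n>contig_2\nTTTT\n>contig_3\nGGGG\n")
output = IOBuffer()
n = filter_fasta!(output, input, Set(["contig_1", "contig_3"]))
print(String(take!(output)))
```

If `passContig` is currently a `Vector`, converting it once with `Set(passContig)` likely matters more than any parallelism, since `in` on a vector scans every element for every record.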
I settled on using `@threads`; however, I am new to Julia, so I am unsure whether `Distributed` would be a better fit here.
My understanding of the difference is that `Distributed` runs multiple processes on different cores, whilst `@threads` splits the loop into tasks that run on several threads within the same process (possibly on different cores). Is that right?
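On the shared-memory point: threads in one process can all write into the same array, as long as each iteration touches a different element. The sketch below assumes the identifiers have already been collected into a `Vector`, because `@threads` splits an indexable collection by index and cannot split a stream directly; `parallel_mask` and the example ids are hypothetical. Starting Julia with `julia --threads 4` (or setting `JULIA_NUM_THREADS`) gives it more than one thread.

```julia
using Base.Threads: @threads, nthreads

"""
    parallel_mask(ids, keep_ids) -> Vector{Bool}

Hypothetical helper: evaluate `id in keep_ids` for every element of `ids`
with `@threads`. Each iteration writes only its own slot of `mask`, so the
threads share the array without locking.
"""
function parallel_mask(ids::AbstractVector{<:AbstractString}, keep_ids::AbstractSet)
    mask = Vector{Bool}(undef, length(ids))  # one Bool per element, not a BitVector
    @threads for i in eachindex(ids)
        mask[i] = ids[i] in keep_ids
    end
    return mask
end

ids = ["contig_1", "contig_2", "contig_3", "contig_4"]
mask = parallel_mask(ids, Set(["contig_2", "contig_4"]))
println(nthreads(), " thread(s); kept: ", ids[mask])
```

The mask is a `Vector{Bool}` rather than a `BitVector` on purpose: a `BitVector` packs neighbouring elements into the same 64-bit word, so concurrent writes to adjacent indices would race.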
If so, does that mean `Distributed` will run multiple iterations of the for loop at once?
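Both approaches run several iterations at once; the difference is where they run. With `Distributed`, each worker is a separate Julia process with its own memory, so the function and its inputs are serialized to the workers and the results are sent back. A minimal sketch using `addprocs` and `pmap` from the `Distributed` standard library follows; `distributed_mask` and the example ids are hypothetical. For a loop like the one above, that per-item communication would likely cost more than a single set lookup, and every worker would still depend on one sequential gzip stream.

```julia
using Distributed

# Start two worker processes on this machine (separate processes, separate
# memory) unless workers are already running.
nprocs() == 1 && addprocs(2)

"""
    distributed_mask(ids, keep_ids) -> Vector{Bool}

Hypothetical helper: evaluate `id in keep_ids` for each id on the worker
processes. `pmap` serializes the closure (including the captured `keep_ids`)
to the workers and collects the results on the main process.
"""
distributed_mask(ids, keep_ids) = Vector{Bool}(pmap(id -> id in keep_ids, ids))

ids = ["contig_1", "contig_2", "contig_3", "contig_4"]
mask = distributed_mask(ids, Set(["contig_2", "contig_4"]))
println(nworkers(), " worker(s); kept: ", ids[mask])
```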