I want to run a simple `for` loop that writes certain records from a very large file to a new file, based on an `if` check against precomputed criteria:
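```julia
using FASTX, CodecZlib        # FASTA.Reader/Writer and GzipDecompressorStream
using Base.Threads: @threads

# `sample` (output file name) and `passContig` (IDs of the contigs to keep)
# are computed earlier in the script.
reader = FASTA.Reader(GzipDecompressorStream(open("/home/people/robmur/ku_00014/data/termite_metagenome/pre-post_fungus/sequences/final_assemblies/megahit_final_assembly_500bp_filterd.fasta.gz")))
writer = open(FASTA.Writer, "/home/people/robmur/ku_00014/data/termite_metagenome/pre-post_fungus/sequences/final_assemblies/500bp_filter_samples/" * sample)
println("writing to file")
@threads for record in reader
    # keep only the records whose identifier passed the precomputed check
    if FASTA.identifier(record) in passContig
        write(writer, record)
    end
end
close(reader)
close(writer)   # also close the output so buffered records are flushed to disk
```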
I settled on using `@threads`; however, I am new to Julia, so I am entirely unsure whether `Distributed` would be a better fit here.
My understanding of the difference is that distributed computing runs multiple processes on different cores, whilst `@threads` splits the task into subtasks that run on the same (or different) cores. Is that right?
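A minimal, self-contained sketch of the threading side, assuming Julia was started with several threads (e.g. `julia -t 4`): all `@threads` iterations run as tasks inside one process and share its memory, so they can write into the same array or update a shared atomic counter. The function names `thread_squares` and `count_evens` are hypothetical and only for illustration.

```julia
using Base.Threads

# All @threads iterations run as tasks inside ONE Julia process,
# so they share memory and can write into the same array.
function thread_squares(n)
    out = zeros(Int, n)
    @threads for i in 1:n
        out[i] = i^2          # each iteration writes its own slot: no data race
    end
    return out
end

# Updating a single shared value from many threads needs synchronisation;
# here an atomic counter is used (a lock would also work).
function count_evens(n)
    hits = Atomic{Int}(0)
    @threads for i in 1:n
        if iseven(i)
            atomic_add!(hits, 1)
        end
    end
    return hits[]
end

println(nthreads(), " thread(s) in this process")
println(thread_squares(8))    # [1, 4, 9, 16, 25, 36, 49, 64]
println(count_evens(10))      # 5
```

Writing to distinct indices is safe without extra work; anything genuinely shared between iterations, such as a counter or an open file, needs an atomic or a lock.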
Does this then mean `Distributed` will run multiple iterations of the `for` loop at once?
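For comparison, a minimal sketch using the `Distributed` standard library, assuming worker processes can be started on the local machine. `addprocs` launches separate Julia processes, each with its own memory, so work is shipped to them and results are sent back (with `pmap`, or `@distributed` plus a reducer). Both this and `@threads` run several loop iterations at the same time; the difference is separate processes with separate memory versus threads sharing one process. The helper `square` is hypothetical.

```julia
using Distributed

addprocs(2)   # start 2 local worker processes, each with its own memory

# Functions must be defined on every worker, hence @everywhere.
@everywhere square(x) = x^2

println(nworkers(), " worker process(es)")

# pmap hands iterations to the workers and collects the returned values.
results = pmap(square, 1:8)
println(results)      # [1, 4, 9, 16, 25, 36, 49, 64]

# @distributed splits a loop across the workers; (+) combines their results.
total = @distributed (+) for i in 1:8
    square(i)
end
println(total)        # 204

rmprocs(workers())    # shut the workers down again
```

Because workers do not share memory with the main process, they return values instead of writing into variables owned by it; that communication costs extra compared with threads.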