FASTX.FASTA.Reader, no method matching error

I am using the `FASTX` package to stream over a large FASTA file, but I am getting a no-method-matching error:

ERROR: LoadError: TaskFailedException
Stacktrace:
 [1] wait
   @ ./task.jl:334 [inlined]
 [2] threading_run(func::Function)
   @ Base.Threads ./threadingconstructs.jl:38
 [3] macro expansion
   @ ./threadingconstructs.jl:97 [inlined]
 [4] top-level scope
   @ /home/projects/ku_00014/people/robmur/scripts/metagenome/resampling/resampling.jl:38

    nested task error: MethodError: no method matching length(::FASTX.FASTA.Reader{TranscodingStreams.NoopStream{IOStream}})
    Closest candidates are:
      length(::Union{Base.KeySet, Base.ValueIterator}) at /services/tools/julia/1.7.0-rc3/share/julia/base/abstractdict.jl:58
      length(::Union{LinearAlgebra.Adjoint{T, S}, LinearAlgebra.Transpose{T, S}} where {T, S}) at /services/tools/julia/1.7.0-rc3/share/julia/stdlib/v1.7/LinearAlgebra/src/adjtrans.jl:171
      length(::Union{DataStructures.OrderedRobinDict, DataStructures.RobinDict}) at ~/.julia/packages/DataStructures/nBjdy/src/ordered_robin_dict.jl:86
      ...
    Stacktrace:
     [1] (::var"#2#threadsfor_fun#11"{FASTX.FASTA.Writer{TranscodingStreams.NoopStream{IOStream}}, Vector{String15}, FASTX.FASTA.Reader{TranscodingStreams.NoopStream{IOStream}}})(onethread::Bool)
       @ Main ./threadingconstructs.jl:53
     [2] (::var"#2#threadsfor_fun#11"{FASTX.FASTA.Writer{TranscodingStreams.NoopStream{IOStream}}, Vector{String15}, FASTX.FASTA.Reader{TranscodingStreams.NoopStream{IOStream}}})()
       @ Main ./threadingconstructs.jl:52
in expression starting at /home/projects/ku_00014/people/robmur/scripts/metagenome/resampling/resampling.jl:14

This started to occur after I switched from a gzipped to an ungzipped FASTA, but I am unsure if that is the cause of the error. I am running the package as follows:

    reader = FASTA.Reader(open("some/file/path/megahit_final_assembly_500bp_filterd.fasta"))
    writer = open(FASTA.Writer, "some/file/path/500bp_filter_samples/" * sample)

    println("writing to file")

    @threads for record in reader
        if FASTA.identifier(record) in passContig
            write(writer, record)
        end
    end

    close(reader)
    close(writer)

    close(reader)

EDIT: I just tried with a gzipped file and I get the same error.

Have you gotten @threads to work before? It looks like the issue is that the reader doesn’t have a length method defined for it, which the threading code uses to divide the work up.

You might have to do some preprocessing to prepare for multithreaded processing of your file.
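One possible shape for that preprocessing (a sketch, untested, assuming the whole file fits in memory; the file paths are placeholders and `passContig` is your existing vector of IDs): `collect` the reader into a `Vector` first, so `@threads` has an indexable collection with a known `length`, and keep all writing on a single thread.

```julia
using FASTX, Base.Threads

# Hypothetical paths; substitute your own.
reader = FASTA.Reader(open("assembly.fasta"))
records = collect(reader)          # materializes every record in memory
close(reader)

passSet = Set(passContig)          # O(1) membership tests instead of O(n)
keep = falses(length(records))
@threads for i in eachindex(records)   # @threads can now split the index range
    keep[i] = FASTA.identifier(records[i]) in passSet
end

# Write serially so the output file is never touched by two threads at once.
writer = open(FASTA.Writer, "filtered.fasta")
for (rec, k) in zip(records, keep)
    k && write(writer, rec)
end
close(writer)
```

The trade-off is memory: for a very large co-assembly, materializing all records may not be feasible.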

Another issue though… I don’t think reading and writing is thread-safe, so multithreading this will likely cause your output to be jumbled up.

Ah, open has a lock argument you can use to enable safe multithreaded access.
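For reference, `lock` is a keyword argument of `Base.open` and defaults to `true`; a minimal sketch of using it explicitly (hypothetical filename):

```julia
using FASTX

# `lock = true` (the default) makes individual operations on the IOStream
# thread-safe. Note that writing one FASTA record may involve several stream
# operations, so records written from different threads could still
# interleave; serializing the writes yourself remains the safer option.
io = open("filtered.fasta", "w"; lock = true)
writer = FASTA.Writer(io)
```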

I will try the lock argument. I was hoping it would just run multiple iterations of the for loop on different threads, but whatever speeds things up is good :slight_smile: Looking more into it, though: would multiprocessing be more optimal for this, then?

Have you gotten @threads to work before? It looks like the issue is that the reader doesn’t have a length method defined for it, which the threading code uses to divide the work up.

No, I have not used it before. I want `nthreads` to equal the number of available threads.

Yeah, it would be nice if @threads could work with a collection of unknown length. I think you would have the same, or worse, issue with multiprocessing.

Also, it looks like your example code isn’t doing much processing per record, is that right? If so, you’ll be IO-bound, and parallel processing may not be much help anyway.

Or you could do something like load a reasonably big chunk, process it in threads, load another chunk, etc.

The object being iterated through is huge. It is a co-assembly from 22 metagenomic samples, so I assume that is where the long run time comes from. I am not sure how I can speed that up natively, though.
