FASTX.FASTA.Reader, no method matching error

I am using the `FASTX` package to stream over a large FASTA file, but I am getting a no-method-matching error:

ERROR: LoadError: TaskFailedException
Stacktrace:
 [1] wait
   @ ./task.jl:334 [inlined]
 [2] threading_run(func::Function)
   @ Base.Threads ./threadingconstructs.jl:38
 [3] macro expansion
   @ ./threadingconstructs.jl:97 [inlined]
 [4] top-level scope
   @ /home/projects/ku_00014/people/robmur/scripts/metagenome/resampling/resampling.jl:38

    nested task error: MethodError: no method matching length(::FASTX.FASTA.Reader{TranscodingStreams.NoopStream{IOStream}})
    Closest candidates are:
      length(::Union{Base.KeySet, Base.ValueIterator}) at /services/tools/julia/1.7.0-rc3/share/julia/base/abstractdict.jl:58
      length(::Union{LinearAlgebra.Adjoint{T, S}, LinearAlgebra.Transpose{T, S}} where {T, S}) at /services/tools/julia/1.7.0-rc3/share/julia/stdlib/v1.7/LinearAlgebra/src/adjtrans.jl:171
      length(::Union{DataStructures.OrderedRobinDict, DataStructures.RobinDict}) at ~/.julia/packages/DataStructures/nBjdy/src/ordered_robin_dict.jl:86
      ...
    Stacktrace:
     [1] (::var"#2#threadsfor_fun#11"{FASTX.FASTA.Writer{TranscodingStreams.NoopStream{IOStream}}, Vector{String15}, FASTX.FASTA.Reader{TranscodingStreams.NoopStream{IOStream}}})(onethread::Bool)
       @ Main ./threadingconstructs.jl:53
     [2] (::var"#2#threadsfor_fun#11"{FASTX.FASTA.Writer{TranscodingStreams.NoopStream{IOStream}}, Vector{String15}, FASTX.FASTA.Reader{TranscodingStreams.NoopStream{IOStream}}})()
       @ Main ./threadingconstructs.jl:52
in expression starting at /home/projects/ku_00014/people/robmur/scripts/metagenome/resampling/resampling.jl:14

This started to occur after I switched from a gzipped to an ungzipped FASTA, but I am unsure if that is the cause of the error. I am running the package as follows:

    reader = FASTA.Reader(open("some/file/path/megahit_final_assembly_500bp_filterd.fasta"))
    writer = open(FASTA.Writer, "some/file/path/500bp_filter_samples/" * sample)

    println("writing to file")

    @threads for record in reader
        if FASTA.identifier(record) in passContig
            write(writer, record)
        end
    end

    close(reader)
    close(writer)

    close(reader)

EDIT: I just tried with a gzipped file and I get the same error.

Have you gotten @threads to work before? It looks like the issue is that the reader doesn’t have a length method defined for it, which the threading code uses to divide the work up.

You might have to do some preprocessing to prepare for multithreaded processing of your file.
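One possible shape for that preprocessing (a sketch, untested, assuming the whole file fits in memory; the file paths are placeholders and `passContig` is your existing vector of IDs): `collect` the reader into a `Vector` first, so `@threads` has an indexable collection with a known `length`, and keep all writing on a single thread.

```julia
using FASTX, Base.Threads

# Hypothetical paths; substitute your own.
reader = FASTA.Reader(open("assembly.fasta"))
records = collect(reader)          # materializes every record in memory
close(reader)

passSet = Set(passContig)          # O(1) membership tests instead of O(n)
keep = falses(length(records))
@threads for i in eachindex(records)   # @threads can now split the index range
    keep[i] = FASTA.identifier(records[i]) in passSet
end

# Write serially so the output file is never touched by two threads at once.
writer = open(FASTA.Writer, "filtered.fasta")
for (rec, k) in zip(records, keep)
    k && write(writer, rec)
end
close(writer)
```

The trade-off is memory: for a very large co-assembly, materializing all records may not be feasible.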

Another issue though… I don’t think reading and writing is thread-safe, so multithreading this will likely cause your output to be jumbled up.

Ah, open has a lock argument you can use to enable safe multithreaded access.
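For reference, `lock` is a keyword argument of `Base.open` and defaults to `true`; a minimal sketch of using it explicitly (hypothetical filename):

```julia
using FASTX

# `lock = true` (the default) makes individual operations on the IOStream
# thread-safe. Note that writing one FASTA record may involve several stream
# operations, so records written from different threads could still
# interleave; serializing the writes yourself remains the safer option.
io = open("filtered.fasta", "w"; lock = true)
writer = FASTA.Writer(io)
```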

I will try the lock argument. I was hoping it would just run multiple iterations of the for loop on different threads, but whatever speeds things up is good :slight_smile: Looking more into it, though: would multiprocessing be more optimal for this, then?

Have you gotten @threads to work before? It looks like the issue is that the reader doesn’t have a length method defined for it, which the threading code uses to divide the work up.

No, I have not used it before. I want `nthreads` to equal the number of available threads.

Yeah, it would be nice if @threads could work with a collection of unknown length. I think you would have the same, or worse, issue with multiprocessing.

Also, it looks like your example code isn’t doing much processing per record, is that right? If so, you’ll be IO-bound, and parallel processing may not be much help anyway.

Or you could do something like load a reasonably big chunk, process it in threads, load another chunk, etc.

The object being iterated through is huge. It is a co-assembly from 22 metagenomic samples, so I assume that is where the long run time comes from. I am not sure how I can speed that up natively, though.
