BioJulia Fastx import long{4}

I have a fast file that I loaded with FASTX and it seems that the data is stored as a LongDNA{4}. There are no ambiguities in the reads and I’d like to work with them as 2 bit sequences but I get an error when I try to convert these using BioSequences.

Is there someway to store them as 2bit using the FASTQ.Reader?

Hi may be you can try:

using BioSequences, FASTX        # import required Bioinformatics packages

# import .fastq file
seqfile = FASTX.FASTQ.Reader(open("test.fastq","r")) 

# get DNA sequences from file
for record in seqfile
    dna_seq = FASTX.FASTQ.sequence(record)
end

#create a function to convert LongSequence{DNAAlphabet{4}} to 2 bit using a dictionary

function convert_to_2bit(dna_seq::LongSequence{DNAAlphabet{4}})
    # Define a dictionary to map DNA bases to 2-bit codes
    bit_dict = Dict('A' => 0b00, 'C' => 0b01, 'G' => 0b10, 'T' => 0b11)
    
    # Convert the DNA sequence to a string and split into individual bases
    dna_str = string(dna_seq)
    bases = collect(dna_str)
    
    # Convert each base to a 2-bit code and concatenate into a bitstring
    bits = BitVector(undef, 2*length(bases))
    for (i, base) in enumerate(bases)
        code = bit_dict[base]
        bits[2*(i-1) + 1] = code >> 1
        bits[2*(i-1) + 2] = code & 0b01
    end
    
    return bits
end

#Calling the function to convert dna_seq and store it in two_bit_seq 
two_bit_seq = convert_to_2bit(dna_seq)

two_bit_seq