Hi,
are there any examples of renaming IDs in a GFF3 file?
Thank you in advance,
If you can convert it to GenBank format you can do it easily with GenomicAnnotations.jl. Adding GFF support is on my todo-list, but as I have no use for it myself it has low priority.
Woah, when did a gbk parser get added? I've been meaning to do this for like 3 years. Is it written in Julia or wrapping something else?
It's pure Julia. I think I wrote the first version almost three years ago now! It's only been public for about a year I think, though. I guess I haven't advertised* it very well. The next version will be in BioJuliaRegistry instead of General, which should make it easier to find.
(*speaking of which, the whole reason I started GenomicAnnotations.jl was to make GenomicMaps.jl)
I wrote a quick parser for GFF files (which, surprise surprise, was much easier than for GenBank files), and it seems to work just fine. Unless I run into any problems I'll add a way to write to GFF and upload it.
Neat! I see you didn't use the FSM approach that other BioJulia parsers use. That probably makes sense (that's what I tinkered with and it was rough), but have you benchmarked it against some other language parsers?
There's a guy on a forum I'm part of that always brings up the need for a GenBank parser before he'll even consider trying a language out; he suggested testing against the suite that BioPython uses. Let me know if you're interested in a PR to add those tests, I'd love to be able to go to him with this in hand.
I haven't compared it to other parsers. I try to optimise things when I notice that something is performing poorly, but I only work with bacterial genomes so my needs aren't that great.
I tried rewriting the parser using Automa.jl at some point, but once it got too complex it just wouldn't compile anymore so I gave up on that idea.
Absolutely, any form of contribution is welcome!
I have now added a GFF parser. It worked for the file I tested it on, at least. Currently it's on the branch "parsegff", so you can install it with:
(v1.3) pkg> add GenomicAnnotations#parsegff
Use readgff(filepath) to read the file, and printgff(filepath, annotations) to write the annotations to a file after modifying the IDs. The documentation for GenomicAnnotations explains how to modify the data.
Thank you for such a quick implementation. Unfortunately, I ran into this problem:
julia> using GenomicAnnotations
julia> chr=readgff("/Users/lorencm/projects/bioinf-scripts/data/NbV1Ch08-augustus.hints_utr.gff3")
ERROR: BoundsError: attempt to access 1-element Array{SubString{String},1} at index [2]
Stacktrace:
[1] indexed_iterate at ./array.jl:744 [inlined]
[2] parsechromosome_gff(::Array{String,1}, ::Type) at /Users/lorencm/.julia/packages/GenomicAnnotations/Y7qTk/src/readgbk.jl:261
[3] #readgff#33(::Bool, ::typeof(readgff), ::IOStream, ::Type) at /Users/lorencm/.julia/packages/GenomicAnnotations/Y7qTk/src/readgbk.jl:316
[4] readgff at /Users/lorencm/.julia/packages/GenomicAnnotations/Y7qTk/src/readgbk.jl:300 [inlined]
[5] #30 at /Users/lorencm/.julia/packages/GenomicAnnotations/Y7qTk/src/readgbk.jl:295 [inlined]
[6] #open#271(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(open), ::GenomicAnnotations.var"#30#32"{DataType}, ::String) at ./io.jl:298
[7] open(::Function, ::String) at ./io.jl:296
[8] #readgff#28(::Bool, ::typeof(readgff), ::String, ::Type) at /Users/lorencm/.julia/packages/GenomicAnnotations/Y7qTk/src/readgbk.jl:295
[9] readgff at /Users/lorencm/.julia/packages/GenomicAnnotations/Y7qTk/src/readgbk.jl:291 [inlined] (repeats 2 times)
[10] top-level scope at none:0
I used Braker2's GFF3 file:
bash-3.2$ head -n 100 /Users/lorencm/projects/bioinf-scripts/data/NbV1Ch08-augustus.hints_utr.gff3
NbV1Ch08 AUGUSTUS gene 7015 29794 0.01 - . ID=g1;
NbV1Ch08 AUGUSTUS mRNA 7015 29794 0.01 - . ID=g1.t1;Parent=g1
NbV1Ch08 AUGUSTUS transcription_end_site 7015 7015 . - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS three_prime_utr 7015 8531 0.2 - . ID=g1.t1.3UTR1;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 7015 8747 . - . ID=g1.t1.exon1;Parent=g1.t1;
NbV1Ch08 AUGUSTUS stop_codon 8532 8534 . - 0 Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 8532 8747 0.31 - 0 ID=g1.t1.CDS1;Parent=g1.t1
NbV1Ch08 AUGUSTUS intron 8748 9191 0.49 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 9192 9342 0.66 - 1 ID=g1.t1.CDS2;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 9192 9342 . - . ID=g1.t1.exon2;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 9343 9915 0.58 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 9916 10006 0.71 - 2 ID=g1.t1.CDS3;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 9916 10006 . - . ID=g1.t1.exon3;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 10007 10101 0.74 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 10102 10201 0.78 - 0 ID=g1.t1.CDS4;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 10102 10201 . - . ID=g1.t1.exon4;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 10202 10712 0.8 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 10713 11107 0.11 - 2 ID=g1.t1.CDS5;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 10713 11107 . - . ID=g1.t1.exon5;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 11108 11569 0.07 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 11570 12151 0.09 - 2 ID=g1.t1.CDS6;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 11570 12151 . - . ID=g1.t1.exon6;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 12152 12588 0.34 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 12589 12717 0.39 - 2 ID=g1.t1.CDS7;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 12589 12717 . - . ID=g1.t1.exon7;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 12718 12789 0.42 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 12790 13075 0.39 - 0 ID=g1.t1.CDS8;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 12790 13075 . - . ID=g1.t1.exon8;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 13076 14832 0.51 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 14833 15009 0.39 - 0 ID=g1.t1.CDS9;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 14833 15009 . - . ID=g1.t1.exon9;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 15010 15278 0.59 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 15279 15415 0.56 - 2 ID=g1.t1.CDS10;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 15279 15415 . - . ID=g1.t1.exon10;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 15416 15487 0.58 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 15488 15612 0.96 - 1 ID=g1.t1.CDS11;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 15488 15612 . - . ID=g1.t1.exon11;Parent=g1.t1;
NbV1Ch08 AUGUSTUS intron 15613 15706 0.96 - . Parent=g1.t1;
NbV1Ch08 AUGUSTUS CDS 15707 15957 0.98 - 0 ID=g1.t1.CDS12;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 15707 15958 . - . ID=g1.t1.exon12;Parent=g1.t1;
NbV1Ch08 AUGUSTUS start_codon 15955 15957 . - 0 Parent=g1.t1;
NbV1Ch08 AUGUSTUS five_prime_utr 15958 15958 0.99 - . ID=g1.t1.5UTR1;Parent=g1.t1
NbV1Ch08 AUGUSTUS five_prime_utr 27458 28250 0.37 - . ID=g1.t1.5UTR2;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 27458 28250 . - . ID=g1.t1.exon13;Parent=g1.t1;
NbV1Ch08 AUGUSTUS five_prime_utr 29272 29794 0.08 - . ID=g1.t1.5UTR3;Parent=g1.t1
NbV1Ch08 AUGUSTUS exon 29272 29794 . - . ID=g1.t1.exon14;Parent=g1.t1;
NbV1Ch08 AUGUSTUS transcription_start_site 29794 29794 . - . Parent=g1.t1;
What did I miss?
Thank you in advance,
I don't know whether it's against the GFF3 specifications or not, but the problem stems from the trailing semicolons. Either way I added a fix. After updating, you should be able to change the IDs with something like:
using GenomicAnnotations
chrs = readgff(filepath)
for (i, gene) in enumerate(@genes(chrs, !ismissing(:ID)))
    gene.ID = "newid_$(string(i, pad=4))"
end
printgff(newfilepath, chrs)
It's not important for the parser, but the file is supposed to have a header specifying the GFF version (a first line such as ##gff-version 3), so it is not following the specifications.
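As an illustration only, here is a minimal sketch in plain Julia of why a trailing semicolon can trip up a simple attribute parser. It is not the package's actual code: parse_attributes is a hypothetical helper, and the assumption is that the attribute column is split on ';' and each field is then split on '=', which would be consistent with the BoundsError reported earlier.

```julia
# Parse a GFF3 attribute column (column 9) into key => value pairs.
# `parse_attributes` is a hypothetical helper, not part of GenomicAnnotations.jl.
function parse_attributes(column::AbstractString)
    attributes = Dict{String,String}()
    # keepempty=false drops the empty field left behind by a trailing ';'
    for field in split(column, ';'; keepempty=false)
        key, value = split(field, '='; limit=2)
        attributes[String(key)] = String(value)
    end
    return attributes
end

# With the default keepempty=true, a trailing ';' leaves an empty last field.
# Destructuring `key, value = split("", '=')` on that field would fail,
# because the split yields a 1-element array.
@assert split("ID=g1;", ';') == ["ID=g1", ""]

with_semicolon = parse_attributes("ID=g1;")
without_semicolon = parse_attributes("ID=g1.t1;Parent=g1")
```

Both forms of the attribute column then parse to the same kind of dictionary, with or without the trailing semicolon.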
Thank you, but while updating I got an error:
(v1.3) pkg> update GenomicAnnotations#parsegff
ERROR: invalid token
What did I miss?
Thank you in advance,
I think just
(v1.3) pkg> up
should work. If not, try:
(v1.3) pkg> rm GenomicAnnotations
(v1.3) pkg> add GenomicAnnotations#parsegff
Thank you, up worked, but now I get a new error:
julia> using GenomicAnnotations
[ Info: Precompiling GenomicAnnotations [4f8a0a0a-376d-5ac0-ab14-e88793df67f0]
julia> chr=readgff("/Users/lorencm/projects/bioinf-scripts/data/NbV1Ch08-augustus.hints_utr.gff3")
1-element Array{Chromosome{Gene},1}:
 Chromosome 'NbV1Ch08' (0 bp) with 136090 annotations
julia> print(chr)
Chromosome{Gene}[Chromosome 'NbV1Ch08' (0 bp) with 136090 annotations
]
julia> for gene in chr.genes
           print(gene)
       end
ERROR: type Array has no field genes
Stacktrace:
[1] getproperty(::Array{Chromosome{Gene},1}, ::Symbol) at ./Base.jl:20
[2] top-level scope at ./REPL[13]:1
What did I miss?
GBK/GFF files can contain multiple chromosomes, so readgff returns an array. There are multiple ways you can deal with this, so pick the one that suits you best. You can:
1. Use @genes(chrs) to iterate over the genes from all chromosomes:
for gene in @genes(chrs)
    ...
end
2. Loop over the chromosomes, and then over the genes of each one:
for chr in chrs
    for gene in chr.genes
        ...
    end
end
3. Take the first chromosome from the array and store it in chr:
chr = readgff(filepath)[1]
for gene in chr.genes
    ...
end
I recommend option 1, and if your file only contains one chromosome option 3 (@genes works on individual Chromosomes or arrays of Chromosomes, so you can combine the two).
Thank you. I used the following code but I get unexpected output:
NbV1Ch08_missing
NbV1Ch08_missing
NbV1Ch08_missing
NbV1Ch08_missing
NbV1Ch08_missing
NbV1Ch08_missing
with the below code:
using GenomicAnnotations
chrs=readgff("/Users/lorencm/projects/bioinf-scripts/data/NbV1Ch08-augustus.hints_utr.gff3")
for chr in chrs
    for gene in chr.genes
        println("$(chr.name)_$(gene.id)")
    end
end
How can I access the gene's IDs?
Thank you in advance,
Attribute names are case-sensitive, so use gene.ID, not gene.id. Judging by the partial file you posted, not all entries have an ID, so I recommend using the version with @genes that I posted earlier:
for gene in @genes(chr, !ismissing(:ID))
    println("$(chr.name)_$(gene.ID)")
end
This will iterate over the entries that do have an ID. Otherwise, for entries that lack an ID, gene.ID will, again, return missing.
Thank you but I ran into a new error:
julia> chr=readgff("/Users/lorencm/projects/bioinf-scripts/data/NbV1Ch08-augustus.hints_utr.gff3")
1-element Array{Chromosome{Gene},1}:
Chromosome 'NbV1Ch08' (0 bp) with 136090 annotations
julia> for gene in @genes(chr, !ismissing(:ID))
println("$(chr.name)_$(gene.ID)")
end
ERROR: type Array has no field name
Stacktrace:
[1] getproperty(::Array{Chromosome{Gene},1}, ::Symbol) at ./Base.jl:20
[2] top-level scope at /Users/lorencm/.julia/packages/GenomicAnnotations/4kJOh/src/macro.jl:2
Please find here a GFF3 for one chromosome.
Thank you in advance
Your chr is still an Array{Chromosome}. In this case there is only one chromosome, so you can access the name with chr[1].name. For a more general solution, use parent(gene).name. parent(gene::Gene) returns the Chromosome that contains gene.
The following example will work for a GFF file with any number of chromosomes:
using GenomicAnnotations
chrs = readgff("/Users/lorencm/projects/bioinf-scripts/data/NbV1Ch08-augustus.hints_utr.gff3")
for gene in @genes(chrs, !ismissing(:ID))
    println("$(parent(gene).name)_$(gene.ID)")
end
Thank you. I added a counter, but it gives every feature its own number rather than one number per gene:
NbV1Ch08_g5742.t1_078390
NbV1Ch08_g5742.t1.3UTR1_078391
NbV1Ch08_g5742.t1.exon1_078392
NbV1Ch08_g5742.t1.CDS1_078393
NbV1Ch08_g5742.t1.CDS2_078394
NbV1Ch08_g5742.t1.exon2_078395
NbV1Ch08_g5742.t1.CDS3_078396
NbV1Ch08_g5742.t1.exon3_078397
NbV1Ch08_g5742.t1.CDS4_078398
NbV1Ch08_g5742.t1.exon4_078399
NbV1Ch08_g5742.t1.CDS5_078400
NbV1Ch08_g5742.t1.exon5_078401
NbV1Ch08_g5742.t1.CDS6_078402
NbV1Ch08_g5742.t1.exon6_078403
NbV1Ch08_g5742.t1.CDS7_078404
NbV1Ch08_g5742.t1.exon7_078405
NbV1Ch08_g5742.t1.CDS8_078406
NbV1Ch08_g5742.t1.exon8_078407
NbV1Ch08_g5742.t1.CDS9_078408
NbV1Ch08_g5742.t1.exon9_078409
NbV1Ch08_g5742.t1.CDS10_078410
NbV1Ch08_g5742.t1.exon10_078411
NbV1Ch08_g5742.t1.CDS11_078412
NbV1Ch08_g5742.t1.exon11_078413
NbV1Ch08_g5742.t1.CDS12_078414
NbV1Ch08_g5742.t1.exon12_078415
NbV1Ch08_g5742.t1.CDS13_078416
NbV1Ch08_g5742.t1.exon13_078417
NbV1Ch08_g5742.t1.CDS14_078418
NbV1Ch08_g5742.t1.exon14_078419
NbV1Ch08_g5742.t1.CDS15_078420
NbV1Ch08_g5742.t1.exon15_078421
NbV1Ch08_g5742.t1.5UTR1_078422
NbV1Ch08_g5742.t2_078423
NbV1Ch08_g5742.t2.3UTR1_078424
NbV1Ch08_g5742.t2.exon1_078425
NbV1Ch08_g5742.t2.CDS1_078426
NbV1Ch08_g5742.t2.CDS2_078427
NbV1Ch08_g5742.t2.exon2_078428
NbV1Ch08_g5742.t2.CDS3_078429
NbV1Ch08_g5742.t2.exon3_078430
NbV1Ch08_g5742.t2.CDS4_078431
NbV1Ch08_g5742.t2.exon4_078432
NbV1Ch08_g5742.t2.CDS5_078433
NbV1Ch08_g5742.t2.exon5_078434
NbV1Ch08_g5742.t2.CDS6_078435
NbV1Ch08_g5742.t2.exon6_078436
NbV1Ch08_g5742.t2.5UTR1_078437
I would have expected to see:
NbV1Ch08_g5742.t1_078390
NbV1Ch08_g5742.t1.3UTR1_078390
NbV1Ch08_g5742.t1.exon1_078390
NbV1Ch08_g5742.t1.CDS1_078390
...
Here is the updated code:
using GenomicAnnotations
chrs = readgff("/Users/lorencm/projects/bioinf-scripts/data/NbV1Ch08-augustus.hints_utr.gff3")
for (count, gene) in enumerate(@genes(chrs, !ismissing(:ID)))
    newID = lpad(count, 6, '0')
    println("$(parent(gene).name)_$(gene.ID)_$(newID)")
end
Is there a better way to implement a counter?
Thank you in advance,
Something like this?
using GenomicAnnotations
chrs = readgff("/Users/lorencm/projects/bioinf-scripts/data/NbV1Ch08-augustus.hints_utr.gff3")
currentID = 0
for gene in @genes(chrs, !ismissing(:ID))
    if feature(gene) == :gene
        global currentID += 1
    end
    newID = lpad(currentID, 6, '0')
    println("$(parent(gene).name)_$(gene.ID)_$newID")
end
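The same per-gene counter can also be sketched on the raw text, without any package. This is only an illustration under stated assumptions: renumber_by_gene is a hypothetical helper, the input is assumed to be tab-separated GFF3 lines with nine columns, and every feature is assumed to follow the gene line it belongs to, as in the file posted above.

```julia
# `renumber_by_gene` is a hypothetical helper that works on raw GFF3 lines.
# It advances the counter only on "gene" features, so every feature that
# follows a gene line shares that gene's number.
function renumber_by_gene(lines)
    current = 0
    labels = String[]
    for line in lines
        startswith(line, "#") && continue        # skip header/comment lines
        columns = split(line, '\t')
        length(columns) == 9 || continue         # skip lines without 9 columns
        columns[3] == "gene" && (current += 1)   # new gene: advance the counter
        m = match(r"(?:^|;)ID=([^;]+)", columns[9])
        m === nothing && continue                # skip features without an ID
        push!(labels, "$(columns[1])_$(m[1])_$(lpad(current, 6, '0'))")
    end
    return labels
end

# Small made-up sample in the same layout as the file in this thread.
sample = [
    "##gff-version 3",
    "NbV1Ch08\tAUGUSTUS\tgene\t7015\t29794\t0.01\t-\t.\tID=g1;",
    "NbV1Ch08\tAUGUSTUS\tmRNA\t7015\t29794\t0.01\t-\t.\tID=g1.t1;Parent=g1",
    "NbV1Ch08\tAUGUSTUS\tstop_codon\t8532\t8534\t.\t-\t0\tParent=g1.t1;",
    "NbV1Ch08\tAUGUSTUS\tgene\t30000\t31000\t0.5\t+\t.\tID=g2;",
]

labels = renumber_by_gene(sample)
foreach(println, labels)
```

Keeping the counter inside a function also avoids the need for the global keyword that the top-level loop requires.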