Renaming IDs in a GFF3

Hi,
are there any examples of renaming IDs in a GFF3 file?

Thank you in advance,

If you can convert it to GenBank format you can do it easily with GenomicAnnotations.jl. Adding GFF support is on my todo-list, but as I have no use for it myself it has low priority.


Woah, when did a gbk parser get added? I've been meaning to do this for like 3 years :laughing:. Is it written in Julia or wrapping something else?

It's pure Julia. I think I wrote the first version almost three years ago now! It's only been public for about a year I think, though. I guess I haven't advertised* it very well. The next version will be in BioJuliaRegistry instead of General, which should make it easier to find.

(*speaking of which, the whole reason I started GenomicAnnotations.jl was to make GenomicMaps.jl)

I wrote a quick parser for GFF files (which, surprise surprise, was much easier than for GenBank files), and it seems to work just fine. Unless I run into any problems I'll add a way to write to GFF and upload it.


Neat! I see you didn't use the FSM approach that other BioJulia parsers use. That probably makes sense (that's what I tinkered with, and it was rough), but have you benchmarked it against parsers in other languages?

There's a guy on a forum I'm part of who always brings up the need for a GenBank parser before he'll even consider trying a language out. He suggested testing against the suite that BioPython uses. Let me know if you're interested in a PR to add those tests; I'd love to be able to go to him with this in hand :slight_smile:

I haven't compared it to other parsers. I try to optimise things when I notice that something is performing poorly, but I only work with bacterial genomes so my needs aren't that great.

I tried rewriting the parser using Automa.jl at some point, but once it got too complex it just wouldn't compile anymore so I gave up on that idea.

Absolutely, any form of contribution is welcome!


I have now added a GFF parser. It worked for the file I tested it on, at least. Currently it's on the branch "parsegff", so you can install it with:

(v1.3) pkg> add GenomicAnnotations#parsegff

Use readgff(filepath) to read the file, and printgff(filepath, annotations) to write them to a file after modifying the IDs. The documentation for GenomicAnnotations explains how to modify the data.


Thank you for such a quick implementation. Unfortunately, I ran into this problem:

julia> using GenomicAnnotations

julia> chr=readgff("/Users/lorencm/projects/bioinf-scripts/data/NbV1Ch08-augustus.hints_utr.gff3")
ERROR: BoundsError: attempt to access 1-element Array{SubString{String},1} at index [2]
Stacktrace:
 [1] indexed_iterate at ./array.jl:744 [inlined]
 [2] parsechromosome_gff(::Array{String,1}, ::Type) at /Users/lorencm/.julia/packages/GenomicAnnotations/Y7qTk/src/readgbk.jl:261
 [3] #readgff#33(::Bool, ::typeof(readgff), ::IOStream, ::Type) at /Users/lorencm/.julia/packages/GenomicAnnotations/Y7qTk/src/readgbk.jl:316
 [4] readgff at /Users/lorencm/.julia/packages/GenomicAnnotations/Y7qTk/src/readgbk.jl:300 [inlined]
 [5] #30 at /Users/lorencm/.julia/packages/GenomicAnnotations/Y7qTk/src/readgbk.jl:295 [inlined]
 [6] #open#271(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(open), ::GenomicAnnotations.var"#30#32"{DataType}, ::String) at ./io.jl:298
 [7] open(::Function, ::String) at ./io.jl:296
 [8] #readgff#28(::Bool, ::typeof(readgff), ::String, ::Type) at /Users/lorencm/.julia/packages/GenomicAnnotations/Y7qTk/src/readgbk.jl:295
 [9] readgff at /Users/lorencm/.julia/packages/GenomicAnnotations/Y7qTk/src/readgbk.jl:291 [inlined] (repeats 2 times)
 [10] top-level scope at none:0

I used Braker2's GFF3 file:

NbV1Ch08    AUGUSTUS    gene    7015    29794   0.01    -   .   ID=g1;
NbV1Ch08    AUGUSTUS    mRNA    7015    29794   0.01    -   .   ID=g1.t1;Parent=g1
NbV1Ch08    AUGUSTUS    transcription_end_site  7015    7015    .   -   .   Parent=g1.t1;
NbV1Ch08    AUGUSTUS    three_prime_utr 7015    8531    0.2 -   .   ID=g1.t1.3UTR1;Parent=g1.t1
NbV1Ch08    AUGUSTUS    exon    7015    8747    .   -   .   ID=g1.t1.exon1;Parent=g1.t1;
NbV1Ch08    AUGUSTUS    stop_codon  8532    8534    .   -   0   Parent=g1.t1;
NbV1Ch08    AUGUSTUS    CDS 8532    8747    0.31    -   0   ID=g1.t1.CDS1;Parent=g1.t1
NbV1Ch08    AUGUSTUS    intron  8748    9191    0.49    -   .   Parent=g1.t1;
NbV1Ch08    AUGUSTUS    CDS 9192    9342    0.66    -   1   ID=g1.t1.CDS2;Parent=g1.t1
NbV1Ch08    AUGUSTUS    exon    9192    9342    .   -   .   ID=g1.t1.exon2;Parent=g1.t1;
bash-3.2$ head -n 100 /Users/lorencm/projects/bioinf-scripts/data/NbV1Ch08-augustus.hints_utr.gff3
NbV1Ch08    AUGUSTUS    gene    7015    29794   0.01    -   .   ID=g1;
NbV1Ch08    AUGUSTUS    mRNA    7015    29794   0.01    -   .   ID=g1.t1;Parent=g1
NbV1Ch08    AUGUSTUS    transcription_end_site  7015    7015    .   -   .   Parent=g1.t1;
NbV1Ch08    AUGUSTUS    three_prime_utr 7015    8531    0.2 -   .   ID=g1.t1.3UTR1;Parent=g1.t1
NbV1Ch08    AUGUSTUS    exon    7015    8747    .   -   .   ID=g1.t1.exon1;Parent=g1.t1;
NbV1Ch08    AUGUSTUS    stop_codon  8532    8534    .   -   0   Parent=g1.t1;
NbV1Ch08    AUGUSTUS    CDS 8532    8747    0.31    -   0   ID=g1.t1.CDS1;Parent=g1.t1
NbV1Ch08    AUGUSTUS    intron  8748    9191    0.49    -   .   Parent=g1.t1;
NbV1Ch08    AUGUSTUS    CDS 9192    9342    0.66    -   1   ID=g1.t1.CDS2;Parent=g1.t1
NbV1Ch08    AUGUSTUS    exon    9192    9342    .   -   .   ID=g1.t1.exon2;Parent=g1.t1;
NbV1Ch08    AUGUSTUS    intron  9343    9915    0.58    -   .   Parent=g1.t1;
NbV1Ch08    AUGUSTUS    CDS 9916    10006   0.71    -   2   ID=g1.t1.CDS3;Parent=g1.t1
NbV1Ch08    AUGUSTUS    exon    9916    10006   .   -   .   ID=g1.t1.exon3;Parent=g1.t1;
NbV1Ch08    AUGUSTUS    intron  10007   10101   0.74    -   .   Parent=g1.t1;
NbV1Ch08    AUGUSTUS    CDS 10102   10201   0.78    -   0   ID=g1.t1.CDS4;Parent=g1.t1
NbV1Ch08    AUGUSTUS    exon    10102   10201   .   -   .   ID=g1.t1.exon4;Parent=g1.t1;
NbV1Ch08    AUGUSTUS    intron  10202   10712   0.8 -   .   Parent=g1.t1;
NbV1Ch08    AUGUSTUS    CDS 10713   11107   0.11    -   2   ID=g1.t1.CDS5;Parent=g1.t1
NbV1Ch08    AUGUSTUS    exon    10713   11107   .   -   .   ID=g1.t1.exon5;Parent=g1.t1;
NbV1Ch08    AUGUSTUS    intron  11108   11569   0.07    -   .   Parent=g1.t1;
NbV1Ch08    AUGUSTUS    CDS 11570   12151   0.09    -   2   ID=g1.t1.CDS6;Parent=g1.t1
NbV1Ch08    AUGUSTUS    exon    11570   12151   .   -   .   ID=g1.t1.exon6;Parent=g1.t1;
NbV1Ch08    AUGUSTUS    intron  12152   12588   0.34    -   .   Parent=g1.t1;
NbV1Ch08    AUGUSTUS    CDS 12589   12717   0.39    -   2   ID=g1.t1.CDS7;Parent=g1.t1
NbV1Ch08    AUGUSTUS    exon    12589   12717   .   -   .   ID=g1.t1.exon7;Parent=g1.t1;
NbV1Ch08    AUGUSTUS    intron  12718   12789   0.42    -   .   Parent=g1.t1;
NbV1Ch08    AUGUSTUS    CDS 12790   13075   0.39    -   0   ID=g1.t1.CDS8;Parent=g1.t1
NbV1Ch08    AUGUSTUS    exon    12790   13075   .   -   .   ID=g1.t1.exon8;Parent=g1.t1;
NbV1Ch08    AUGUSTUS    intron  13076   14832   0.51    -   .   Parent=g1.t1;
NbV1Ch08    AUGUSTUS    CDS 14833   15009   0.39    -   0   ID=g1.t1.CDS9;Parent=g1.t1
NbV1Ch08    AUGUSTUS    exon    14833   15009   .   -   .   ID=g1.t1.exon9;Parent=g1.t1;
NbV1Ch08    AUGUSTUS    intron  15010   15278   0.59    -   .   Parent=g1.t1;
NbV1Ch08    AUGUSTUS    CDS 15279   15415   0.56    -   2   ID=g1.t1.CDS10;Parent=g1.t1
NbV1Ch08    AUGUSTUS    exon    15279   15415   .   -   .   ID=g1.t1.exon10;Parent=g1.t1;
NbV1Ch08    AUGUSTUS    intron  15416   15487   0.58    -   .   Parent=g1.t1;
NbV1Ch08    AUGUSTUS    CDS 15488   15612   0.96    -   1   ID=g1.t1.CDS11;Parent=g1.t1
NbV1Ch08    AUGUSTUS    exon    15488   15612   .   -   .   ID=g1.t1.exon11;Parent=g1.t1;
NbV1Ch08    AUGUSTUS    intron  15613   15706   0.96    -   .   Parent=g1.t1;
NbV1Ch08    AUGUSTUS    CDS 15707   15957   0.98    -   0   ID=g1.t1.CDS12;Parent=g1.t1
NbV1Ch08    AUGUSTUS    exon    15707   15958   .   -   .   ID=g1.t1.exon12;Parent=g1.t1;
NbV1Ch08    AUGUSTUS    start_codon 15955   15957   .   -   0   Parent=g1.t1;
NbV1Ch08    AUGUSTUS    five_prime_utr  15958   15958   0.99    -   .   ID=g1.t1.5UTR1;Parent=g1.t1
NbV1Ch08    AUGUSTUS    five_prime_utr  27458   28250   0.37    -   .   ID=g1.t1.5UTR2;Parent=g1.t1
NbV1Ch08    AUGUSTUS    exon    27458   28250   .   -   .   ID=g1.t1.exon13;Parent=g1.t1;
NbV1Ch08    AUGUSTUS    five_prime_utr  29272   29794   0.08    -   .   ID=g1.t1.5UTR3;Parent=g1.t1
NbV1Ch08    AUGUSTUS    exon    29272   29794   .   -   .   ID=g1.t1.exon14;Parent=g1.t1;
NbV1Ch08    AUGUSTUS    transcription_start_site    29794   29794   .   -   .   Parent=g1.t1;

What did I miss?

Thank you in advance,

I don't know whether it's against the GFF3 specifications or not, but the problem stems from the trailing semicolons. Either way I added a fix. After updating, you should be able to change the IDs with something like:

using GenomicAnnotations
chrs = readgff(filepath)
for (i, gene) in enumerate(@genes(chrs, !ismissing(:ID)))
    gene.ID = "newid_$(string(i, pad=4))"
end
printgff(newfilepath, chrs)
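
For context, the BoundsError reported above can be reproduced without the package. What follows is a minimal sketch in plain Julia, not the parser's actual code; `naive_parse` and `parse_attributes` are hypothetical helpers written for illustration. Splitting `ID=g1;` on `;` leaves an empty trailing piece, and destructuring that piece into a key and a value fails:

```julia
# Hypothetical helpers for illustration; not part of GenomicAnnotations.jl.

# Naive version: the empty piece after the last ';' contains no '=',
# so destructuring a 1-element result into two names throws a BoundsError.
function naive_parse(field::AbstractString)
    for piece in split(field, ';')          # "ID=g1;" -> ["ID=g1", ""]
        key, value = split(piece, '=')
    end
end

# Tolerant version: parse the ninth GFF3 column into a Dict.
function parse_attributes(field::AbstractString)
    attrs = Dict{String,String}()
    # keepempty=false drops the empty piece left by a trailing semicolon.
    for pair in split(field, ';'; keepempty=false)
        occursin('=', pair) || continue     # skip entries without '='
        key, value = split(pair, '='; limit=2)
        attrs[String(key)] = String(value)
    end
    return attrs
end

naive_fails = try
    naive_parse("ID=g1;")
    false
catch err
    err isa BoundsError
end

attrs = parse_attributes("ID=g1.t1.exon1;Parent=g1.t1;")
println(attrs["ID"], " ", attrs["Parent"], " ", naive_fails)
```

Dropping empty pieces keeps the parser tolerant of both styles seen in the file above (attributes with and without a trailing `;`).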

It's not important for the parser, but the file is supposed to have a header specifying the GFF version, so it is not following the specifications.
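
The header in question is the `##gff-version 3` line that the GFF3 specification expects as the first line of the file. If a downstream tool insists on it, one option is to add it before writing the file out. This is only a sketch in plain Julia; `with_gff_header` is a hypothetical helper and works on a vector of lines rather than on the package's types:

```julia
# Hypothetical helper: return the lines of a GFF3 file with the version
# pragma prepended when the first line does not already carry it.
function with_gff_header(lines::Vector{String})
    if !isempty(lines) && startswith(lines[1], "##gff-version")
        return lines
    end
    return vcat("##gff-version 3", lines)
end

# One record taken from the file posted above (columns are tab-separated).
records = ["NbV1Ch08\tAUGUSTUS\tgene\t7015\t29794\t0.01\t-\t.\tID=g1;"]
fixed = with_gff_header(records)
println(fixed[1])
```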


Thank you, but while updating I got an error:

(v1.3) pkg> update GenomicAnnotations#parsegff
ERROR: invalid token

What did I miss?
Thank you in advance,

I think just

(v1.3) pkg> up

should work. If not, try:

(v1.3) pkg> rm GenomicAnnotations
(v1.3) pkg> add GenomicAnnotations#parsegff

Thank you, up worked, but now I get a new error:

julia> using GenomicAnnotations
[ Info: Precompiling GenomicAnnotations [4f8a0a0a-376d-5ac0-ab14-e88793df67f0]

julia> chr=readgff("/Users/lorencm/projects/bioinf-scripts/data/NbV1Ch08-augustus.hints_utr.gff3")
1-element Array{Chromosome{Gene},1}:
Chromosome 'NbV1Ch08' (0 bp) with 136090 annotations

julia> print(chr)
Chromosome{Gene}[Chromosome 'NbV1Ch08' (0 bp) with 136090 annotations
]
julia> for gene in chr.genes
print(gene)
end
ERROR: type Array has no field genes
Stacktrace:
[1] getproperty(::Array{Chromosome{Gene},1}, ::Symbol) at ./Base.jl:20
[2] top-level scope at ./REPL[13]:1

What did I miss?

GBK/GFF files can contain multiple chromosomes, so readgff returns an array. There are multiple ways you can deal with this, so pick the one that suits you best. You can:

  1. use the macro @genes(chrs) to iterate over the genes from all chromosomes:
for gene in @genes(chrs)
    ...
end
  2. iterate over the chromosomes separately:
for chr in chrs
    for gene in chr.genes
        ...
    end
end
  3. store only one chromosome in chr:
chr = readgff(filepath)[1]
for gene in chr.genes
    ...
end

I recommend option 1, or option 3 if your file contains only one chromosome (@genes works on individual Chromosomes or arrays of Chromosomes, so you can combine the two).


Thank you. I used the following code but I get unexpected output:

NbV1Ch08_missing
NbV1Ch08_missing
NbV1Ch08_missing
NbV1Ch08_missing
NbV1Ch08_missing
NbV1Ch08_missing

with the below code:

using GenomicAnnotations

chrs=readgff("/Users/lorencm/projects/bioinf-scripts/data/NbV1Ch08-augustus.hints_utr.gff3")

for chr in chrs
    for gene in chr.genes
        println("$(chr.name)_$(gene.id)")
    end
end

How can I access the gene's IDs?

Thank you in advance,

Attribute names are case-sensitive, so use gene.ID, not gene.id. Judging by the partial file you posted, not all entries have an ID, so I recommend using the version with @genes that I posted earlier:

for gene in @genes(chr, !ismissing(:ID))
    println("$(chr.name)_$(gene.ID)")
end

This will iterate over the entries that do have an ID. Otherwise, for entries that lack an ID, gene.ID will again return missing.
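
The `NbV1Ch08_missing` lines come from ordinary Julia behaviour: interpolating a `missing` value into a string prints the word "missing". A small sketch in plain Julia, with a made-up `ids` vector standing in for the attribute values, shows the effect and the filtered alternative:

```julia
# Stand-in data: some entries have an ID, one does not.
ids = Union{Missing,String}["g1", missing, "g1.t1"]

# Interpolating `missing` yields the literal text "missing".
labels_all = ["NbV1Ch08_$(id)" for id in ids]

# Filtering with ismissing keeps only the entries that carry an ID.
labels_present = ["NbV1Ch08_$(id)" for id in ids if !ismissing(id)]

println(labels_all)
println(labels_present)
```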


Thank you but I ran into a new error:

julia> chr=readgff("/Users/lorencm/projects/bioinf-scripts/data/NbV1Ch08-augustus.hints_utr.gff3")
1-element Array{Chromosome{Gene},1}:
 Chromosome 'NbV1Ch08' (0 bp) with 136090 annotations


julia> for gene in @genes(chr, !ismissing(:ID))
           println("$(chr.name)_$(gene.ID)")
       end
ERROR: type Array has no field name
Stacktrace:
 [1] getproperty(::Array{Chromosome{Gene},1}, ::Symbol) at ./Base.jl:20
 [2] top-level scope at /Users/lorencm/.julia/packages/GenomicAnnotations/4kJOh/src/macro.jl:2

Please find here a GFF3 for one chromosome.

Thank you in advance

Your chr is still an Array{Chromosome}. In this case there is only one chromosome, so you can access the name with chr[1].name. For a more general solution, use parent(gene).name. parent(gene::Gene) returns the Chromosome that contains gene.
The following example will work for a GFF file with any number of chromosomes:

using GenomicAnnotations
chrs = readgff("/Users/lorencm/projects/bioinf-scripts/data/NbV1Ch08-augustus.hints_utr.gff3")
for gene in @genes(chrs, !ismissing(:ID))
    println("$(parent(gene).name)_$(gene.ID)")
end

Thank you. I added a counter, but it gives every feature a new number rather than one number per gene:

NbV1Ch08_g5742.t1_078390
NbV1Ch08_g5742.t1.3UTR1_078391
NbV1Ch08_g5742.t1.exon1_078392
NbV1Ch08_g5742.t1.CDS1_078393
NbV1Ch08_g5742.t1.CDS2_078394
NbV1Ch08_g5742.t1.exon2_078395
NbV1Ch08_g5742.t1.CDS3_078396
NbV1Ch08_g5742.t1.exon3_078397
NbV1Ch08_g5742.t1.CDS4_078398
NbV1Ch08_g5742.t1.exon4_078399
NbV1Ch08_g5742.t1.CDS5_078400
NbV1Ch08_g5742.t1.exon5_078401
NbV1Ch08_g5742.t1.CDS6_078402
NbV1Ch08_g5742.t1.exon6_078403
NbV1Ch08_g5742.t1.CDS7_078404
NbV1Ch08_g5742.t1.exon7_078405
NbV1Ch08_g5742.t1.CDS8_078406
NbV1Ch08_g5742.t1.exon8_078407
NbV1Ch08_g5742.t1.CDS9_078408
NbV1Ch08_g5742.t1.exon9_078409
NbV1Ch08_g5742.t1.CDS10_078410
NbV1Ch08_g5742.t1.exon10_078411
NbV1Ch08_g5742.t1.CDS11_078412
NbV1Ch08_g5742.t1.exon11_078413
NbV1Ch08_g5742.t1.CDS12_078414
NbV1Ch08_g5742.t1.exon12_078415
NbV1Ch08_g5742.t1.CDS13_078416
NbV1Ch08_g5742.t1.exon13_078417
NbV1Ch08_g5742.t1.CDS14_078418
NbV1Ch08_g5742.t1.exon14_078419
NbV1Ch08_g5742.t1.CDS15_078420
NbV1Ch08_g5742.t1.exon15_078421
NbV1Ch08_g5742.t1.5UTR1_078422
NbV1Ch08_g5742.t2_078423
NbV1Ch08_g5742.t2.3UTR1_078424
NbV1Ch08_g5742.t2.exon1_078425
NbV1Ch08_g5742.t2.CDS1_078426
NbV1Ch08_g5742.t2.CDS2_078427
NbV1Ch08_g5742.t2.exon2_078428
NbV1Ch08_g5742.t2.CDS3_078429
NbV1Ch08_g5742.t2.exon3_078430
NbV1Ch08_g5742.t2.CDS4_078431
NbV1Ch08_g5742.t2.exon4_078432
NbV1Ch08_g5742.t2.CDS5_078433
NbV1Ch08_g5742.t2.exon5_078434
NbV1Ch08_g5742.t2.CDS6_078435
NbV1Ch08_g5742.t2.exon6_078436
NbV1Ch08_g5742.t2.5UTR1_078437

I would have expected to see:

NbV1Ch08_g5742.t1_078390
NbV1Ch08_g5742.t1.3UTR1_078390
NbV1Ch08_g5742.t1.exon1_078390
NbV1Ch08_g5742.t1.CDS1_078390
...

Here is the updated code:

using GenomicAnnotations
chrs = readgff("/Users/lorencm/projects/bioinf-scripts/data/NbV1Ch08-augustus.hints_utr.gff3")
for (count, gene) in enumerate(@genes(chrs, !ismissing(:ID)))
    newID = lpad(count, 6, '0')
    println("$(parent(gene).name)_$(gene.ID)_$(newID)")
    
end

Is there a better way to implement a counter?

Thank you in advance,

Something like this?

using GenomicAnnotations
chrs = readgff("/Users/lorencm/projects/bioinf-scripts/data/NbV1Ch08-augustus.hints_utr.gff3")
currentID = 0
for gene in @genes(chrs, !ismissing(:ID))
    if feature(gene) == :gene
        global currentID += 1
    end
    newID = lpad(currentID, 6, '0')
    println("$(parent(gene).name)_$(gene.ID)_$newID")
end
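
The same counting logic can be tried on raw GFF3 lines without the package. This is a sketch under assumptions: `gene_numbers` is a hypothetical helper, the first three records are taken from the file posted earlier, and the `g2` records are invented purely to show the counter advancing. Wrapping the loop in a function also avoids the `global` keyword needed at top level:

```julia
# Hypothetical helper: return one zero-padded number per feature line,
# advancing the counter only when the feature type (column 3) is "gene".
function gene_numbers(lines)
    current = 0
    labels = String[]
    for line in lines
        startswith(line, "#") && continue    # skip pragmas and comments
        cols = split(line, '\t')
        length(cols) == 9 || continue        # skip malformed records
        if cols[3] == "gene"
            current += 1
        end
        push!(labels, lpad(current, 6, '0'))
    end
    return labels
end

lines = [
    "NbV1Ch08\tAUGUSTUS\tgene\t7015\t29794\t0.01\t-\t.\tID=g1;",
    "NbV1Ch08\tAUGUSTUS\tmRNA\t7015\t29794\t0.01\t-\t.\tID=g1.t1;Parent=g1",
    "NbV1Ch08\tAUGUSTUS\texon\t7015\t8747\t.\t-\t.\tID=g1.t1.exon1;Parent=g1.t1;",
    # The two records below are invented for illustration only.
    "NbV1Ch08\tAUGUSTUS\tgene\t30000\t31000\t.\t-\t.\tID=g2;",
    "NbV1Ch08\tAUGUSTUS\tmRNA\t30000\t31000\t.\t-\t.\tID=g2.t1;Parent=g2",
]

numbers = gene_numbers(lines)
println(numbers)
```

This relies on every gene record appearing before its child features, as in the AUGUSTUS output shown in this thread.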