Thank you @kevbonham, I agree! I’ll happily answer any questions about GenomicAnnotations, but it’s very difficult when the target keeps changing, or when the questions aren’t actually about the package.
@mictadlo, if you want to change the name of each chromosome you can iterate over the chromosomes:
for (i, chr) in enumerate(chrs)
chr.name = string(i)
end
Obviously, this assumes that chrs
is an Array{Chromosome}
, and that the chromosomes are stored in the order that they should be numbered, which you will have to verify yourself. Iterating over the chromosomes is one way to solve your problem with currentID
, as well.
Regarding the documentation, I think adding an example section is a good idea so I’ll look into that.
If it increases performance it would certainly be welcome. I haven’t encountered any problems with the current implementation in some time now, though, so I don’t think it has high priority.
Thank you for your solution. I updated the code and it does what I wanted. However, I just wonder whether there would be better to do it:
using GenomicAnnotations
chrs = readgff("/Users/lorencm/chr1-2.augustus.hints_utr.gff3")
currentID = 0
currentChr = ""
for gene in @genes(chrs)
if currentChr != parent(gene).name
global currentChr = parent(gene).name
global currentID = 0
end
if feature(gene) == :gene
global currentID += 1
end
newID = lpad(currentID, 6, '0')
chrID = SubString(parent(gene).name, 7)
if !ismissing(gene.ID)
geneIDextension = ""
splitCurrentID = split(gene.ID,".",limit=2)
if length(splitCurrentID) == 2
println(splitCurrentID)
geneIDextension="."*splitCurrentID[2]
println(geneIDextension)
end
gene.ID = "$chrID$newID$geneIDextension"
println(gene.ID)
end
if !ismissing(gene.Parent)
println("HeLLO")
println(gene.Parent)
println(parent(gene).name)
geneIDextension = ""
splitCurrentID = split(gene.Parent,".",limit=2)
if length(splitCurrentID) == 2
println(splitCurrentID)
geneIDextension="."*splitCurrentID[2]
println(geneIDextension)
end
println("$(parent(gene).name)$newID")
gene.Parent = "$chrID$newID$geneIDextension"
println(gene.Parent)
#println("!!Parent",parent(gene),parent(gene).name)
end
end
open("updated.gbk", "w") do f
printgff(f, chrs)
end
Thank you in advance
“Better” is quite subjective, and there are a lot of ways to think about it. Do you mean more performant, more clear, more idiomatic? Sometimes there are style things that just come down to personal preference.
For performance reasons, I think it would be better to wrap that all in a function and avoid all the global
s.
Style-wise, I prefer using logging functionality (like @info
) instead of print statements. I also like using short-circuit boolean operators instead of short if
statements, eg
feature(gene) == :gene && (global currentID += 1)
instead of
if feature(gene) == :gene
global currentID += 1
end
and use continue
instead of a large indented if
block, eg
ismissing(gene.Parent) && continue
# stuff...
instead of
if !ismissing(gene.Parent)
# stuff....
end
though your way is arguably clearer. Keep trying different stuff, and see what you like. In the majority of code that I write, clarity and correctness is far more important than performance or some notion of “proper” code.
2 Likes