Coloring of specific characters in a String

I work with RNA and DNA sequences and am interested in determining how to change a particular character in a string to a different color.

Specifically, I’d like to do something like this:

sequence = "AUCGCGCA"

sequence[3] = #Blue copy of C

Any suggestions are welcome.

Jim

There’s printstyled(), and Crayons.jl for terminals, but it would be nice to have a StyledString type that could be rendered in any display.
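
For reference, a minimal sketch of the `printstyled` route for the example in the question. The helper name `print_highlighted` is made up for illustration, and whether the color actually shows up depends on the output supporting ANSI colors:

```julia
# Print `seq`, coloring only the character at position `idx`.
# `print_highlighted` is a hypothetical helper, not part of Base.
function print_highlighted(io::IO, seq::AbstractString, idx::Integer; color::Symbol = :blue)
    for (i, ch) in enumerate(seq)   # i counts characters, starting at 1
        if i == idx
            printstyled(io, ch; color = color)
        else
            print(io, ch)
        end
    end
end

print_highlighted(stdout, "AUCGCGCA", 3)   # the third character, 'C', is printed in blue
println()
```

When the target `io` does not have color enabled (a plain `IOBuffer`, for instance), `printstyled` falls back to unstyled text.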


This is not a full solution, but if colours are problematic you can use a different font for special letters. Julia has full Unicode support, so there are a few quite distinct letter styles which could serve as highlighting.
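
A rough sketch of that idea, assuming the Unicode "Mathematical Bold" capitals (U+1D400 for A through U+1D419 for Z) are an acceptable highlight and that your display font has glyphs for them. The names `bold` and `mark` are invented for this example:

```julia
# Map an ASCII capital to its Mathematical Bold counterpart; leave anything else alone.
bold(c::Char) = 'A' <= c <= 'Z' ? Char(0x1D400 + (c - 'A')) : c

# Return a copy of `seq` with the characters at positions `idxs` swapped for bold ones.
mark(seq::AbstractString, idxs) =
    join(i in idxs ? bold(c) : c for (i, c) in enumerate(seq))

println(mark("AUCGCGCA", [3]))   # the third character becomes a bold C
```

The result is an ordinary `String`, so it survives copy/paste and any display, but it no longer compares equal to the original sequence.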

Can you give a bit more information about just what you are trying to accomplish?

Do you only need to display ASCII characters?
Is this only to display a string from your own code, annotated with colors?

Are there only a few colors that you need to use?

One possibility would be to create a vector holding the color of each character you wish to display,
and write a function that displays the string with those colors.

The API could be something like the following:

using ColoredSequences
cs = ColoredSequence(sequence)
setcolor!(cs, 3, CS_BLUE)
println(cs)

A possible implementation:

module ColoredSequences

export ColoredSequence, setcolor!

const CS_NONE  = 0x0
const CS_RED   = 0x1
const CS_GREEN = 0x2
const CS_BLUE  = 0x3

export CS_NONE, CS_RED, CS_GREEN, CS_BLUE

struct ColoredSequence{T<:AbstractString}
    s::T
    cv::Vector{UInt8}
end

ColoredSequence(s) = ColoredSequence(s, fill(CS_NONE, length(s)))

setcolor!(cs, loc, color) = (cs.cv[loc] = color)

const colors = [:normal, :red, :green, :blue]

function findnextneq(vec, loc, val)
    len = length(vec)
    while loc <= len
        vec[loc] != val && return loc
        loc += 1
    end
    loc
end

function Base.show(io::IO, cs::ColoredSequence)
    str = cs.s
    cv  = cs.cv
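    len = length(cv)    # number of characters (one color entry per character)
    prv = 1
    while prv <= len    # <= rather than <, otherwise a final one-character run is never printed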
        prevc = cv[prv]
        loc = findnextneq(cv, prv, prevc)
        printstyled(io, SubString(str, prv, loc-1), color = colors[prevc+1])
        prv = loc
    end
end

end # module ColoredSequences

This is a quick-and-dirty implementation. It assumes the string contains no multi-codeunit characters (i.e. all ASCII for String and UTF8Str, for example, or all code points < 65536 for UTF16Str), because the color vector has one entry per character while SubString is given code-unit indices.
Hope this helps!
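
To illustrate the assumption about multi-codeunit characters, here is a small check using only Base functions. The helper name `single_unit_chars` is made up for this sketch:

```julia
# For a String, `length` counts characters while `ncodeunits` counts UTF-8 bytes.
# The two agree exactly when every character occupies a single code unit.
single_unit_chars(s::AbstractString) = length(s) == ncodeunits(s)

rna   = "AUCGCGCA"
greek = "αβγ"          # each of these letters takes two bytes in UTF-8

println(single_unit_chars(rna))      # true: per-character indices are valid string indices
println(single_unit_chars(greek))    # false: valid indices are 1, 3, 5
println(collect(eachindex(greek)))
```

Nucleotide sequences made of A, C, G, T and U are plain ASCII, so the assumption holds for the use case in this thread.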


Thanks for asking. Mostly I want to be able to clearly visualize differences between sequences. A sequence can be a rather long combination of ‘T’, ‘C’, ‘G’, and ‘A’ characters, so it can be hard to pick out which character is different when comparing two sequences. I was hoping I could use a color to highlight the individual characters that differ, maybe with a different color for each of the four nucleotides noted above. I’d use this in a Jupyter or Pluto notebook mostly.
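
For the comparison use case, a minimal sketch could look like the following. `diff_positions` and `print_diff` are hypothetical names, the sequences are assumed to be aligned already, and the styling relies on the `color` and `reverse` keywords of `printstyled` being honored by the output:

```julia
# Positions (1-based, counted in characters) where two aligned sequences differ.
# `zip` stops at the shorter sequence, so trailing extra characters are ignored.
diff_positions(a::AbstractString, b::AbstractString) =
    [i for (i, (x, y)) in enumerate(zip(a, b)) if x != y]

# Print `seq`, showing characters that differ from `ref` in reverse-video red.
function print_diff(io::IO, ref::AbstractString, seq::AbstractString)
    for (x, y) in zip(ref, seq)
        if x == y
            print(io, y)
        else
            printstyled(io, y; color = :red, reverse = true)
        end
    end
    println(io)
end

ref = "AUCGCGCA"
alt = "AUCGAGCA"
println(diff_positions(ref, alt))   # only position 5 differs
print_diff(stdout, ref, alt)
```

Picking the color from the nucleotide itself (one color each for A, C, G and T/U) would only require replacing `:red` with a lookup in a small `Dict{Char,Symbol}`.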

Sweet. Let me give this a go. Thanks so much!

I’m happy to help!
I noticed that it wasn’t all that easy to see the color for letters (at least, on my monitor, with a white background).
It might be better to cause the highlighted letters to be inverted (i.e. colored background, with black or white text), to stand out more.

module ColoredSequences

export ColoredSequence

const CS_NONE  = 0x0
const CS_RED   = 0x1
const CS_GREEN = 0x2
const CS_BLUE  = 0x3
const CS_CYAN  = 0x4
const CS_MAGENTA = 0x5
const CS_YELLOW  = 0x6

export CS_NONE, CS_RED, CS_GREEN, CS_BLUE, CS_CYAN, CS_MAGENTA, CS_YELLOW

struct ColoredSequence{T<:AbstractString}
    s::T
    cv::Vector{UInt8}
end

ColoredSequence(s) = ColoredSequence(s, fill(CS_NONE, length(s)))

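# Restrict the method to ColoredSequence; an untyped `cs` would define setindex! for every type.
Base.setindex!(cs::ColoredSequence, color, ind) = (cs.cv[ind] = color)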

const colors = [:normal, :red, :green, :blue, :cyan, :magenta, :yellow]

function findnextneq(vec, loc, val)
    len = length(vec)
    while loc <= len
        vec[loc] != val && return loc
        loc += 1
    end
    loc
end

function Base.show(io::IO, cs::ColoredSequence)
    str = cs.s
    cv  = cs.cv
    len = length(cv)    # number of characters (one color entry per character)
    prv = 1
    while prv <= len
        prevc = cv[prv]
        loc = findnextneq(cv, prv, prevc)
        printstyled(io, SubString(str, prv, loc-1);
                    color = colors[prevc + 1], reverse = (prevc != CS_NONE))
        prv = loc
    end
end

end # module ColoredSequences

I changed it to use reverse for displaying colors, which is much more visible, fixed an off-by-one bug in the show loop, and added a few more colors.
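
The show method works by walking the color vector in runs of equal values and printing each run with a single `printstyled` call. A standalone sketch of just that run-splitting step, using a made-up helper `color_runs` that is not part of the module above and assumes an ordinary 1-based vector:

```julia
# Split a color vector into (index range, color) pairs, one per run of equal values.
function color_runs(cv::AbstractVector)
    runs = Tuple{UnitRange{Int},eltype(cv)}[]
    start = 1
    for i in 2:length(cv)+1
        # Close the current run at the end of the vector or when the color changes.
        if i > length(cv) || cv[i] != cv[start]
            push!(runs, (start:i-1, cv[start]))
            start = i
        end
    end
    runs
end

# Position 3 colored with code 0x03, everything else left at 0x00.
for (range, color) in color_runs(UInt8[0, 0, 3, 0])
    println(range, " => ", color)
end
```

A final run of length one (the last entry here) is exactly the case the off-by-one fix is about.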


Nice, I might try to extend this a little - make a couple options for background or character color, and maybe font size or typeface.

Check out the CSS font properties for reference.
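
For notebooks, one way this could go is to build an HTML string with inline CSS and wrap it in Base's `HTML` type, which notebook front ends normally render as `text/html`. This is only a sketch: `highlight_html` is a hypothetical helper, the CSS color names are passed through unchecked, and characters are not HTML-escaped (fine for nucleotide letters, not for arbitrary text):

```julia
# Build an HTML object where the characters at the given positions get a background color.
function highlight_html(seq::AbstractString, marks::AbstractDict{<:Integer,<:AbstractString})
    io = IOBuffer()
    print(io, "<span style=\"font-family: monospace\">")
    for (i, ch) in enumerate(seq)
        if haskey(marks, i)
            print(io, "<span style=\"background-color: ", marks[i], "\">", ch, "</span>")
        else
            print(io, ch)
        end
    end
    print(io, "</span>")
    HTML(String(take!(io)))
end

# Returning this value from a notebook cell should show the third character highlighted.
highlight_html("AUCGCGCA", Dict(3 => "lightblue"))
```

Other CSS properties such as `font-size` or `font-weight` could be added to the inner `style` attribute in the same way.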

This is a side note, but as you are working with DNA I would highly recommend the BioSequences package. You get containers for various biological sequences, with many tools for analysis and manipulation. I don’t think there is colour highlighting there, but you can add it the same way as shown above.


Yeah, I’ve looked at that some. Thanks for the direction

JB