# throws a non-descriptive error
prot_alphabet=["R","H","K",
"D","E",
"S","T","N","C","Q",
"C","U","G","P",
"A","V","I","L","M","F","Y","W"]
using Flux: onehotbatch
tokenise(s) = onehotbatch(s, prot_alphabet)
prot_seq="GAQLLNYASYFAKMAIKLDRKG"
tokenise(prot_seq)
vs
# gives a nice OneHotMatrix
prot_alphabet=['R','H','K',
'D','E',
'S','T','N','C','Q',
'C','U','G','P',
'A','V','I','L','M','F','Y','W']
using Flux: onehotbatch
tokenise(s) = onehotbatch(s, prot_alphabet)
prot_seq="GAQLLNYASYFAKMAIKLDRKG"
tokenise(prot_seq)
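It's the String vs. Char difference, of course: iterating over a String yields Chars, and a Char never equals a one-character String, so none of the labels match.
But what should I do if I want to one-hot encode something where the labels are not single Chars but longer strings?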
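
One possible approach, as a minimal sketch rather than a definitive answer: `onehotbatch` compares each element of its first argument against the labels, so the input can be a vector of String tokens instead of one long String. The three-letter label set, the names `labels`, `tokens`, `seq`, and `chunks`, and the fixed-width splitting are all assumptions made up for illustration.

```julia
using Flux: onehotbatch

# Hypothetical label set: three-letter residue codes instead of single Chars.
labels = ["Arg", "His", "Lys", "Asp", "Glu"]

# Pass a collection whose elements have the same type as the labels,
# i.e. a Vector{String} rather than a single String.
tokens = ["Lys", "Asp", "Arg"]
encoded = onehotbatch(tokens, labels)   # one column per token

# If the data arrives as one String, split it into tokens first.
# Assumption: every token is exactly three ASCII characters wide.
seq = "LysAspArg"
chunks = [seq[i:i+2] for i in 1:3:length(seq)]
encoded_from_seq = onehotbatch(chunks, labels)
```

For tokens of varying length, `split(seq, delimiter)` on a delimited string would play the same role as the fixed-width slicing. It returns SubStrings, which still compare equal to String labels.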