Automatic Speech Recognition in Julia

I can’t find anything in Julia either. Why do you need it implemented (fully) in Julia? Isn’t it good enough to call, say, PyTorch from Julia and use whatever is available (Avalon.jl might be helpful there)? I’m not up to speed on moving models from Python to Julia, i.e. just the parameters (weights and biases). Shouldn’t that be possible, and wasn’t there even a standard for it, ONNX? It might only work for certain types of networks, e.g. I believe it’s an older standard than Transformers, so those might be excluded?

What I did find, however, is brand-new, from 31 March 2022:

Comprehensive experiments on the LibriSpeech corpus show that the proposed Speech2C can relatively reduce the word error rate (WER) by 19.2% over the method without decoder pre-training, and also outperforms significantly the state-of-the-art wav2vec 2.0 and HuBERT on finetuning subsets of 10h and 100h
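In case the metric in that quote is unfamiliar: WER is usually computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words, and a "relative" reduction is measured against the baseline WER rather than in absolute points. Below is a minimal Python sketch of both. The function names and the example numbers are my own for illustration; they are not taken from the Speech2C paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + substitution_cost,   # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which improved_wer is lower than baseline_wer."""
    return (baseline_wer - improved_wer) / baseline_wer


# Made-up example: one substitution ("sat" -> "sit") and one deletion ("the").
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")

# Made-up numbers: going from 10.0% to 8.08% WER is a 19.2% relative reduction,
# even though the absolute drop is only 1.92 percentage points.
rel = relative_reduction(10.0, 8.08)
```

So a 19.2% relative reduction can be a fairly small absolute change when the baseline WER is already low.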

“Wav2Vec 2.0” was state-of-the-art in 2020, according to its 2020 paper. Is it still so? This other paper from Feb 2022 states so (or is it just an evaluation/survey paper, and those tend to repeat claims?):

If someone DOES want to reimplement something in Julia, I at least would want them to find the state-of-the-art and use that…

Might be a helpful thread:

SincNet was also intriguing when I noticed it (it might be outdated, or not; I hadn’t heard of SpeechBrain):

SincNet is implemented in the SpeechBrain (https://speechbrain.github.io/) project as well.

sinc (and sin) looked intriguing for periodic functions, but may actually be outdated. SIREN is, if I recall, newer and better, and there is something even more recent and even better (though the applications I saw were for computer vision).

I hadn’t heard of conformers (thanks for the tip), only of transformers, which they are a variant of, but they might also be too old:
