Best practices for Speech-to-Text conversion?

I’m working on a project where I plan to convert speech into text to perform very basic NLP tasks.

At first I searched for a Julia package for speech-to-text conversion, but that didn’t turn up anything.

Next I considered Google’s Cloud Speech-to-Text API, but I didn’t like that it requires a constant internet connection (the program I’m making should retain at least minimal functionality even when offline). Then I looked at DeepSpeech, which would require some non-Julia tooling to set up.
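For what it’s worth, on the offline route: DeepSpeech’s pretrained English models expect 16 kHz, 16-bit, mono PCM audio. Below is a minimal Python sketch (the repo’s own tooling is Python) that checks a WAV file against that format using only the standard library; the file name `demo.wav` and the helper `check_wav_format` are just illustrative, and the actual `deepspeech` call is left in a comment since it needs the non-stdlib `deepspeech` package plus a downloaded model file.

```python
import math
import struct
import wave

def check_wav_format(path, rate=16000, width=2, channels=1):
    """Return True if the WAV at `path` matches the PCM format
    DeepSpeech's pretrained English models expect (16 kHz, 16-bit, mono)."""
    with wave.open(path, "rb") as w:
        return (w.getframerate() == rate
                and w.getsampwidth() == width
                and w.getnchannels() == channels)

# Self-contained demo: synthesize one second of a 440 Hz tone, 16 kHz mono.
with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    samples = (int(10000 * math.sin(2 * math.pi * 440 * t / 16000))
               for t in range(16000))
    w.writeframes(b"".join(struct.pack("<h", s) for s in samples))

print(check_wav_format("demo.wav"))  # True for the file we just wrote

# With the (non-stdlib) deepspeech package and a downloaded model, the
# transcription step itself would look roughly like:
#   import numpy as np, deepspeech
#   model = deepspeech.Model("deepspeech-0.9.3-models.pbmm")
#   with wave.open("demo.wav", "rb") as w:
#       audio = np.frombuffer(w.readframes(w.getnframes()), np.int16)
#   print(model.stt(audio))
```

From Julia you could drive the same Python code through PyCall, so the non-Julia part stays limited to installing the package and model once.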

And just a while ago I noticed that most operating systems now have built-in dictation. Would it be a good idea to record audio and pass it to the OS’s dictation software?

Which of these approaches have you used before or which would you recommend? Do you have any other ideas?


https://github.com/buriburisuri/speech-to-text-wavenet has a Python implementation of this using TensorFlow. You might be able to port it to Julia fairly easily and use one of their pretrained nets.


While I have considered training a model myself, current circumstances mean I need to avoid hardware-intensive training. I’ll still look into it, though, thank you.

Part of my suggestion was that the repository includes pretrained nets; if you’re willing to put in a bit of work to load them, you won’t have to retrain a network.

Sorry, that last part seems to have slipped my mind :sweat_smile:

WaveNet inference can still be pretty slow even at runtime, but I might consider it if I don’t have any other options.

I did happen to find a WaveNet implementation in Flux, I think? I could work from there, but again I’m concerned that my poor potato of a machine won’t be able to run it at all.