I’m working on a project where I plan to convert speech into text to perform very basic NLP tasks.
At first, I searched for a Julia package for speech-to-text conversion, but didn't turn up anything.
Next I considered Google's Cloud Speech-to-Text API, but I didn't like that it requires an internet connection at all times. The program I'm making should keep at least minimal functionality when no connection is available. After that I looked at DeepSpeech, but setting it up would require some non-Julia tooling.
Just recently I noticed that most operating systems now have built-in dictation. Would it be a good idea to record the user's speech and pass the recording to the OS's dictation software?
Which of these approaches have you used before, or which would you recommend? Do you have any other ideas?