Announcing Whisper.jl

I’m very pleased to announce the availability of Whisper.jl, a Julia package for speech recognition. It uses the Whisper model developed by OpenAI and runs inference on the CPU using Georgi Gerganov’s whisper.cpp. That library is provided as a JLL package, and the model weights are downloaded on demand.

Currently, the package exposes a transcribe function that takes raw audio data and produces a text transcription. Suggestions or contributions for other kinds of interfaces will be gratefully received. The low-level whisper.cpp functions are, of course, already available, but a high-level Julia streaming interface still needs to be added, hopefully soon.
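
To give a rough idea of what "raw audio data" means here, the sketch below downmixes a stereo buffer to a mono Float32 vector, which is the sample format whisper.cpp works with (it also expects a 16 kHz sample rate; resampling is not shown). The helper name to_mono is hypothetical and not part of Whisper.jl, and the commented-out transcribe call is an assumed call shape that should be checked against the package README.

```julia
using Statistics: mean

# Hypothetical helper (not part of Whisper.jl): average the channels of a
# (frames x channels) matrix into a single mono Float32 vector.
to_mono(samples::AbstractMatrix{<:AbstractFloat}) =
    Float32.(vec(mean(samples; dims=2)))

# Three frames of made-up stereo audio, one column per channel.
stereo = Float32[0.5 -0.5; 1.0 0.0; 0.25 0.75]
data = to_mono(stereo)

# Assumed call shape; the model name and signature are assumptions:
# text = transcribe("base.en", data)
```

The transcribe call is left commented out because running it needs the package installed and triggers a download of the model weights.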

The package is awaiting registration in the General Registry. Please try it out and let me know how it goes.

Regards

Avik

Cool! If you need it tested on a Scottish accent let me know…

Great! I see this about the models:

“There are five model sizes, four with English-only versions.”

Which languages and models do you support (the large-v2 model?), for example my native Icelandic? I see that the best word error rate (WER) is 3.2% for Spanish, followed by Italian, then English at 4.2%, with Icelandic far down at 38.2% and Nepali last.

However, the state of the art is now:

It makes “43% fewer errors on noisy data on average”, with a 3.3% WER (presumably for English on non-noisy data, where “human transcriptionists” get 4%; Conformer-1 gets 9.9% on noisy data).
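
For anyone comparing these figures: WER is the word-level edit distance (substitutions, deletions and insertions) divided by the number of words in the reference transcript. Below is a minimal sketch of that calculation. The function name wer is my own, and published benchmarks normally apply extra text normalization (punctuation, numerals and so on) that this sketch skips, so it will not reproduce the numbers above exactly.

```julia
# Word error rate: word-level Levenshtein distance divided by the number of
# words in the reference. Only lowercasing is applied as normalization.
function wer(reference::AbstractString, hypothesis::AbstractString)
    ref = split(lowercase(reference))
    hyp = split(lowercase(hypothesis))
    isempty(ref) && throw(ArgumentError("reference must contain at least one word"))

    # prev[j + 1] is the distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = collect(0:length(hyp))
    for i in 1:length(ref)
        curr = similar(prev)
        curr[1] = i                                # delete all i reference words
        for j in 1:length(hyp)
            cost = ref[i] == hyp[j] ? 0 : 1
            curr[j + 1] = min(prev[j + 1] + 1,     # deletion
                              curr[j] + 1,         # insertion
                              prev[j] + cost)      # substitution or match
        end
        prev = curr
    end
    return prev[end] / length(ref)
end

# One substituted word out of four reference words.
score = wer("the quick brown fox", "the quick brown dog")
```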

Very cool! Any plans to bring the model inference code into Julia natively? Reading through the whisper.cpp code, it’s mainly just two C++ source files, and most of it is doing stuff that would be easier and cleaner in Julia (for example, there are currently separate functions for different input data types, which could be moved to multiple dispatch).
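
As an illustration of the multiple dispatch point, here is a small sketch in which one generic function name covers several sample types instead of one separately named function per type. The name normalize_pcm and the scaling convention are my own assumptions for illustration; nothing here is taken from whisper.cpp or Whisper.jl.

```julia
# Hypothetical helper; this name does not exist in whisper.cpp or Whisper.jl.

# Floating-point samples are assumed to already lie in [-1, 1], so they are
# only converted to Float32.
normalize_pcm(samples::AbstractVector{<:AbstractFloat}) = Float32.(samples)

# Signed integer PCM is scaled by the magnitude of the type's most negative
# value (32768 for Int16), mapping the full range into [-1, 1).
normalize_pcm(samples::AbstractVector{T}) where {T<:Signed} =
    Float32.(samples) ./ -Float32(typemin(T))

# The same call works for both inputs; Julia picks the method from the
# element type.
from_int16 = normalize_pcm(Int16[-32768, 0, 16384])
from_float = normalize_pcm([0.25, -0.5])
```

Adding support for another sample type then means adding one more method rather than another function name.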

That would indeed be very cool, and I think it is quite feasible. However, it is currently beyond the limits of my time and skills. I hope someone does this, and I’ll be happy to retire the current ccall-based codebase.

Hi @avik! Sharing a Genie app we built with Whisper.jl. Have a look!
whisper genie app - genie cloud

The code is available here: https://github.com/GenieFramework/GenieFrameworkDemos/tree/main/Whisper

Cool! Thanks for sharing.

I just gave it a try and it works quite well.
I was surprised that a German reading was automatically translated into English in the result. I tried to figure out how to change some parameters to get German text instead, but failed to find a solution. It’s not so easy when you only have the C interface to look at. Some hints or explanations would be helpful; nothing too deep, just enough to find a starting point.