Compression: A. [Audio] codecs and logarithmic FFT and B. 6-bit letters/bitcode

Palli · August 30, 2024, 1:57pm

How does this apply to Julia? Not really, except for one note of interest: the prototype of Meta/Facebook’s recent AI speech codec is written in Julia (and is kept in sync with their C implementation). I’m not mentioning it further here, nor any other actual Julia code.

I just find this interesting, and the FFT trivia, and my questions could be of interest to others.

Audio codecs generally take in 16-bit linear samples and compress them, e.g. with an FFT.

In the past, 8-bit linear was used (very OK), and, better, 8-bit logarithmic (if I recall correctly, roughly equivalent to 14-bit linear, so almost as good as the current 16-bit; you don’t need more, and 24-bit is a waste). Either way, the size is already cut in half compared to 16-bit. So I googled “Logarithmic FFT” to see whether this is still done, i.e. whether the output of the FFT is linear, or, if the logarithm is not inherent to the FFT (I thought not), whether some codec then converts the output to logarithmic. But I discovered the “logarithmic FFT” (LFT) in a different kind of sense.
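
To make the “8-bit logarithmic” idea concrete, here is a minimal sketch of mu-law companding using the continuous textbook formula. It is not the bit-exact G.711 telephony encoding, and the function names are my own for illustration. The point is only that quiet samples get much finer steps than 8-bit linear would give them.

```python
import math

MU = 255  # mu-law parameter; 255 is the value used with 8-bit codes


def mulaw_encode(sample: int) -> int:
    """Compress a signed 16-bit linear sample into an 8-bit code (0..255)."""
    x = max(-1.0, min(1.0, sample / 32768.0))            # normalise to [-1, 1]
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * 255))              # map [-1, 1] to 0..255


def mulaw_decode(code: int) -> int:
    """Expand an 8-bit code back to an approximate 16-bit linear sample."""
    y = code / 255 * 2.0 - 1.0                            # map 0..255 to [-1, 1]
    x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
    return int(round(x * 32767))


if __name__ == "__main__":
    for s in (100, 1000, 10000, 30000):
        code = mulaw_encode(s)
        print(s, code, mulaw_decode(code))
```

A quiet sample such as 100 survives the round trip with an error of a few units, whereas 8-bit linear quantisation of 16-bit audio has a step of 256.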

[I don’t think the (regular) FFT really cares whether the input, i.e. the scalar values, are logarithmic numbers or not, and you would get the same numeric type out. You could put in 16-bit and get 16-bit out, then quantize to 8- (or 9-?) bit logarithmic values. There are more steps, e.g. for masking, i.e. some values likely get truncated to 0 for further lossless compression.]

What I had in mind was the values, not the equal spacing between samples. But I discovered the logarithmic FFT: instead of getting equally spaced frequencies out, I guess they are no longer equally spaced with the LFT. It seems it would/should help for music, but I think it’s not done, even though it is O(n) rather than O(n log n). So what other interesting applications of the LFT are there?

I suppose that since harmonics are integer multiples of the fundamental, the regular FFT makes sense. But could there be some intermediate between the two? Like with numbers: you have linear (integer or) fixed-point, then floats, then logarithmic numbers. Floats have a linear scale for the mantissa, and logarithmic numbers are sort of like eliminating that part of floats.

If all your notes were (boring) sine waves, it seems a logarithmic FFT would compress audio better. You can think of the note C hitting just one frequency output, and, if you go up an octave (double the frequency), the next one. Though you would want finer spacing, e.g. dividing each octave by 12 (at least) for all the notes in an octave (usually; other systems exist). You would sort of get a “MIDI file” out (unlike with linear spacing, where that seems not possible, since way more components come out). What complicates things is when you do not have only sine waves (or are not hitting the notes exactly), e.g. something like a piano, with harmonics; then linear seems better for those (at least when just storing the amplitudes, unless you could instead reference a single piano sample: each repeated note will sound the same, so it is very wasteful to compress it every time).
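
The difference between the two spacings can be sketched with a few lines of arithmetic. The bin layout below (12 bins per octave, A4 = 440 Hz at position 69, as in MIDI note numbering) is my assumption for illustration, not something taken from any particular LFT implementation.

```python
import math

BINS_PER_OCTAVE = 12   # assumption: one bin per equal-tempered semitone
REF_HZ = 440.0         # A4, placed at position 69 as in MIDI note numbers


def log_bin(freq_hz: float) -> float:
    """Position of a frequency on a logarithmic (semitone) axis."""
    return 69 + BINS_PER_OCTAVE * math.log2(freq_hz / REF_HZ)


fundamental = 261.63   # roughly middle C
harmonics = [fundamental * k for k in range(1, 7)]

# On a linear axis the harmonics are equally spaced...
linear_gaps = [b - a for a, b in zip(harmonics, harmonics[1:])]

# ...while on the logarithmic axis the gaps shrink with each harmonic.
log_positions = [log_bin(f) for f in harmonics]
log_gaps = [b - a for a, b in zip(log_positions, log_positions[1:])]

if __name__ == "__main__":
    print("linear gaps (Hz):", [round(g, 2) for g in linear_gaps])
    print("log gaps (semitones):", [round(g, 2) for g in log_gaps])
```

Pure sine-wave notes land on integer positions of the logarithmic axis, one octave being 12 positions. The harmonics of a single piano-like note, however, are evenly spaced only on the linear axis, which matches the intuition that linear bins suit harmonic sounds.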

https://www.mathworks.com/matlabcentral/answers/2118296-fft-of-a-frequency-sweep-using-logarithmic-spacing

There’s also variable- and:

In general: I want to know why FFT (and DCT) are used for audio (and video), and why not some of those alternatives?

[For compression of images/video, e.g. JPEG, you have a DC component (one of the most important; not so for audio, where it should always be 0 and you start at 20 Hz at least), but then you have equally spaced frequencies, up to 7 (8x8 block) for JPEG (some codecs have 16x16 blocks). It probably doesn’t make sense to have logarithmic steps, because of the fine details in images.]
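
As a small illustration of the DC component and the frequency indices 0 to 7, here is a plain-Python sketch of a one-dimensional 8-point DCT-II with orthonormal scaling. JPEG applies a two-dimensional version of this to 8x8 blocks; the function name and the sample data are mine.

```python
import math

N = 8  # JPEG works on 8x8 blocks; this is the 1-D building block


def dct_1d(samples):
    """Orthonormal DCT-II of N samples; index 0 is the DC term."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        total = sum(
            s * math.cos(math.pi * (n + 0.5) * k / N)
            for n, s in enumerate(samples)
        )
        out.append(scale * total)
    return out


flat = dct_1d([100] * N)        # a constant row: everything ends up in DC
ramp = dct_1d(list(range(N)))   # a smooth gradient: mostly low frequencies

if __name__ == "__main__":
    print([round(c, 3) for c in flat])
    print([round(c, 3) for c in ramp])
```

A constant block puts all of its energy into coefficient 0, which is why the DC term matters so much for images, while the remaining seven coefficients are the equally spaced frequencies.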

B.
My mind was blown, seeing 6-bit letters still in modern use:

Bitcode: 6-bit characters (and variable-length integers, starting at 4 bits); still in use, including in Julia, because of LLVM.

https://llvm.org/docs/BitCodeFormat.html#variable-width-integers
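
For readers who don’t want to open the link, here is a rough sketch of the two primitives as I understand them from that page: the char6 alphabet (a-z, A-Z, 0-9, `.` and `_`) and variable-width (VBR) integers, where the top bit of each chunk says whether another chunk follows. The function names are mine, and the sketch works on lists of small integers rather than on an actual packed bitstream.

```python
CHAR6_ALPHABET = (
    "abcdefghijklmnopqrstuvwxyz"   # codes 0..25
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"   # codes 26..51
    "0123456789._"                 # codes 52..61, then '.' = 62, '_' = 63
)


def char6_encode(text):
    """Map each character to its 6-bit code; raises ValueError if not encodable."""
    return [CHAR6_ALPHABET.index(ch) for ch in text]


def char6_decode(codes):
    return "".join(CHAR6_ALPHABET[c] for c in codes)


def vbr_encode(value, width=4):
    """Split a non-negative integer into width-bit chunks, low bits first.

    Each chunk carries width-1 payload bits; its top bit is set when
    another chunk follows.
    """
    payload_bits = width - 1
    payload_mask = (1 << payload_bits) - 1
    continue_bit = 1 << payload_bits
    chunks = []
    while True:
        chunk = value & payload_mask
        value >>= payload_bits
        if value:
            chunks.append(chunk | continue_bit)
        else:
            chunks.append(chunk)
            return chunks


def vbr_decode(chunks, width=4):
    payload_bits = width - 1
    payload_mask = (1 << payload_bits) - 1
    value = 0
    for position, chunk in enumerate(chunks):
        value |= (chunk & payload_mask) << (position * payload_bits)
    return value


if __name__ == "__main__":
    print(char6_encode("llvm.ident"))
    print([format(c, "04b") for c in vbr_encode(27)])
```

The value 27 is the example used in the LLVM documentation for vbr4: it needs two 4-bit chunks, while values up to 7 fit in one.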

I think this is used for LLVM’s .bc files, but it’s unclear to me whether those are used in Julia, i.e. whether they ever hit the disk. Do Julia’s .ji files have it inside, or did they ever?

I was thinking of my own string idea with 5 bits per letter or less (and a UTF-8 escape hatch), for a prefix only, with the full string still in UTF-8 (for interop). It can help for sorting and for short strings. I just didn’t think most people still thought along those lines, i.e. considered such extreme compression, even for strings, valuable.
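
A minimal sketch of what such a prefix could look like, purely hypothetical: the constants, the escape code and the name `prefix_key` are all made up for illustration. It packs up to 12 lowercase letters at 5 bits each into one integer that fits in 64 bits, so that comparing keys matches comparing the prefixes.

```python
PREFIX_LEN = 12   # 12 letters * 5 bits = 60 bits, fits in a 64-bit word
PAD = 0           # padding for short strings; sorts before every letter
ESCAPE = 31       # hypothetical marker: "look at the full UTF-8 string"


def prefix_key(text: str) -> int:
    """Pack the first PREFIX_LEN characters into a sortable integer key."""
    key = 0
    escaped = False
    for i in range(PREFIX_LEN):
        if escaped or i >= len(text):
            code = PAD
        elif "a" <= text[i] <= "z":
            code = ord(text[i]) - ord("a") + 1   # letters use codes 1..26
        else:
            code = ESCAPE                        # anything else: escape hatch
            escaped = True
        key = (key << 5) | code
    return key


if __name__ == "__main__":
    words = ["julia", "jul", "llvm", "bitcode", "audio", "a", "zebra"]
    print(sorted(words, key=prefix_key))
```

For short all-lowercase strings the integer comparison alone decides the order. Once the escape code appears, or two keys are equal, a real implementation would have to fall back to comparing the full UTF-8 strings.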

If this bitstream encoding is only used in memory, is it for sure worth it? Encoding to and decoding from it is not free, so is it one reason LLVM is slow? It was probably thought worth it on disk, though I’m not even sure it still would be.

Is there code out there to read those types, or the full format, in Julia?

[I would have posted A. and B. separately (and could still separate them), but I post a lot to off-topic and wasn’t sure whether that would be too much (I saw the recent discussion on it). Both may be marginal, and while mostly unrelated, A and B are on the same theme, so for now they are in one post.]

I suppose some variable-length encoding for integers, like in B. (though with a base case of likely 5 or 6 rather than 4 bits), could be used for amplitudes in audio, though it is likely redundant with other lossy compression.
