Scaled FFT implementation seems to use only one thread despite setting the number of threads to 8

Great, thank you for the suggestions. I’ll implement them and also post the profiler breakdown.