Flux multiple cores

Hi All, Apologies for this very basic question, but most of the threads on this are old, and this topic (and related capability) seems to be evolving.

I have been running Flux on an 8-core machine, but I only just realised that I have been using a single core. After reading the docs, I set the number of threads to 8, restarted Julia, and verified that `Threads.nthreads()` returns 8.
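For anyone wanting to repeat the check, here is a minimal sketch. It assumes Julia was started with `julia --threads 8` or with the `JULIA_NUM_THREADS=8` environment variable set; the value 8 simply matches my machine, and the variable name `n_julia_threads` is just for illustration.

```julia
# Number of Julia threads in this session (set when Julia starts).
n_julia_threads = Threads.nthreads()
println("Julia threads: ", n_julia_threads)

# Logical CPU threads reported by the system, for comparison.
println("Logical CPU threads: ", Sys.CPU_THREADS)
```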

When I reran some training code, though, I didn’t see any noticeable difference in speed.

I read further and saw that, at least at one stage of development, Flux only used multiple cores through BLAS (the linear algebra library that handles matrix operations). If that is still the case, do I need to explicitly set another environment variable? Or is there something else I can set to take advantage of multi-core performance?
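As far as I can tell, the BLAS thread count is a separate setting from the Julia thread count. A minimal sketch of inspecting and changing it from within a session, assuming Julia 1.6 or later (where `BLAS.get_num_threads` is available); the 8 is again just an assumption matching my core count:

```julia
using LinearAlgebra

# BLAS keeps its own thread pool, separate from Julia's threads.
blas_before = BLAS.get_num_threads()
println("BLAS threads before: ", blas_before)

# Ask BLAS to use 8 threads (assumed value for an 8-core machine).
BLAS.set_num_threads(8)
blas_after = BLAS.get_num_threads()
println("BLAS threads after: ", blas_after)
```

With the default OpenBLAS backend, I understand the `OPENBLAS_NUM_THREADS` environment variable can be set before starting Julia to the same effect, though I have not confirmed whether that is what Flux needs.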

Thanks in advance for any help.

P.S. I didn’t benchmark this properly, and the latest run was somewhat faster, so I may be mistaken. In any case, am I doing everything I can to ensure speed?
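To compare more carefully, something like the sketch below could be used. It is not my actual training code: `workload` is a hypothetical stand-in (a dense matrix multiply, the kind of operation BLAS parallelises), and the matrix size and repeat count are arbitrary assumptions.

```julia
using LinearAlgebra

# Hypothetical stand-in for one training step: a dense matrix multiply.
workload(A, B) = A * B

A = rand(Float32, 512, 512)
B = rand(Float32, 512, 512)

workload(A, B)  # warm-up call so compilation time is not measured

default_threads = BLAS.get_num_threads()
timings = Dict{Int,Float64}()

# Time the workload with 1 BLAS thread and with the default count.
for n in unique((1, default_threads))
    BLAS.set_num_threads(n)
    # Take the best of 5 runs to reduce noise.
    timings[n] = minimum(@elapsed(workload(A, B)) for _ in 1:5)
    println(n, " BLAS thread(s): ", timings[n], " s")
end

# Restore the original setting.
BLAS.set_num_threads(default_threads)
```

If the two timings are close, the work is probably too small to benefit from more threads or is not bound by BLAS at all.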