Is backpropagation in Flux always run on a single CPU core?

Hi, I am training a neural network with Flux on a machine with 56 CPU cores, but it seems that only one core is ever working while Flux.train! runs, even though I can see multiple cores in use during the forward calculation step of my model.

So, is this normal, or am I missing something?