Why does training this model use as many CPUs as I ask it to:

```julia
Chain(
    Conv((5, 5), imgsize[end]=>6, relu),
    MaxPool((2, 2)),
    Conv((5, 5), 6=>16, relu),
    MaxPool((2, 2)),
    flatten,
    Dense(prod(out_conv_size), 120, relu),
    Dense(120, 84, relu),
    Dense(84, nclasses)
)
```

And why does training this model use only one?

```julia
Chain(
    # 28x28 to 14x14
    Conv((5, 5), 1=>8, pad = 2, stride = 2, relu),
    # 14x14 to 7x7
    Conv((3, 3), 8=>16, pad = 1, stride = 2, relu),
    # 7x7 to 4x4
    Conv((3, 3), 16=>32, pad = 1, stride = 2, relu),
    # Average pooling on each width x height feature map
    GlobalMeanPool(),
    Flux.flatten,
    Dense(32, 10),
    softmax
)
```

I pasted each of them into the MNIST example from the Flux model zoo, so everything else should be equal.

I'm sure there is a reason; I'm just not sure what it is.