For strongly memory-bound problems (like the array sum in your previous post), you may get some speed-up (2-4x) on a classical desktop if it has several (2-4) memory channels. This is usually not the case on a laptop. That said, a floating-point division takes several cycles, so here multithreading could be worthwhile.
In any case, I understood (from the previous link) that broadcasting is not implicitly multithreaded. This is a good thing, because it is difficult to anticipate whether your code will be nested inside another multithreaded context.
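Since broadcasting will not do it for you, here is a minimal sketch of making the element-wise division explicitly multithreaded with `Threads.@threads`. It assumes `V`, `A` and `B` are `Float64` vectors of the same length; the name `threaded_div!` and the array sizes are only illustrative. Julia has to be started with several threads (e.g. `julia -t 4`) for this to run in parallel.

```julia
# Same result as V .= A ./ B, but the iterations are split across threads.
function threaded_div!(V, A, B)
    Threads.@threads for i in eachindex(V, A, B)   # eachindex also checks the shapes match
        @inbounds V[i] = A[i] / B[i]
    end
    return V
end

A = rand(10^6)
B = rand(10^6) .+ 1.0   # keep B away from zero
V = similar(A)
threaded_div!(V, A, B)
```

Each element is computed by exactly one thread, so there is no data race, and the result is identical to the broadcast version.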
If you have a strong reason to define A and B separately, a possible solution would be to fuse the division with the algorithm that consumes V. A lazy definition of V could be nice.
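A sketch of the fusion idea, assuming (hypothetically) that the algorithm consuming V is a plain sum; `fused_div_sum` is an illustrative name. The generator gives a lazy V, and the explicit loop fuses the division into the reduction, so V is never stored in memory.

```julia
A = rand(10^6)
B = rand(10^6) .+ 1.0   # keep B away from zero

# Lazy V: each A[i] / B[i] is computed only when the sum asks for it.
V_lazy = (x / y for (x, y) in zip(A, B))
s_lazy = sum(V_lazy)

# Fused loop: division and accumulation in a single pass over A and B.
function fused_div_sum(A, B)
    s = zero(eltype(A)) / one(eltype(B))   # zero of the division's result type
    @inbounds @simd for i in eachindex(A, B)
        s += A[i] / B[i]
    end
    return s
end

s_fused = fused_div_sum(A, B)
```

Because `@simd` allows the additions to be reassociated, `s_fused` may differ from `sum(A ./ B)` in the last bits.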
Here is the curve for a multithreaded + SIMD (MT+simd) version of your sum:
And the corresponding snippet:
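```julia
partial = zeros(eltype(a), Threads.nthreads())  # one partial sum per chunk: no data race on a shared total
Threads.@threads for c in eachindex(partial)    # chunk c is summed by a single thread
    imin, imax = div((c - 1) * length(a), length(partial)) + 1, div(c * length(a), length(partial))
    stotal = zero(eltype(a))                    # thread-local accumulator
    @simd for i = imin:imax                     # let the compiler vectorize the inner sum
        @inbounds stotal += a[i]
    end
    partial[c] = stotal
end
total = sum(partial)                            # combine the per-chunk results
```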