pmap goes through interprocess communication once per element, so 1 million times here. That’s a lot of overhead relative to a single vectorized function call. Why would you expect it to be as fast? It’s intended for functions with non-trivial run time, where the work per call outweighs the communication cost.
I suppose pmap could be designed to fall back to regular map when no workers are registered, but what’s the point of calling pmap without any workers?
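To make the overhead point concrete, here is a rough sketch, assuming this is Julia's `pmap` from the `Distributed` standard library. The function `cheap`, the input size, and the worker count are made up for illustration; they are not from the original question. It compares a broadcast call, a plain `pmap`, and a `pmap` that uses the `batch_size` keyword to send many elements per remote call.

```julia
using Distributed

# Assumption: two local worker processes; adjust for your machine.
addprocs(2)

# Hypothetical cheap function, defined on every process.
@everywhere cheap(x) = 2x + 1

xs = 1:10_000

# Broadcast in the current process: no communication at all.
@time serial = cheap.(xs)

# One remote call per element: communication cost dominates.
@time slow = pmap(cheap, xs)

# Batch elements so each remote call carries more work.
@time batched = pmap(cheap, xs; batch_size = 1_000)
```

All three give the same result. For a function this cheap the broadcast version should still win, and batching only narrows the gap; `pmap` pays off when each call does enough work to dwarf the cost of shipping it to a worker.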