Why is writing an array to a socket slow in Julia 1.3.1?

We recently updated our Julia (finally!) from 0.6 to 1.3.1, and I have been updating my code.
In my code I write a NumPy array to a socket, as outlined below:

using NPZ
using Sockets

server = listen(8006)
sock = accept(server)
@async while isopen(sock)
    receiveddata = NPZ.npzreadarray(sock)
    processeddata = ProcessFunction(receiveddata)
    NPZ.npzwritearray(sock, processeddata)
end

On the client side of the 8006 socket, I have Python code which uses a socket.socket() object to send the NumPy array to the Julia code, and which receives the response in a loop using a .recv(1024)-sized buffer.
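For reference, the client side can be sketched roughly as below. This is a minimal sketch, not the original client: `send_array` and `recv_array` are illustrative names, and this version assumes the sender shuts down its write side after sending so the receiver knows when the payload ends (the real protocol may instead rely on the .npy header to know the expected size):

```python
import io
import socket
import numpy as np

def send_array(sock, arr):
    """Serialize an array in .npy format and send it in one call."""
    buf = io.BytesIO()
    np.save(buf, arr)
    sock.sendall(buf.getvalue())

def recv_array(sock, bufsize=1024):
    """Receive .npy bytes in bufsize-sized chunks until the peer stops sending."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:          # peer closed / shut down its write side
            break
        chunks.append(chunk)
    return np.load(io.BytesIO(b"".join(chunks)))
```

The two helpers can be exercised locally with `socket.socketpair()`, which is also a convenient way to benchmark the transfer path without involving the Julia server.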

This used to work well and was very fast in the old version of Julia.
In Julia 1.3.1, however, receiving the array is still fast, but sending the array back to Python is very slow.
For an array of size 2x1024x512 Float64, it takes about 1 minute 6 seconds, and the time scales proportionally with the size of the array.
Again, the delay occurs only while writing the array from Julia into the socket; reading is fast.

I verified that NPZ.npzwritearray is not slow when writing to a normal file IO. The issue appears only when writing to a socket.

What could be happening here?

Would it be an alternative for you to use PyCall.jl (plus potentially PyJulia on the Python side) to transfer the NumPy arrays to Julia?
This is very simple and very fast (zero-copy), even for large amounts of data.
I could give you an example if you are interested.

Thanks! I am open to better ways to do this. Would using PyCall be better?
From my quick read of the PyCall documentation, I couldn't find how to read a NumPy array from a socket. We have a large Python codebase which does a lot of processing on the data, and only for one computationally expensive function do I need to pass a NumPy array to a Julia server, which processes it and returns the array. The initialization of the coefficients the Julia server needs to process this data is time-consuming, which is why I cannot trivially call a standalone Julia function from Python.
Am I making sense?

Would it be an option for you to run Python and Julia on the same machine?
If yes, you can first call the Julia initialization functions once via PyJulia (they are evaluated in the global scope of a Julia process, which persists as long as you don't close Python / PyJulia) and then call Julia functions from Python where needed.
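A minimal sketch of that pattern with PyJulia (assuming Julia and PyJulia are installed; `my_module.jl`, `init_coefficients`, and `process` are hypothetical names standing in for your actual initialization and processing functions):

```python
from julia import Main  # starts a persistent Julia process on first import
import numpy as np

# One-time, expensive initialization: evaluated in Julia's global scope,
# so the resulting state persists for the lifetime of the Python process.
Main.eval('include("my_module.jl")')       # hypothetical file defining the functions
Main.eval("coeffs = init_coefficients()")  # hypothetical expensive setup

# Later, call the expensive function as often as needed; NumPy arrays
# are passed to the Julia function without a network round trip.
data = np.random.rand(2, 1024, 512)
result = Main.process(Main.coeffs, data)
```

Since `coeffs` lives in Julia's `Main` module, repeated calls reuse it and only pay the cost of the computation itself.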

Yes, that will work for most of our cases; both can run on the same node of the cluster. Is it possible to let the Julia process initialize in the background while the Python code runs? (I read that PyJulia does not release the GIL.)
Also, will parallelization in Julia work fine even if we call the function via PyJulia?
Pointers to documentation/examples on this would be very helpful.

By the way, I would also like to solve the socket issue in parallel, since it lets us reuse this Julia module in many other situations, and interacting via sockets makes that easy.

You could call Julia code from Python which itself starts async tasks (not tested) or is multithreaded (tested).
However, all calls to Julia should be made from the main Python thread; otherwise you will probably run into issues.
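One way to honor that constraint while worker threads still drive the work is a queue-based dispatcher: workers enqueue requests, and only the main thread performs the actual call. The sketch below uses a stand-in Python callable where the PyJulia call would go; all names are illustrative:

```python
import queue
import threading

def main_thread_dispatcher(call_julia, requests, n_workers=2):
    """Workers enqueue requests; only the main thread invokes `call_julia`."""
    work = queue.Queue()
    results = {}

    def worker(items):
        for item in items:
            reply = queue.Queue(maxsize=1)  # one-shot channel for the answer
            work.put((item, reply))
            results[item] = reply.get()     # wait for the main thread's result
        work.put(None)                      # signal this worker is done

    # Split the requests among workers round-robin.
    threads = [
        threading.Thread(target=worker, args=(requests[i::n_workers],))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()

    done = 0
    while done < n_workers:
        task = work.get()
        if task is None:
            done += 1
            continue
        item, reply = task
        reply.put(call_julia(item))  # the only place `call_julia` runs: main thread
    for t in threads:
        t.join()
    return results
```

In real use, `call_julia` would be the PyJulia-wrapped function; because it only ever runs on the main thread, the single-thread restriction is respected regardless of how many Python worker threads produce requests.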