Distributed computing in the Windows Subsystem for Linux

I have noticed that running Distributed-enabled computations in the WSL is far from optimal.
I have a fast Windows 10 machine with 16 cores, which runs the simulation in the WSL. With a single worker, it is about 1.75 times faster than the Linux machine I use for comparison. With 4 workers, the fast Windows machine is 1.5 times slower than the Linux machine.

I have also tried running the Distributed computations under Windows directly (i.e. with a Windows build of Julia), and there the fast Windows machine is indeed ~1.7 times faster than the Linux machine, both for a single worker and for multiple workers.

It would appear to me that the WSL is somehow crippled when it comes to using multiple cores. Could anyone verify that, or contribute some knowledge on this aspect?
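
A minimal sketch of how such a comparison could be timed, for anyone who wants to reproduce it. `busy_work` and `time_with_workers` are hypothetical stand-ins, not the actual simulation, and the task sizes are arbitrary assumptions:

```julia
using Distributed

# Start 4 local worker processes (the worker count mentioned above).
addprocs(4)

# Hypothetical CPU-bound task standing in for one unit of the simulation.
@everywhere function busy_work(n)
    s = 0.0
    for i in 1:n
        s += sin(i) * cos(i)
    end
    return s
end

# Time `ntasks` calls of busy_work spread over the first `k` workers.
function time_with_workers(k; ntasks = 16, n = 2_000_000)
    pool = WorkerPool(workers()[1:k])
    # Short warm-up run so that most compilation cost is not timed.
    pmap(busy_work, pool, fill(1000, k))
    return @elapsed pmap(busy_work, pool, fill(n, ntasks))
end

t1 = time_with_workers(1)
t4 = time_with_workers(4)
println("1 worker: $t1 s, 4 workers: $t4 s, speedup: $(t1 / t4)")
```

Running the same script under WSL, under native Windows, and on the Linux machine should show whether the speedup from 1 to 4 workers differs between them.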

That’s a darned good question. I have a Windows 10 laptop with WSL, but I have not used it much, and I have not given much thought to how it works. It is not a virtual machine. Instead there is some sort of kernel compatibility layer, i.e. the Linux syscalls are emulated (or something like it) and are run by Windows. So maybe the emulation is where the slowdown is happening.

This might be useful

Maybe you could cross-check using Cygwin on your Windows 10 machine. If you still get lower performance with Julia under Cygwin … no, I just realized that under Cygwin you would run a native Windows Julia and not a Linux Julia, so you would probably get the same performance as directly on Windows 10. Still, I'll leave this idea here for others, in case it isn’t as dumb as I now think it is :grin:

Seems like a fine idea to me. Unfortunately I do not have Cygwin installed, and no time to install it.

For a laugh, I started up WSL and ran apt install julia
It installs version 0.4.5 :frowning:
The Ubuntu release is 16.04 LTS

On a physical Ubuntu server running 18.10 (Cosmic), the Julia installed with apt is version 1.0.1

Microsoft/Ubuntu guys - seriously? I know why Long Term Support versions exist.
But many people just use the distro-supplied software packages.
And before anyone says it: yes, Julia is very easy to install, even without root privileges.

I think WSL is a single core (sub)system

What made you think this?

WSL:

julia> peakflops(10000)
3.128249420206342e11

Windows:

julia> peakflops(10000)
3.235946865234721e11
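
Note that `peakflops` times a dense matrix multiplication in BLAS, which uses its own threads inside a single Julia process, so it does not exercise Distributed workers. A rough sketch of how one could check whether BLAS really benefits from several cores, assuming Julia 1.6 or later (for `BLAS.get_num_threads`) and a smaller matrix size than the 10000 used above to keep the run short:

```julia
using LinearAlgebra

n = 2000                                   # assumed size, smaller than above
default_threads = BLAS.get_num_threads()   # needs Julia 1.6 or later

# Measure with BLAS restricted to a single thread.
BLAS.set_num_threads(1)
single = LinearAlgebra.peakflops(n)

# Restore the default thread count and measure again.
BLAS.set_num_threads(default_threads)
multi = LinearAlgebra.peakflops(n)

println("BLAS threads: ", default_threads)
println("1 thread:     ", single, " flops")
println("default:      ", multi, " flops")
println("ratio:        ", multi / single)
```

If the ratio is clearly above 1 inside WSL, the subsystem is at least not limited to a single core for threaded code.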

I don’t remember exactly, but it came from experiments with a program (very likely GMT) that was only using a single core in WSL.

However lscpu clearly tells me that I have 8 CPU(s)

My setup reports also several CPUs:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Model name: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Stepping: 7
CPU MHz: 2601.000
CPU max MHz: 2601.0000
BogoMIPS: 5202.00
Virtualization: VT-x
Hypervisor vendor: Windows Subsystem for Linux
Virtualization type: container
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave osxsave avx

WSL is not an emulation. Put briefly (and incompletely), it is “just” a translation from Linux syscalls and process management to NT kernel syscalls and process management.

There’s more info about how the translation works here. It’s part 3 of a series of blog posts about how the internals work, specifically syscalls.

What is the output of julia> Sys.cpu_info()
and cat /proc/cpuinfo?

Apparently WSL does not support multiple sockets. Nevertheless, in theory, you should have 8 cores available from one of the sockets with that system. Perhaps you are hitting this issue below and getting pinned to a single core in a similar manner:
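
One way to check for such pinning is to compare the number of processing units available to the current process with the number installed. This is only a sketch: it assumes the coreutils `nproc` command is present in the WSL distribution, and whether WSL reports affinity restrictions the same way as a native Linux kernel is not something I have verified.

```julia
# `nproc` reports the processing units available to the current process,
# `nproc --all` the number of installed ones (both from GNU coreutils).
available = parse(Int, strip(read(`nproc`, String)))
installed = parse(Int, strip(read(`nproc --all`, String)))

println("available to this process: ", available)
println("installed:                 ", installed)

if available < installed
    println("The process appears to be restricted to a subset of the CPUs.")
end
```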

Nice link. It helped me understand why pressing ^Z in the Linux WSL terminal crashes Julia with a “futex error” upon resume.

I got

julia> Sys.cpu_info()
32-element Array{Base.Sys.CPUinfo,1}:
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz    1068306 s          0 s      24892 s     820290 s       2240 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz     493720 s          0 s       4321 s    1415446 s          4 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz    1108003 s          0 s       8535 s     796950 s          6 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz     453254 s          0 s       5090 s    1455140 s          6 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz    1142401 s          0 s       7731 s     763356 s        304 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz     419200 s          0 s       4437 s    1489851 s          7 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz    1079481 s          0 s       7817 s     826190 s          3 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz     471112 s          0 s       3532 s    1438843 s          4 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz     619456 s          0 s      10312 s    1283720 s          4 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz     946284 s          0 s       5709 s     961496 s          6 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz     661068 s          0 s       6282 s    1246137 s        156 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz     881051 s          0 s       4345 s    1028092 s          3 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz    1012118 s          0 s      57859 s     843510 s         90 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz     574282 s          0 s       1556 s    1337650 s          1 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz     760537 s          0 s      15771 s    1137179 s         54 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz     804298 s          0 s      17753 s    1091439 s          7 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz      12601 s          0 s       1348 s    1899539 s          3 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz      30065 s          0 s       2532 s    1880889 s          0 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz      20178 s          0 s       1634 s    1891673 s          1 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz      21364 s          0 s       2509 s    1889612 s          7 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz      44379 s          0 s       2365 s    1866742 s          1 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz         76 s          0 s         43 s    1913368 s          0 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz      34889 s          0 s        723 s    1877875 s          3 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz        331 s          0 s        546 s    1912609 s         10 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz      18600 s          0 s        434 s    1894451 s          1 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz       1939 s          0 s        871 s    1910676 s         70 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz       2001 s          0 s        471 s    1911015 s         81 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz        232 s          0 s        193 s    1913062 s          0 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz      28503 s          0 s       3262 s    1881720 s          0 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz        375 s          0 s        370 s    1912743 s          4 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz      30273 s          0 s       2692 s    1880520 s          4 s
        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
        speed         user         nice          sys         idle          irq
     2601 MHz        298 s          0 s      21481 s    1891707 s         12 s

julia>

And

cat /proc/cpuinfo
...
processor       : 31
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      :        Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
stepping        : 7
microcode       : 0xffffffff
cpu MHz         : 2601.000
cache size      : 256 KB
physical id     : 1
siblings        : 16
core id         : 7
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 6
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave osxsave avx
bogomips        : 5202.00
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
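
Since the full listings are long, here is a small sketch that condenses the same information into a few numbers. `summarize_cpuinfo` is a hypothetical helper written for this post; it only looks at the `processor` and `physical id` fields shown above:

```julia
# Count logical CPUs and distinct sockets from the lines of /proc/cpuinfo.
function summarize_cpuinfo(lines)
    logical = count(l -> startswith(l, "processor"), lines)
    ids = [strip(split(l, ':')[2]) for l in lines if startswith(l, "physical id")]
    return (logical = logical, sockets = length(unique(ids)))
end

println("Sys.CPU_THREADS        = ", Sys.CPU_THREADS)
println("length(Sys.cpu_info()) = ", length(Sys.cpu_info()))

# /proc/cpuinfo exists on Linux, including WSL, but not on native Windows.
if isfile("/proc/cpuinfo")
    println(summarize_cpuinfo(readlines("/proc/cpuinfo")))
end
```

If the socket count or the logical CPU count here is lower than what the hardware has, that would point to WSL not exposing everything.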

I don’t know what to think. I followed the MPI run link, and dug down some more. The issue is around two years old, so it may have been addressed since. At least one person reports having been able to use all sockets from WSL.

Talking about WSL, Microsoft have announced WSL 2, which has a real Linux kernel.

I have no idea how this works, and it seems to be in technology preview at the moment.
I don’t think I’m brave enough to enable this on my laptop!
