I have noticed that running Distributed-enabled computations in WSL is far from optimal.
I have a fast Windows 10 machine with 16 cores (which runs the simulation in WSL). With a single worker it is about 1.75 times faster than a Linux machine I use for comparison, but with 4 workers it is 1.5 times slower than the Linux machine.
I have also tried running the Distributed computations under Windows directly (i.e. with a Windows Julia), and there the fast Windows machine is indeed ~1.7 times faster than the Linux machine, both with a single worker and with multiple workers.
It would appear to me that WSL is somehow crippled when it comes to using multiple cores. Could anyone verify that or share some knowledge on this?
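To make this comparison easy to repeat on both machines, the measurement can be scripted with the standard Distributed library: start N worker processes, warm them up, and time a fixed batch of identical tasks with pmap. This is a minimal sketch; timed_run and the sin/cos kernel are hypothetical stand-ins for the actual simulation, and the task sizes are arbitrary.

```julia
using Distributed

# Wall-clock time for `ntasks` copies of a CPU-bound kernel spread over `nw`
# freshly started worker processes. The kernel is a hypothetical stand-in
# for the real simulation step.
function timed_run(nw::Int; ntasks::Int = 16, n::Int = 2_000_000)
    ws = addprocs(nw)
    try
        pool = WorkerPool(ws)
        kernel = m -> begin
            s = 0.0
            for i in 1:m
                s += sin(i) * cos(i)
            end
            s
        end
        pmap(kernel, pool, fill(1, nw))   # warm-up: roughly one call per worker, so compilation is not timed
        return @elapsed pmap(kernel, pool, fill(n, ntasks))
    finally
        rmprocs(ws)                       # remove the workers again, even on error
    end
end

t1 = timed_run(1)
t4 = timed_run(4)
println("1 worker: ", round(t1; digits = 2), " s; 4 workers: ",
        round(t4; digits = 2), " s; speedup ", round(t1 / t4; digits = 2), "x")
```

Running the same script once inside WSL and once with a native Windows Julia on the same machine would show whether the multi-worker slowdown also appears with a trivial kernel, independent of the real simulation.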
That’s a darned good question. I have a Windows 10 laptop with WSL, but I have not used it much, and I have not given much thought to how it works. It is not a virtual machine; instead there is some sort of kernel compatibility layer, i.e. the Linux syscalls are emulated (or something like that) and are serviced by Windows. So maybe the emulation is where the slowdown happens.
Maybe you could cross-check using Cygwin on your Windows 10 machine. If you still get lower performance with Julia under Cygwin … no, I just realized that under Cygwin you would run a native Windows Julia and not a Linux Julia, so you would probably get the same performance as directly on Windows 10. Still, I’ll leave the idea here for others, in case it isn’t as dumb as I now think it is.
For a laugh, I started up WSL and ran apt install julia.
It installs version 0.4.5.
The Ubuntu release is 16.04 LTS.
On a physical Ubuntu server running 18.10 (Cosmic), the Julia installed with apt is version 1.0.1.
Microsoft/Ubuntu guys, seriously? I know why Long Term Support versions exist.
But many people just use the distro-supplied software packages.
And before anyone points it out: yes, I know Julia is very easy to install, even without root privileges.
WSL is not emulation. To give a short, incomplete picture, it is “just” a translation layer from Linux syscalls and process management to NT kernel syscalls and process management.
There’s more info about how the translation works here. It’s part 3 of a series of blog posts about how the internals work, specifically syscalls.
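Because WSL 1 exposes a Linux-style /proc while the kernel underneath is the NT kernel, it can help to record in benchmark logs whether a run actually happened under WSL. This is a sketch of a common heuristic, not an official interface: it assumes the kernel release string contains "Microsoft" (WSL 1) or "microsoft" (WSL 2). The name running_under_wsl is introduced here for illustration.

```julia
# Heuristic WSL check (an assumption, not an official API): WSL kernels
# report "Microsoft" (WSL 1) or "microsoft" (WSL 2) in the kernel release string.
function running_under_wsl()
    Sys.islinux() || return false
    for path in ("/proc/sys/kernel/osrelease", "/proc/version")
        if isfile(path) && occursin("microsoft", lowercase(read(path, String)))
            return true
        end
    end
    return false
end

println("Running under WSL: ", running_under_wsl())
```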
What is the output of Sys.cpu_info() in the Julia REPL, and of cat /proc/cpuinfo?
Apparently WSL does not support multiple sockets. Nevertheless, in theory, you should have 8 cores available from one of the sockets with that system. Perhaps you are hitting this issue below and getting pinned to a single core in a similar manner:
julia> Sys.cpu_info()
32-element Array{Base.Sys.CPUinfo,1}:
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 1068306 s 0 s 24892 s 820290 s 2240 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 493720 s 0 s 4321 s 1415446 s 4 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 1108003 s 0 s 8535 s 796950 s 6 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 453254 s 0 s 5090 s 1455140 s 6 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 1142401 s 0 s 7731 s 763356 s 304 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 419200 s 0 s 4437 s 1489851 s 7 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 1079481 s 0 s 7817 s 826190 s 3 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 471112 s 0 s 3532 s 1438843 s 4 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 619456 s 0 s 10312 s 1283720 s 4 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 946284 s 0 s 5709 s 961496 s 6 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 661068 s 0 s 6282 s 1246137 s 156 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 881051 s 0 s 4345 s 1028092 s 3 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 1012118 s 0 s 57859 s 843510 s 90 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 574282 s 0 s 1556 s 1337650 s 1 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 760537 s 0 s 15771 s 1137179 s 54 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 804298 s 0 s 17753 s 1091439 s 7 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 12601 s 0 s 1348 s 1899539 s 3 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 30065 s 0 s 2532 s 1880889 s 0 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 20178 s 0 s 1634 s 1891673 s 1 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 21364 s 0 s 2509 s 1889612 s 7 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 44379 s 0 s 2365 s 1866742 s 1 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 76 s 0 s 43 s 1913368 s 0 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 34889 s 0 s 723 s 1877875 s 3 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 331 s 0 s 546 s 1912609 s 10 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 18600 s 0 s 434 s 1894451 s 1 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 1939 s 0 s 871 s 1910676 s 70 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 2001 s 0 s 471 s 1911015 s 81 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 232 s 0 s 193 s 1913062 s 0 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 28503 s 0 s 3262 s 1881720 s 0 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 375 s 0 s 370 s 1912743 s 4 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 30273 s 0 s 2692 s 1880520 s 4 s
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:
speed user nice sys idle irq
2601 MHz 298 s 0 s 21481 s 1891707 s 12 s
julia>
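The counters printed by Sys.cpu_info() are cumulative since boot, so a rough way to see whether load has been spread over all 32 logical CPUs is to compute each CPU’s non-idle fraction. In the listing above, the last 16 entries show far less user time than the first 16, which may be worth comparing with the single-socket suggestion. A sketch; busy_fractions is a name introduced here, and it reads the internal field names of Base.Sys.CPUinfo (cpu_times!user and so on), which are an implementation detail that may change between Julia versions.

```julia
# Non-idle share of each logical CPU's cumulative time since boot.
# NOTE: relies on the internal field names of Base.Sys.CPUinfo.
function busy_fractions(infos = Sys.cpu_info())
    t(c, name) = Float64(getfield(c, Symbol("cpu_times!", name)))
    return map(infos) do c
        busy = t(c, "user") + t(c, "nice") + t(c, "sys") + t(c, "irq")
        total = busy + t(c, "idle")
        total > 0 ? busy / total : NaN   # NaN if the platform reports no times
    end
end

for (i, f) in enumerate(busy_fractions())
    println("CPU ", lpad(i - 1, 2), ": ", round(100 * f; digits = 1), " % busy")
end
```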
And the output of cat /proc/cpuinfo:
...
processor : 31
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
stepping : 7
microcode : 0xffffffff
cpu MHz : 2601.000
cache size : 256 KB
physical id : 1
siblings : 16
core id : 7
cpu cores : 8
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave osxsave avx
bogomips : 5202.00
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
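To check the single-socket suggestion against what WSL actually reports, one can count distinct physical id values and distinct (physical id, core id) pairs in /proc/cpuinfo. For two 8-core E5-2670 packages with Hyper-Threading, a complete view would presumably be 32 logical CPUs, 2 sockets and 16 physical cores. Note that this only shows what WSL reports, not where the scheduler actually places processes. A sketch; cpu_topology is a name introduced here, and it assumes each processor block lists physical id before core id, as x86 Linux does.

```julia
# Count logical CPUs, sockets ("physical id") and physical cores
# (distinct "physical id"/"core id" pairs) in /proc/cpuinfo-style text.
# Assumes "physical id" precedes "core id" within each processor block.
function cpu_topology(text::AbstractString)
    logical = 0
    socket = ""
    sockets = Set{String}()
    cores = Set{Tuple{String,String}}()
    for line in split(text, '\n')
        parts = split(line, ':'; limit = 2)
        length(parts) == 2 || continue
        key, value = strip(parts[1]), String(strip(parts[2]))
        if key == "processor"
            logical += 1
        elseif key == "physical id"
            socket = value
            push!(sockets, socket)
        elseif key == "core id"
            push!(cores, (socket, value))   # core ids repeat across sockets
        end
    end
    return (logical = logical, sockets = length(sockets), cores = length(cores))
end

Sys.islinux() && println(cpu_topology(read("/proc/cpuinfo", String)))
```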
I don’t know what to think. I followed the MPI run link and dug a little deeper. The issue is around two years old, so it may have been addressed since. At least one person reports being able to use all sockets from WSL.
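If processes are being pinned, as suspected earlier in the thread, one way to check is to ask each worker which CPUs it is allowed to run on. On Linux this is listed in the Cpus_allowed_list field of /proc/self/status; it is an assumption that WSL 1 provides that field, so the sketch falls back to "unknown" when it is missing. The name allowed_cpus is introduced here for illustration.

```julia
using Distributed

addprocs(4)   # same worker count as in the slow WSL runs

# CPUs this process may be scheduled on, per /proc/self/status (Linux).
# Returns "unknown" if the file or the field is unavailable (possibly on WSL 1).
@everywhere function allowed_cpus()
    isfile("/proc/self/status") || return "unknown"
    for line in eachline("/proc/self/status")
        if startswith(line, "Cpus_allowed_list:")
            return String(strip(split(line, ':'; limit = 2)[2]))
        end
    end
    return "unknown"
end

for p in procs()
    println("process ", p, ": CPUs ", remotecall_fetch(allowed_cpus, p))
end

rmprocs(workers())
```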
Speaking of WSL, Microsoft has announced WSL 2, which runs a real Linux kernel.
I have no idea how this works, and it seems to be in technology preview at the moment.
I don’t think I’m brave enough to enable this on my laptop!