Building a PC optimized for "time to first plot"

20 KB is just for a small "Hello world!" demo (anything with more code will be correspondingly larger). A similar Julia demo has run on a microcontroller with only 2 KB of RAM, so small is possible (and there is also small Julia code running in the Linux kernel).

PackageCompiler.jl is likely the go-to solution for now (unless you're also working with Python); it supports everything (with some rare exceptions, all packages should work). It’s used in production by some (I’m not sure if they use non-defaults; Chris should know). The executables are huge, but can be made non-huge by modern standards with non-default tricks. Even as is, it's workable for many, just as Julia’s runtime is very workable.
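
For concreteness, here is a minimal sketch of the executable route with PackageCompiler.jl's `create_app` (the project name and paths are made up; `filter_stdlibs` is one of the non-default size-reducing tricks mentioned):

```julia
# Hypothetical layout: MyProject is a package directory whose module defines
# julia_main()::Cint, as PackageCompiler's app recipe requires.
using PackageCompiler

create_app("MyProject", "MyProjectCompiled";
           filter_stdlibs = true)   # non-default: drop unused stdlibs to shrink the bundle
# Run the result with: MyProjectCompiled/bin/MyProject
```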

That will unfortunately create more discrimination about what counts as a good package. There’s no magic behind that tech: you compile your code with no dynamic calls, no runtime type computations, and no GC, all of which would otherwise require Julia’s runtime to provide the necessary definitions. Saying that it can’t handle "strange things" is synonymous with those runtime symbols being absent from the final product. But in general this is hard to guarantee, and some properties are harder to verify than others, which necessitates static checkers.
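
To illustrate the kind of dynamism at issue (my own toy example, not from any compiler):

```julia
# Dynamic: the container is abstractly typed, so each call to `describe`
# is a runtime dispatch that needs the runtime's method tables.
describe(x::Int) = "an integer"
describe(x::String) = "a string"

xs = Any[1, "two"]
map(describe, xs)   # dispatch target resolved at runtime

# Static: concretely typed, so every call target is known at compile time.
ys = [1, 2, 3]      # Vector{Int}
map(describe, ys)   # dispatch target resolved at compile time
```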

Now just imagine that one day package developers are forced to modify their code so it can be better precompiled or statically compiled (I am not joking: plans for such static checkers are already underway, e.g. type piracy and method invalidation checkers). We already have an effect checker, and it is easy to imagine it being extended to cover other kinds of effects and rule out such "broken code". Precompilation already does this by classifying packages that fail to precompile as non-working. The overall effect is that dynamic code will be purged from the community.
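
For reference, the effect checker mentioned here can already be poked at from the REPL; `Base.infer_effects` is an internal, unexported API, so the details may change:

```julia
# Julia ≥ 1.8; inspect the effects the compiler infers for a given call.
pure_add(x, y) = x + y
Base.infer_effects(pure_add, (Int, Int))    # all effects clean, e.g. (+c,+e,+n,+t,+s,+m)

grow!(v) = push!(v, 1)
Base.infer_effects(grow!, (Vector{Int},))   # flags impure effects (mutation etc.)
```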

At least, this is a good thing for me, since I am interested in developing static checker for Julia and this creates a lot of opportunities for me. But for a general user?

Crazy reply from me (as usual). The most important thing, as @ChrisRackauckas says, is to understand your environment.
I was going to weigh in and say that rather than concentrating on CPU choice, always look at storage performance - which points at using NVMe drives these days.
However - how about using a RAMDISK for Julia projects? Of course everything disappears in a puff of smoke when you log out. Has anyone actually tried this?
Note to self - if you are so darned smartarsed try it yourself.
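
For anyone who does want to try it, a sketch of how I would guess you'd wire it up on Linux (assuming a tmpfs mount at /dev/shm; put this near the top of ~/.julia/config/startup.jl):

```julia
# Prepend a tmpfs depot so new precompile caches land in RAM.
# Everything here vanishes on reboot, as noted above.
ramdepot = "/dev/shm/julia-depot"   # /dev/shm is a tmpfs on most Linux distros
mkpath(ramdepot)
pushfirst!(DEPOT_PATH, ramdepot)
# Alternative: export JULIA_DEPOT_PATH="/dev/shm/julia-depot:" in the shell
# (the trailing colon keeps the default depots as fallbacks).
```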

Hmmm - I know Optane may not be fashionable, but Optane is fast and persistent across reboots AFAIK.
Maybe a super-fast Julia workstation would use Optane and then a DAOS filesystem.

OP already shared a benchmark midway through this thread. The bottleneck for compiling is not disk performance; it is RAM bandwidth and latency. Which is not too surprising: by the time the code is being compiled, it has already been read from disk and is sitting in RAM.

Does this imply that the best TTFP-optimized PC is a Mac? :slight_smile:

I would say that is a bit misleading phrasing, or an XY problem. For any language, compilation-focused hardware would be hardware with high memory bandwidth. Today that means Apple's systems-on-chip or AMD's 3D V-Cache. But the actual solution to TTFX in Julia is to not have to recompile as much. How to execute that solution is described throughout this thread, including caveats and prospects for further improvement in the near future.

Importantly, today it is possible to have zero-latency Julia with no TTFX problems for the vast majority of libraries, with a bit of customization in your workflow. That is being automated and simplified, so in the near future you will not need the customization.

For example, for me, importing and plotting with Makie takes a second, which is on par with Python for time to first plot. Makie also happens to be many orders of magnitude faster than Python for time to second plot.
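
A sketch of the kind of workflow customization meant here, using PackageCompiler.jl (the package name and file paths are illustrative, not necessarily the exact setup):

```julia
using PackageCompiler

# warmup.jl would contain a typical plotting session, e.g. one scatter plot,
# so its compiled code gets baked into the image.
create_sysimage(["CairoMakie"];
                sysimage_path = "makie_sysimage.so",
                precompile_execution_file = "warmup.jl")
# Then start sessions with: julia --sysimage makie_sysimage.so
```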

6 Likes

My M1 Mac Mini compiles Julia much faster than any of my Intel machines (my most modern Intel machine is the 1165G7, a 4-core Tiger Lake laptop).

4 Likes

I have the same experience with my M1 Max machine. The performance is amazing (I switch very frequently to a PC because that is the main target for most of my clients, and the difference is quite significant). Considering the energy consumption, it is even more dramatic. That being said, I am desperately waiting for the competition to catch up because I want to go back to Linux!

Anyway, @Krastanov is of course right, and the solution to the OP's problem is not a hardware upgrade :yum:

1 Like

Why not? FYI: https://www.youtube.com/watch?v=UAG5Q8Hovkw

Linus Torvalds, the creator of the Linux operating system, has ditched x86 for road trips and is instead using a MacBook with Apple Silicon, running not macOS but Linux!

And (some pros and cons at that timestamp; maybe some of the cons are already fixed?): https://youtu.be/TlN3DH33MdE?t=406

1 Like

I am not convinced at all.

I discovered with great and genuine surprise that Apple (this is my first Apple machine after more than 30 years of dev) is not interested at all in (scientific) developers who want a minimum of portability for their software. The ecosystem for GPGPU development (Metal) is light years behind what NVIDIA (CUDA) or Intel (oneAPI) offers, not to mention dedicated units like the AMX matrix coprocessor, which some experts discover by chance and which is undocumented. When I see, in comparison, the efforts made by NVIDIA and Intel to support development on their hardware, I understand the consequences of the vertical integration (software + hardware) of companies like Apple. Anyway, this seems to be a very profitable strategy, and we can fear that Microsoft will follow the same path by developing their own chips.

In this context, I’m afraid that heroic reverse-engineering efforts to develop Linux on these closed architectures are becoming more and more futile. Look at the effort that was put into making Julia available on this system (more than 2 years!). The only hope for Linux, in my opinion, is that hardware vendors without their own OS survive this market reconfiguration.

OK, one might object that nothing forces me to leave the x86 Linux world, but this is where the depth of the technology gap that Apple has created comes in.
Compiling code in parallel all day on 8 cores, without the slightest noise and much faster than the competition, on… a laptop… it’s incredible and hard to resist. Another argument is energy efficiency, which seems to me a crucial issue given the growing impact of mass computing on electricity consumption. The worst outcome would of course be for the competition to compensate for its technological deficit by offering machines that consume more and more energy; fortunately, in a context of climate change, this worst-case scenario is surely unthinkable :slight_smile:

In this context (again), I am desperately waiting for the hardware specialists to catch up with and exceed the (energy) performance of the generalist companies.

1 Like

To be fair, a couple of clicks + O(5) minutes for creation + O(500) MB disk space. But I’m looking forward to the improvements that are in the pipeline.

2 Likes

Disclaimer: I own an M1 laptop and I like it very much. The time it took to make Julia work on the M1 is probably about the same as it would have taken to add any other OS/arch combination (RISC-V will probably be in a similar situation when it becomes competitive). While Apple does tend to get in the way of a lot of things, they don't seem to be getting in the way of porting Linux to the M1; in fact, it seems they've left the door open.

2 Likes

First, thank you again for your great contributions to making Julia smooth on the M1 (it works super fast now).
The issue, which I think is not crucial for Linus Torvalds on a road trip, is being able to actually use the machine at its full potential. For example, the M1 can perform gemm at incredible speed, but it takes tons of effort to access that easily from Julia because Apple does not even care to provide a standard BLAS API for this operation (I think it would be very easy and cheap for them to do so) and just presents Metal as the solution… The same applies to GPGPU. That may be unimportant for a big vendor like Adobe, but for scientific developers it is just a huge waste of time. I am super impressed and happy with the work aiming at unifying GPGPU within Julia, but I can’t help considering the quantity of work that could be avoided if Apple bothered to provide open APIs (SYCL, Vulkan, …).
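
To make the BLAS point concrete, a small sketch (my illustration; BenchmarkTools.jl assumed installed) of checking which BLAS backend Julia uses and timing a gemm through it:

```julia
using LinearAlgebra
using BenchmarkTools

BLAS.get_config()      # OpenBLAS by default; swapping in another BLAS needs extra plumbing

A = rand(Float32, 4096, 4096)
B = rand(Float32, 4096, 4096)
@btime $A * $B;        # dispatches to the linked BLAS's sgemm
```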

So Apple may leave the door open for experimental Linux distributions, but I would prefer that they take scientific computing seriously.

3 Likes

I was thinking the same as you a few weeks ago, so I bought a new laptop (I’m on the move a lot) and picked it mostly on single-core benchmark scores. The i7-1255U (12th gen, aka Alder Lake) has a Single Thread Rating of 3386 according to cpubenchmark.net. I compared this laptop with my older laptop, and the rating was in line with some Julia compilation benchmarks that I did.

Output for your benchmark:

$ time julia --startup-file=no --project acausal.jl

________________________________________________________
Executed in   67.35 secs    fish           external
   usr time   65.70 secs    1.30 millis   65.70 secs
   sys time    2.78 secs    0.16 millis    2.78 secs

It’s about $((60 + 55) - 67.35) / (60 + 55) \approx 41\%$ faster. Possibly a bit less if you use --startup-file=no.

EDIT: More generic benchmark too:

$ git clone https://github.com/JuliaLang/julia
[...]

$ git reset --hard e569459da614c63d334f87198ffbb390d76ce515
HEAD is now at e569459da6 [MozillaCACerts_jll] Update to 2022-10-11 (#47175)

$ make # Warmup caches.
[...]

$ make clean
[...]

$ time make
[...]

real    7m13.559s
user    6m58.813s
sys 0m14.864s
3 Likes

Very interesting. It seems that recent Intel processors (the U series) have improved a lot compared to my 3-year-old Intel machine. On an M1 Max I obtain similar results (on battery, no fan spinning, temperature 50 °C).

$ time julia --startup-file=no --project test.jl
julia --startup-file=no --project test.jl  59,09s user 2,11s system 104% cpu 58,844 total
1 Like

Cool! I was really curious for an M1 benchmark. That is quick.

1 Like

Compiling Julia from scratch on my M1 takes less than 5 minutes (and the bottleneck is precompilation)

2 Likes

Chris, thanks for this example! Where in the documentation (of some package or of Julia itself) would it be best to have information like this? I think there is a discovery problem here. Your workflow is even better than mine, I wish I had seen it sooner!

After reading this very long thread, which shouldn't have needed to exist, I get an upsetting impression about an uncomfortable Julia language trait. There is definitely a disconnect between core developers and package developers. It shouldn't be hard to see that extra tools, additional lines of code, and command-line witchcraft are not things that help package developers focus on their task; instead, they are forced to work out how to solve core Julia issues such as package load time, to say nothing of the maintenance time and additional lines of code involved. This does not sound like a long-term solution for a successful language. I am very much a Julia enthusiast, but I have lots of friends who struggle and switch back to Python precisely because of this very real, unnecessary witchcraft needed to work with packages. A language should help developers build tools directly and aid them in doing so, instead of being an obstacle in its own right and asking them to do extra work to mitigate JIT behavior.

9 Likes

If you mean building sysimages, then yes: this isn't really a user-friendly solution. It's not built in, it's supported by a limited set of Julia environments, and it requires many extra steps.
But using them is far from a requirement; I'm sure the vast majority of Julia users and developers don't build custom sysimages. Just write code following general Julia design and performance tips, that's it! For common workflows with long-running Julia sessions (REPL or notebooks) this is totally fine.
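
For example, a typical long-running-session workflow looks roughly like this (a sketch; `MyPackage` and `run_analysis` are placeholders):

```julia
# One session, kept alive all day: pay the load-and-compile cost once.
using Revise       # load before the packages you want tracked
using MyPackage    # hypothetical package under development

MyPackage.run_analysis()   # placeholder call
# ...edit MyPackage's source and save; Revise swaps in the new method
# bodies, so the next call picks them up without a restart:
MyPackage.run_analysis()
```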

4 Likes