John! That is great! Really great to see that this has opened up as an option.
Yes, this is just a "proxy benchmark", as @lmiq said. I actually like this particular "proxy": I often find myself writing a loop that calls a subroutine (some kind of numerical function) on each element of an array, either producing a new array or accumulating a sum (as in the case above). Being able to optimize these kinds of benchmarks for best performance is therefore essential. Obviously, this is not the only thing we want optimized, but it's a start.
If any of you are interested, we have started a "benchmark" repository at:
GitHub - fortran-lang/benchmarks: Fortran benchmarks
but we haven't had time to actually add any benchmarks yet. The one above would be a great candidate. You can browse the issues there, such as this one, for some background discussion of how to approach this:
Benchmark criteria · Issue #2 · fortran-lang/benchmarks · GitHub
Julia has its own benchmarks here: GitHub - JuliaLang/Microbenchmarks: Microbenchmarks comparing the Julia Programming language with other languages. The last time I suggested using -ffast-math
to speed up one of the Fortran benchmarks there by a factor of 2, the change was not approved by the Julia community. My conclusion from that episode is that our communities will probably not agree on how to run the benchmarks, what compiler options to use, or how to interpret the results. However, we might agree on some subset of the available benchmarks and on how to improve them, so we could collaborate on that.
In my relatively short experience here, it is virtually impossible to "agree" on these benchmarks. Equally performant code can be written in both languages, and "fairness" is in the eye of the beholder. I had previously written Julia code faster than my own Fortran code, and it was shown to me here (by C. Elrod, actually) how I should run the Fortran code to make it equally performant. So I became a better Fortran user…
In the end it comes down to what is more natural to write in one language or the other, and how likely someone is to find the optimal code. But that is not a quantitative variable.
I suppose that if you are comparing
a loop of sequentially computing a modest function call (sine of an integer) and
trying to compute the same functions in some maximally parallel fashion,
and tying this together by summing them …
you could write it as map-reduce. Conceptually, in Common Lisp: (reduce #'+ (mapcar #'sin '(1 2 3 …)))
but you wouldn't actually program it that way, since it starts by constructing the whole list …
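For readers who don't speak Lisp, the same contrast (an eager map that materializes a list before reducing, versus a lazy one-pass sum) can be sketched in Python. This is only an illustration of the idea above; none of these names come from the posts in this thread.

```python
import math
from functools import reduce

N = 100

# mapcar-style: map() is consumed by reduce, but list(map(...)) would
# materialize the whole intermediate list first, like (mapcar #'sin ...)
eager = reduce(lambda a, b: a + b, list(map(math.sin, range(1, N + 1))))

# generator version: one pass, no intermediate list is ever built
lazy = sum(math.sin(i) for i in range(1, N + 1))

print(abs(eager - lazy) < 1e-12)  # the two strategies agree numerically
```

Both variants perform the same additions in the same order, so they agree exactly; the difference is only in memory behavior.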
so comparing map-reduce "in parallel" somehow vs. map-reduce "using one processor" is the benchmark. Is there a Julia MapReduceInParallel?
Maybe GitHub - tkf/ThreadsX.jl: Parallelized Base functions qualifies?
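As a rough sketch of what a parallel map-reduce does, here is the chunk-and-combine structure in Python (illustrative only; the helper names are made up, and for pure-Python arithmetic the GIL prevents threads from giving a real speedup, so this shows the shape of the computation rather than the performance that a threaded Julia mapreduce achieves):

```python
import math
from concurrent.futures import ThreadPoolExecutor

def chunk_sum(lo, hi):
    """Partial reduction: sum sin over the half-open integer range [lo, hi)."""
    return sum(math.sin(i) for i in range(lo, hi))

def parallel_sin_sum(n, workers=4):
    # Map step: each worker reduces one contiguous chunk of 1..n.
    # Reduce step: combine the per-chunk partial sums.
    step = max(1, n // workers)
    bounds = [(lo, min(lo + step, n + 1)) for lo in range(1, n + 1, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda b: chunk_sum(*b), bounds))

serial = sum(math.sin(i) for i in range(1, 100_001))
print(abs(parallel_sin_sum(100_000) - serial) < 1e-9)
```

The only numerical subtlety is reassociation: partial sums are added in a different order than the serial loop, so the results can differ in the last few bits, which is exactly why the timed Julia variants below print slightly different trailing digits.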
In Julia there is no reason not to write the loop explicitly.
@rfateman here are some options for writing one-liners, compared to the above. It is hard to beat the loop with @avx:
julia -t4 sum.jl
loop 293.541 ms (0 allocations: 0 bytes)
1.9558914085412433
avx 22.379 ms (0 allocations: 0 bytes)
1.955891408541291
avxt 8.393 ms (0 allocations: 0 bytes)
1.955891408541158
simd 292.663 ms (0 allocations: 0 bytes)
1.9558914085412433
sumiter 294.163 ms (0 allocations: 0 bytes)
1.9558914085412433
mapreduce 298.790 ms (0 allocations: 0 bytes)
1.9558914085409373
threadsx.mapreduce 109.673 ms (255 allocations: 15.66 KiB)
1.9558914085412005
Code
using BenchmarkTools, Test
using LoopVectorization
using ThreadsX
function f(N)
s = 0.
for i in 1:N
s += sin(i)
end
s
end
function f_avx(N)
s = 0.
@avx for i in 1:N
s += sin(i)
end
s
end
function f_avxt(N)
s = 0.
@avxt for i in 1:N
s += sin(i)
end
s
end
function f_simd(N)
s = 0.
@simd for i in 1:N
s += sin(i)
end
s
end
f_sumiter(N) = sum(sin(i) for i in 1:N)
f_mapreduce(N) = mapreduce(sin, +, 1:N)
f_threadx_mapreduce(N) = ThreadsX.mapreduce(sin, +, 1:N)
N = 10000000
@test f(N) ≈ f_avx(N) ≈ f_avxt(N) ≈ f_simd(N) ≈ f_sumiter(N) ≈ f_mapreduce(N) ≈ f_threadx_mapreduce(N)
print("loop");println(@btime f($N))
print("avx");println(@btime f_avx($N))
print("avxt");println(@btime f_avxt($N))
print("simd");println(@btime f_simd($N))
print("sumiter");println(@btime f_sumiter($N))
print("mapreduce");println(@btime f_mapreduce($N))
print("threadsx.mapreduce");println(@btime f_threadx_mapreduce($N))
I agree that benchmarks lead to disputes, but they can also be educational. That is, you work on a benchmark with a pre-determination to show that your latest project beats the competition, or you use it to improve your project. To do this you write what you think is equivalent code in someone else's system. Typically, an expert in that other system can find a better way to write the benchmark, and so the race is on. But you have learned a bit about the other system.
With this in mind, here's another version of the benchmark written in Lisp…
(loop for i from 1 to 100 sum (sin i))
;;; now you might say, huh, Lisp has keywords, a do loop, ???
Well, loop is defined as a macro. There's another "do" version that has no keywords but more parentheses, if you prefer …
Now the rest of the benchmark requires a study of how to compile this, and, if I understand what you are really trying to compare, of whether you can indicate a parallel version that will compile differently, say
(loop for i from 1 to 10 sumparallel (sin i))
The writers of a Common Lisp compiler have at their disposal whatever tools are available to any other compiler writers. Some Common Lisp compilers target C, some target the JVM, some have their own byte-code, and some generate x86 assembler, ARM, M1, etc. I suppose one could even target Julia. Common Lisp can also run MPI, for example:
https://common-lisp.net/project/cl-mpi/
I point this out not because I ran it and it is faster than Fortran or Julia. I only say it to show that the expression is about as clear as can be, even though it is in Lisp. Could it be made clearer? Um, maybe:
(loop for i from 1.0d0 to 100.0d0 sum (sin i)) would ease the burden on the compiler's type-inference task, but it might generate the same code. Certainly the result type of (sin i) is a float.
Also, at least some Common Lisps (SBCL, a popular one) do just-in-time compilation.
Here's another upvote, and let that include not only executables but libraries, too.
My 2 cents: at work I am currently familiarising myself with a Fortran PDE-solving code (about 10 years old). As there are not many proficient developers around, we are discussing rewriting/replacing the codebase. This code (among others) is used in dll form by a C# GUI tool. Therefore, afaict, Julia is out of the race right out of the gate: there is no way to compile a dll (and also no proficient devs yet).
It looks like the problem could otherwise be a good fit for Julia, but right now the plan looks like a mixture of using Python to wrap/write tests/probe (to ease interaction; many know Python) and doing as many improvements/modernizations as feasible (so I'm picking up Fortran now, and that makes sense for me although it's "old").
Personally, this is fine for me: my toolkit will include Python, Modelica, Fortran (maybe a little C#) and should cover most problems I'll encounter. One day Julia will hopefully be at a point where it can replace one or more of these for me, or at least join the club beyond hobby/spare time, but right now that's not in the cards.
I think Julia would make a fine outer layer (wrapper) of such a code as you described.
So perhaps another viewpoint might be helpful: instead of thinking of Julia as something
that needs to be wrapped (as a dynamic lib, perhaps), think of Julia as the glue?
The cool alternative would be to replace everything with Julia. I don't claim that is realistic for this application specifically at this point, but what is there now (and I have some cases like this) involves: compiling the Fortran code for different platforms, shipping Python or making the user install it, and running the GUI only on Windows (C#).
Alternatively, one could simply download a very simple installer that ships Julia and the application, or, if the user already has Julia, just run a package add.
I guess (though I'm not sure) that the GUI is where things are less mature, am I right?
(Also, if the application relies heavily on the user actually using Python, that is another problem.)
Petr: Sure! One draw of Julia is that it has the potential to be both the (nice) glue and the (fast) gears.
In this case, any glue we need will be Python, as we have >5 proficient people, and 0 for Julia (I only know a little).
Leandro: indeed, that would solve the N-language problem, right there.
I guess what I tried to achieve here is to illustrate how the lack of dll capability can be an adoption blocker in some cases.
(For completeness: maybe I did not express this well, but Python is only for developer use; the users use the solver dll via a C# GUI that is maintained elsewhere. Replacing the small dll with a couple hundred MB of interpreter and code is not realistic, I think.)
The manual describes how to link an application against libjulia.so on Linux; does that not work on Windows for dlls?
A "couple of hundred MB" dll is not dramatic compared to the size of the C# runtime… If you are on a desktop machine, does the dll size really matter?
Thanks for the pointer, I'll have to take a look at that.
Will it not be sitting in RAM? Will it not be contributing to RAM fragmentation?
Of course it matters.
If it's one blob, it won't cause any fragmentation. Also, if large portions of it aren't needed, they will likely be moved to swap by the OS. The binary size isn't a good thing (hence StaticCompiler.jl), but in my opinion it's a pretty minor concern.
I didn't mean to say that the blob itself would fragment, but that the dll will have to play around with all the others loaded in memory. Now, I don't know which parts of a dll get loaded into RAM, but I know from my system (32 GB RAM) that Task Manager tells me that 60% of that memory is free, while another tool tells me that the largest chunk is only 2 GB. That means I currently cannot load an image larger than that size.
...
C:\SVN\mirone30\bin\win32\m_dispatcher.dll 7afe0000 00006000 0000a000
C:\SVN\mirone30\bin\win32\libmwbuiltins.dll 7aff0000 00027000 000f9000*
C:\SVN\mirone30\bin\win32\libmx.dll 7b110000 00028000 00008000
C:\WINDOWS\SYSTEM32\MFC42.DLL 7b140000 00124000 04d7c000*
<anonymous> 7ffe0000 00001000 00007000
<anonymous> 7ffe8000 00001000 7fd57000*
<anonymous> ffd40000 00060000 00000000
<anonymous> ffda0000 00100000 00000000
<anonymous> ffea0000 00011000 0000f000
<anonymous> ffec0000 00002000 0000e000
<anonymous> ffed0000 00021000 0000f000
<anonymous> fff00000 00002000 0000e000
<anonymous> fff10000 00001000 0000f000
<anonymous> fff20000 00023000 800bd000
========== ==========
Totals 0a99d000 75663000
Largest available memory block is 2144694272 bytes (2045.34 MB) located at address 7ffe9000
At the risk of careening off topic, I think that you're confusing some things about physical and virtual memory. I'm not on Windows, but I have a vague idea how it works. Outside of some specialized uses, physical memory does not get fragmented (or alternatively, it is completely fragmented); it is assigned on a page-by-page basis to virtual memory in the running processes. A page on Windows is 4 KB. Furthermore, the code and data of programs and dlls are only assigned to physical memory on demand, as they are needed, even if a large chunk of virtual memory is reserved for them when loaded.
I don't know what the list you've posted is; it seems to show mappings that have already been made and freed. But the available address space on 64-bit machines is enormous, 2^48 bytes. I don't think you need to worry about loading images >2 GB into either physical or virtual memory.
This is just to say that a dll that takes up 200 MB of disk space or maps 2 GB of memory is just not a big deal.
Sure, I may be confusing things; I don't know much about the details of how it works. The numbers I showed are from a MEX file provided by a MathWorks employee (no source code) that shows RAM fragmentation, and my experience using it is that I could never load data into RAM larger than the largest chunk reported by that tool. Matlab now has a builtin function that gives this information, but I forgot its name.
Julia: let's keep the advantages of Python being interactive, easy to learn, "duck typing", etc. Now let's see how we can fix the main flaw: speed. To do that, one has to introduce types, use LLVM from the beginning, figure out the semantics in a way that allows the language to be compiled to be fast, etc. Modify the language a bit. Since we are introducing a new language, let's also do some other cool stuff, such as multiple dispatch.
Julia: let us make a programming language for mathematicians and scientists! Let's avoid the licensing problems of Matlab, the archetypal scientific programming package that is not quite a language. Keep the interactive aspect, and stay as easy to learn as the user desires. Let us make it fast by making it compiled, and use LLVM for portability. Let us not make it cryptic and over-vectorized in syntax, except when it remains readable to do so. Greenspun's tenth rule says all good languages become a subset of Common Lisp, so let's actually build it from Lisp, but hide that from the user (but not really). Lastly, let us build a story around how multiple dispatch is the secret to success, but it's really libraries and package management that will create a movement.
\textrm{Julia}\approx (((\textrm{Matlab} \cup \textrm{Python}) \setminus \textrm{design\_mistakes}) \cup \textrm{multiple\_dispatch}) \triangleright (\textrm{Lisp} \setminus \textrm{parentheses})