Recent discussions of blog posts here tended to be of posts critical of Julia. Some of the criticism is fair, some not-so-much. (Btw I’m impressed at how the community looks for constructive feedback to improve).
I thought I’d add a positive post w/ a nice (simple) language comparison.
Let’s see Fortran and C++ gurus post asm-optimal solution in a few days
Interesting post, thank you for sharing. I have been trying to learn Rust and Fortran lately because almost all of my previous coding experiences has been in dynamic languages like Julia and Python and I have picked up some bad habits as a result.
I found it amusing that the key to optimizing the Julia code is the use of the @inbounds
macro, which was at the heart of the criticism made in one of the other blog posts! (But even without this optimization, it is obvious that Julia implementation is quite competitive.)
I wonder how the original Julia version performs if you use the command-line --optimize
option and none of the C++ tricks.
I also wonder how the hand-coded implementation here compares in performance and accuracy to passing the equation to a package like DifferentialEquations.jl that does the same thing.
DifferentialEquations
will be at least 10x faster because Euler’s method is absolutely terrible.
Not if you count the precompilation time
Also, is it theoretically possible be faster than Euler’s method in an order-of-complexity sense? Euler’s method estimates the trajectory at N points in O(N)-time. (Moreover, the overhead is quite small.) Euler’s method is absolutely terrible because it it inaccurate, not because it is slow.
What should be measured is the time span of the trajectory and it’s precision.
But by changing units, we can always assume that one of these is 1, right? So what matters is the product of time span and precision, namely (points per unit time) times (number of units of time) equals (number of points).
Edit: Oh, maybe by “precision” you mean the accuracy of the approximation? So that would be something like O(N / \varepsilon) or whatever, depending on the algorithm.
No, I mean reaching some specific point in the trajectory, for example the inflection point of the population. What should be measured is the computational cost to obtain something like that with a given precision. The Euler method will be very bad at that.
Sigh. I really wonder how well-versed whoever wrote the blog post is in the languages used. The code listed directly below “As the C++ experts couldn’t stand seeing their language loosing, they quickly came op with an optimized C++ implementation” doesn’t even compile as listed, as it uses a return (u)
statement in a function declared with void
return type. Also, they’re using an iterator to go over the std::vector
, while using straight indexing for k = 1:(N-1)
in the Julia version, which could easily be faster. So at least they should try to make the code comparable, as there’s obviously some back-and-forth between the different implementations in order to improve. Plus, sharing the overall benchmark code would be nice.
Can you post a better Julia version here?
Thanks for sharing this. One thing that surprised me when reading the article was how nice the Rust version looked. I’m used to Rust code being ungrokkable because of all of the (seemingly) random characters sprinkled everywhere, but I was pleasantly surprised this time.
And of course, I’m very happy that Julia won in the end
Yes, I can post a better Julia version (but I’m not even trying much for faster), and also fix the bugs in the C++ versions!
If that’s your takeaway, then it problematic is a bit of a half-truth, there’s nothing wrong with using @inbounds
with regular Array
/Vector
(assuming you tested the code, you just need better code and/or more testing for full generality), and the same logic is even the default in C++ and I believe Fortran. It’s just other things you need to be careful about in Julia, i.e. when using OffsetArray
s, with or without @inbounds
, and if C++ had such capability (maybe some obscure library does), then I find it likely you’ll have all the same issues, when such generic templated code would be used. [Also you CAN get the same speed in Julia without @inbounds
, by proofing to the compiler with asserts, that bounds checks not needed.]
If I’m reading the C++ programs correctly, then they index the full array but then with pointer arithmetic, index one element past the end of the array. That’s not checked (would have been with the slower at
, that std:vector supports, likely why it was avoided), and could, and likely will since buggy code with undefined behavior, result in disaster.
[C++ rant: Note, C++ and C (and Fortran too?) do no bounds checking by default (and it’s not in general possible to turn it on, unlike in Julia), for (C-style) arrays, while newer abstractions, e.g.std:vector do allow it, I thought it was the default use, apparently not, and when you have it on, it’s compiled into your program, I doubt the user can turn it off (or on) with a global switch like in Julia, nor am I sure you can disable it locally, except by using completely different syntax.]
Try a quick peek at
std::vector<>
's array operator and it may shed some light on your question, specifically the lack of boundary checking (which memberat()
does do).
See the best practices here: GitHub - JuliaArrays/OffsetArrays.jl: Fortran-like arrays with arbitrary, zero or negative starting indices.
Note, you only need to change the first line, the definition, for correctness regarding unusual array types (and for equvalence with C++, even a bit more general, allowing for e.g. Float32
), but you may want to allow AbstractVector
:
function julia_logistic!(u::Vector{<:Real}, N::Int, dt::Real)
@fastmath @inbounds begin
# Parameters
u0 = convert(typeof(dt), 1e-5) # Just using 1f-5 ok here?
# Right-hand side function
f(U)= U*(one(dt)-U)
# Discretization
u[begin] = u0
for k = eachindex(u)[begin:end-1]
u[k+1] = u[k] + dt*f(u[k])
end
return(u)
end
end
N = 10000;
u = zeros(Float32, N);
julia> @time julia_logistic!(u, N, 1.0f0); # should be, and seems to be almost twice as fast with Float32
0.000048 seconds
If you only called the function with the default Julia provided array structures, then @inbounds
wouldn’t be a bug in the original code. The above blocks it unless you use AbstractArray
in the definition, but then you allow to be called with StaticArray
s too, and they are not mutable, and would fail (not “crash” or worse be memory-corrupting), at runtime. I believe mine will never fail, but I’m not sure how to support additionally only OffsetArray
(it’s not really a requirement to support it, but using Vector disallows some more types, none that I think matter nor a requirement to support).
The first Julia versions wasn’t a mutating versions, it constructed the array for you, since the other versions mutate the array u
, that you take in, the !
at the end is the convention in Julia. It’s a good convention (but only that), telling you it doesn’t make sense to supply the immutable StaticArray
.
Btw, if you use eachindex
, compiler should be able to infer inbounds, I think.
It is slow in convergence which leads to overall deficiency. If Euler is so good, why are all those advanced and accelerating methods out there?
Yes, it should for plain eachindex, but I checked for:
for k = eachindex(u)[begin:end-1]
and that way it’s slower. This is of course also with-in bounds, so a clever enough compiler might discover that. Maybe there’s a different way to write that and/or use assert
s. I just didn’t explore it. I tried changing it to simple eachindex, and then I of course get a bounds error expect if I include @inbounds
, the same silent error as in the C++ code…
I thought this line might do the trick:
for k = eachindex(@view u[begin:end-1])
but then I still see $throw_boundserror
with:
julia> @code_native julia_logistic!(u, N, 1.0f0)
I do see:
movabsq $throw_overflowerr_binaryop, %rax
so the code can likely be optimized further.
The problem is a bit smallish if the manually written Julia implementation takes 2 microseconds.
I hope Yuri’s post doesn’t cause to every new Julia user the impression that there was something terribly wrong about code like the one of this post.
IMO it is perfectly fine for most what is done in situations like this, with or without @inbounds
, and shows the simplicity and expressiveness of the Julia syntax.
Only the @inbounds
should better just be put in the line where it is needed, instead of that ugly begin-end block.
The worst part of the post are the nonsense comments about Fortran.
Just for the record, I cannot reproduce his benchmark, and @inbounds
and @fastmath
are not helping in anything. The faster execution of the “optimized” code is likely because he moved the allocation of the output array outside the function and the execution was so short that that took half of the execution time.
Once they do, could we use Godbolt to translate this to LLVM IR and use Base.llvmcall
to provide a portable LLVM based solution?