I am relatively new to Julia, and I am mainly interested in it because of the performance. There are functions built into Julia, like sum(), and a custom sum function performs worse than the built-in one. How come the built-in functions run faster than custom ones? How can we make our custom functions run as fast as the built-in functions in Julia?
well, apologies for not using technical language, but for example, if one wants to know how Julia implements add, it 'doesn't': + of integers is Core.Intrinsics.add_int, and that:
julia> @which Core.Intrinsics.add_int(1,2)
ERROR: ArgumentError: argument is not a generic function
Stacktrace:
[1] which(::Any, ::Any) at ./reflection.jl:1142
[2] top-level scope at REPL[21]:100
so FWIW, sum and certain functions are not 'learnable', unlike sortperm where one can digest parts of the algorithms in a meaningful way.
of course even self-hosted languages can use things like BLAS and so on, but my thought was that a self-hosted language can potentially have more 'digestible' functions implemented in the language itself.
Yes, it's fair to say that certain + methods are not implemented in julia. However,
It has nothing to do with sum
It doesn't stop you from implementing an identical one, i.e. you can learn it.
It has nothing to do with self-hosting. For example, + is also not implemented in C or C++ (in a way that the user can 'learn' and reimplement), and I don't see people saying C or C++ are not self-hosting.
sum is exactly like sortperm.
No. You'll have a hard time finding a language, self-hosted or not, that implements more of its features in the language itself than julia. Self-hosting mostly refers to the compiler/interpreter/runtime for the language itself. Most of what julia uses other languages for is not really implementing 'functions' in the sense that sum is a function. There's almost nothing of general interest (rather than implementation details) that's implemented in other languages in julia.
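For illustration, here is a hedged sketch (not the actual Base source) of what the integer + method amounts to; running @which 1 + 2 in the REPL should point you at the corresponding one-liner in Base's int.jl:

# A user-level reimplementation of integer addition: a thin wrapper over the
# add_int intrinsic, which is essentially what Base's + method for Ints does.
myadd(x::Int, y::Int) = Core.Intrinsics.add_int(x, y)

myadd(3, 4)  # 7, same as 3 + 4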
@andreas.karatzas In many high-level languages there are built-in functions that are faster than those you can write in the language itself.
This isn't really the case in Julia. The sum function isn't special, and you can write it equally fast yourself; mainly you just need to learn how to write SIMD-friendly code and use the performance tips.
There are several example posts floating around here somewhere that show how to make a fast version of sum, even faster than the 'built-in' one, which is a bit slower than optimal in order to achieve better accuracy.
The following custom implementation of sum is actually faster than the built-in one (but, as @DNF stated, less precise):
function mysum(x)
    s = zero(eltype(x))            # accumulator with the same element type as x
    @inbounds @simd for i in x     # skip bounds checks and allow SIMD vectorization
        s += i
    end
    s
end
using BenchmarkTools
A = rand(10000)
@btime sum($A) # 6 microseconds
@btime mysum($A) # 4.8 microseconds
With all the dimension keyword and higher-order reduction machinery around it these days, it's actually quite hard to find the meat of the sum implementation, and when you find it, it's kind of hard to figure out what's going on. But that's just because the same code is used for all reductions (reduce, sum, prod, etc.). It's not because there's anything going on that the user can't do themselves; there's nothing special or built-in about it. The straightforward mysum implementation that @lungben wrote is super simple and even faster than the built-in sum, although it's less accurate since it doesn't use pairwise summation.
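For reference, here is a rough sketch of the pairwise-summation idea that the built-in sum uses for accuracy (illustrative only, not the actual Base code; the block size of 128 is an arbitrary choice here):

# Recursively split the range in half and sum the halves separately, so that
# rounding error grows roughly like O(log n) instead of O(n) for a naive loop.
function pairwise_sum(x, lo=firstindex(x), hi=lastindex(x))
    n = hi - lo + 1
    if n <= 128                        # small blocks: plain SIMD-friendly loop
        s = zero(eltype(x))
        @inbounds @simd for i in lo:hi
            s += x[i]
        end
        return s
    else
        mid = lo + n ÷ 2
        return pairwise_sum(x, lo, mid - 1) + pairwise_sum(x, mid, hi)
    end
end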
Regarding the add_int intrinsic: every programming language at some point needs to step out of itself and explain how actual work gets done. Otherwise you'd have a turtles-all-the-way-down situation: if everything is generic functions calling other generic functions, how do you get to machine code? At some point you have to stop stacking turtles and actually do a hardware add operation. The add_int intrinsic is how you can tell Julia, "stop calling generic functions and just do a hardware add here."
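If you want to see that the turtles really do stop at a single machine instruction, you can inspect the generated code yourself (a quick illustration; the exact output depends on your Julia version and CPU):

# The intrinsic is directly callable, although ordinary code never needs to:
Core.Intrinsics.add_int(1, 2)       # 3

# Inspecting the code generated for the generic + shows it reduces to one add:
@code_llvm debuginfo=:none 1 + 2    # body is essentially a single `add i64`
@code_native debuginfo=:none 1 + 2  # the corresponding machine instruction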
The sortperm function is no different than sum in this regard: it doesn't use addition, but if you're sorting a floating-point array, it will use Core.Intrinsics.lt_float to compare them and Core.arrayref and Core.arrayset to read and write array values. The latter two are built-in non-generic functions rather than intrinsics. Once upon a time, intrinsics were something that only code generation knew how to emit and didn't have a runtime C implementation, whereas built-in functions only had a runtime callable C implementation and code gen didn't know how to emit them. These days we have an interpreter which has runtime implementations of all intrinsics and the compiler now knows how to generate code for most (all?) built-ins. So that distinction is no longer really meaningful and only persists because it's not really worth the time and effort to change it. Historically, however, that's why there are both intrinsics and built-ins.
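As a small illustration (the exact calling conventions of built-ins like arrayref/arrayset vary between Julia versions, so this only shows the intrinsic), the float comparison that sortperm ultimately relies on is directly callable too:

# The generic < for Float64 bottoms out in this intrinsic, just like + for
# Ints bottoms out in add_int:
Core.Intrinsics.lt_float(1.0, 2.0)   # true
Core.Intrinsics.lt_float(2.0, 1.0)   # false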
Users don't generally have to call or know about built-ins or intrinsics (although they can) because all the operations they implement are available via generic functions like + that are exported from Base and which, unlike built-in functions or intrinsics, you can extend to your own types. From the normal Julia user's perspective, it's all just generic functions. But any programming language that does actual work needs to have some kind of primitive exit point where it can invoke machine code.
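As a quick sketch (the type here is made up for illustration), extending the exported generic + to your own type is ordinary Julia, and the intrinsic stays hidden underneath:

struct Money
    cents::Int
end

# Extend the generic + (something you cannot do with the add_int intrinsic
# itself); the inner a.cents + b.cents eventually bottoms out in add_int.
Base.:+(a::Money, b::Money) = Money(a.cents + b.cents)

Money(150) + Money(250)   # Money(400)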
See lectures 1 and 2 and the accompanying video from our short course, where we talk specifically about the performance of the sum function in various languages and show how to make a few-line implementation that roughly matches the performance of the standard-library sum function (which is also written in Julia, but is more complicated: it uses pairwise summation for improved accuracy, and is generalized to merge it with the generic mapreduce code).
The custom sum function is implemented at Julia Academy in a course called "Introduction to Julia" (which I loved, by the way). I have not gone into much detail there. I was just amused to find out that there was a slight difference in performance between the custom sum function and the Julia sum function.
I am truly thankful to the lot of you. Each gave me an answer that helped me understand a different bit of how Julia works. I also wanna say that since Julia is great in performance, it's only fitting that its community is as fast.
I don't know which course it is since you didn't link it, but in general note that while all the machinery that Base & the standard libraries use to speed things up is available to the user, in a lot of cases it would just complicate the picture in a pedagogical context, especially in an intro course.
So versions of sum etc that are implemented as examples may not be as fast or general as the versions you find in Base, just like your first acoustic guitar lesson may not involve tapping, even though your guitar is fully capable of it.
(In the case of sum, all you need is @simd to match the Base.sum performance. But even the @simd decoration is a needless complication in an introductory course.)