Built-in vs. user-defined functions


I am relatively new to Julia; I am mainly interested in it because of its performance. There are some functions built into Julia, like sum(). A custom sum function I wrote performs worse than the built-in one. How come those built-in functions run faster than custom ones? And how can we make our custom functions run as fast as the built-in functions in Julia?

sum is a bit special, as it goes down to some LLVM/C-ish things, as you can imagine, since Julia is not a self-hosting language.

But in general, you can use @which and @edit to find how Julia’s stdlib does it, for example:

julia> @which sortperm(rand(3))
sortperm(v::AbstractArray{T,1} where T; alg, lt, by, rev, order) in Base.Sort at sort.jl:904

and if you open up that line (or do @edit instead), you will see how they are implemented in Julia:


No it’s not built in.

And no, it’s not, since sum is written in Julia.

Read the Base code and see how they are implemented in Julia. See the second half of @jling’s answer for that.

Errr, not really.

And this is just completely irrelevant. Even being a “self-hosting” language does not mean it uses nothing else.


well, apologies for not using technical language, but, for example, if one wants to see how Julia implements add, it ‘doesn’t’:
+ of integers is Core.Intrinsics.add_int, and that is not an inspectable generic function:

julia> @which Core.Intrinsics.add_int(1,2)
ERROR: ArgumentError: argument is not a generic function
 [1] which(::Any, ::Any) at ./reflection.jl:1142
 [2] top-level scope at REPL[21]:100 

so FWIW, sum and certain functions are not ‘learnable’, unlike sortperm, where one can digest parts of the algorithm in a meaningful way.

of course, even self-hosting languages can use things like BLAS and so on, but my thought was that they can potentially have more ‘digestible’ functions implemented in the language itself.


Yes, it’s fair to say that certain + methods are not implemented in julia. However,

  1. It has nothing to do with sum
  2. It doesn’t stop you from implementing an identical one, i.e. you can learn it.
  3. It has nothing to do with self-hosting. For example, + is also not implemented in C or C++ (in a way that the user can “learn” and reimplement), and I don’t see people saying C or C++ are not self-hosting.
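To make point 2 concrete, here is a small sketch; the MyInt wrapper type and the myadd name are made up for illustration, but the intrinsic call is the same one that Base’s + method for Int bottoms out in:

```julia
# Hypothetical wrapper type, purely for illustration.
struct MyInt
    x::Int
end

# Reimplement integer addition ourselves on top of the same add_int
# intrinsic that Base.+ for Int ultimately uses.
myadd(a::MyInt, b::MyInt) = MyInt(Core.Intrinsics.add_int(a.x, b.x))

myadd(MyInt(1), MyInt(2))  # MyInt(3)
```

So even though the intrinsic itself is not written in Julia, nothing stops you from building an identical + on top of it.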

sum is exactly like sortperm.

No. You’ll have a hard time finding a language, self-hosted or not, that implements more features in the language itself than Julia does. Self-hosting mostly refers to the compiler/interpreter/runtime for the language itself. Most of what Julia uses other languages for is not really implementing “functions” in the sense that sum is a function. There’s almost nothing of general interest (rather than implementation detail) that’s implemented in other languages in Julia.


@andreas.karatzas In many high-level languages there are built-in functions that are faster than those you can write in the language itself.

This isn’t really the case in Julia. The sum function isn’t special, and you can write one that is equally fast yourself; mainly, you just need to learn how to write SIMD-friendly code and follow the performance tips.

There are several example posts floating around here somewhere that show how to make a fast version of sum, even faster than the ‘built-in’ one, which is slightly slower than optimal in order to achieve better accuracy.


The following custom implementation of sum is actually faster than the built-in one (but, as @DNF stated, less precise):

function mysum(x)
    s = zero(eltype(x))
    @inbounds @simd for i in x
        s += i
    end
    return s
end

using BenchmarkTools
A = rand(10000)
@btime sum($A) # 6 microseconds
@btime mysum($A) # 4.8 microseconds

Can you give an example of your custom written function that’s slow? And have you read the existing Julia implementation of sum?


With all the dimension keyword and higher-order reduction machinery around it these days, it’s actually quite hard to find the meat of the sum implementation, and when you find it, it’s kind of hard to figure out what’s going on. But that’s just because the same code is used for all reductions (reduce, sum, prod, etc.). It’s not because there’s anything going on that the user can’t do themselves—there’s nothing special or built-in about it. The straightforward mysum implementation that @lungben wrote is super simple and even faster than the built-in sum, although it’s less accurate since it doesn’t use pairwise summation.
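For the curious, pairwise summation itself is also only a few lines of ordinary Julia. Here is a minimal sketch; the function name and the base-case cutoff of 128 are illustrative, not Base’s actual code:

```julia
# Minimal sketch of pairwise summation: split the range in half, sum each
# half recursively, and add the two partial sums. This keeps rounding-error
# growth to O(log n) instead of the O(n) of a naive left-to-right loop.
function pairwise_sum(x, lo=firstindex(x), hi=lastindex(x))
    if hi - lo < 128                  # small base case: plain loop
        s = zero(eltype(x))
        for i in lo:hi
            s += x[i]
        end
        return s
    end
    mid = (lo + hi) >>> 1             # split point
    return pairwise_sum(x, lo, mid) + pairwise_sum(x, mid + 1, hi)
end

pairwise_sum(ones(1_000))  # 1000.0
```

Again, nothing here is beyond what a user can write themselves; the Base version is only harder to read because it is shared with the generic reduction machinery.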

Regarding the add_int intrinsic: every programming language at some point needs to step out of itself and explain how actual work gets done. Otherwise you’d have a turtles all the way down situation: if everything is generic functions calling other generic functions, how do you get to machine code? At some point you have to stop stacking turtles and actually do a hardware add operation. The add_int intrinsic is how you can tell Julia, “stop calling generic functions and just do a hardware add here.”

The sortperm function is no different than sum in this regard: it doesn’t use addition, but if you’re sorting a floating-point array, it will use Core.Intrinsics.lt_float to compare them and Core.arrayref and Core.arrayset to read and write array values. The latter two are built-in non-generic functions rather than intrinsics. Once upon a time, intrinsics were something that only code generation knew how to emit and didn’t have a runtime C implementation, whereas built-in functions only had a runtime callable C implementation and code gen didn’t know how to emit them. These days we have an interpreter which has runtime implementations of all intrinsics and the compiler now knows how to generate code for most (all?) built-ins. So that distinction is no longer really meaningful and only persists because it’s not really worth the time and effort to change it. Historically, however, that’s why there are both intrinsics and built-ins.

Users don’t generally have to call or know about built-ins or intrinsics (although they can) because all the operations they implement are available via generic functions like + that are exported from Base and which, unlike built-in functions or intrinsics, you can extend to your own types. From the normal Julia user’s perspective, it’s all just generic functions. But any programming language that does actual work needs to have some kind of primitive exit point where it can invoke machine code.
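To illustrate that extensibility point with a sketch (the Meters type is made up for this example): + is a generic function, so you can add methods to it for your own types, something you could never do with an intrinsic like add_int:

```julia
# Hypothetical unit-carrying type, for illustration only.
struct Meters
    v::Float64
end

# Extend the generic function Base.+ with a method for our type.
Base.:+(a::Meters, b::Meters) = Meters(a.v + b.v)

Meters(1.5) + Meters(2.5)  # Meters(4.0)
```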


See lectures 1 and 2 and the accompanying video from our short course, where we talk specifically about the performance of the sum function in various languages and show how to make a few-line implementation that roughly matches the performance of the standard-library sum function (which is also written in Julia, but is more complicated — it uses pairwise summation for improved accuracy and is generalized to merge it with the generic mapreduce code).


The custom sum function is implemented at Julia Academy in a course called “Introduction to Julia” (which I loved, by the way). I have not gone into much detail there. I was just amused to find out that there was a slight performance difference between the custom sum function and Julia’s sum function.

Hello again,

I am truly thankful to all of you. Each of you gave me an answer that helped me understand a different bit of how Julia works. I also want to say that since Julia is great on performance, it’s only fitting that its community is just as fast.

Thank you guys!


I don’t know which course it is since you didn’t link it, but in general note that while all the machinery that Base & the standard libraries use to speed things up is available to the user, in a lot of cases it would just complicate the picture in a pedagogical context, especially in an intro course.

So versions of sum etc that are implemented as examples may not be as fast or general as the versions you find in Base, just like your first acoustic guitar lesson may not involve tapping, even though your guitar is fully capable of it.


(In the case of sum, all you need is @simd to match the Base.sum performance. But even the @simd decoration is a needless complication in an introductory course.)
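For comparison, the fully undecorated version an intro course might show looks like this (a sketch; the simplesum name is made up):

```julia
# The bare introductory version: no @simd, no @inbounds. Correct and simple,
# just not guaranteed to vectorize as aggressively as Base.sum.
function simplesum(x)
    s = zero(eltype(x))
    for v in x
        s += v
    end
    return s
end

simplesum(1:100)  # 5050
```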