The beauty of f(x, xs...)
is in not needing special syntax for this.
When it comes to syntax, less is more.
It's probably more of an issue when you have a substantial type signature, like:
f(x::SomeType{A}, xs::SomeType{A}...)
Then maybe this is better:
f(x::T, xs::T...) where {T<:SomeType{A}}
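For concreteness, a minimal sketch (SomeType here is a made-up parametric type, with Int standing in for A; the two definitions behave the same, the second just avoids repeating the annotation):
struct SomeType{A}
    val::A
end

# Repeating the full annotation on every argument:
f1(x::SomeType{Int}, xs::SomeType{Int}...) = (x, xs...)

# Factoring the annotation into a where clause:
f2(x::T, xs::T...) where {T<:SomeType{Int}} = (x, xs...)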
Hi, I have two questions.
In this case
Here is an example:
using BenchmarkTools
fun1(x, xs...) = (x, xs...)
fun2(xs...) = xs
xs = rand(Int(1e3))
x0, x1 = xs[1], xs[2:end]
@btime fun1($x0, $x1...)
# 70.376 μs (2008 allocations: 78.72 KiB)
@btime fun2($xs...);
# 35.817 μs (1003 allocations: 39.38 KiB)
I would also add that I cannot think of a case where I have to write this and then put the first element back into a tuple with the rest. You almost always either recurse, in which case having the first element separately is useful, or you should be defining the single-element method explicitly.
As an argument, plain and simple. Splatting it is not idiomatic and incurs a significant compilation and performance penalty.
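For example (a minimal sketch with made-up names; exact timings will vary), passing the collection as one argument avoids both the per-length specialization and the allocations that splatting triggers:
using BenchmarkTools

total_args(xs...) = sum(xs)   # a new specialization for every argument count
total_coll(xs) = sum(xs)      # one method, takes the collection as-is

v = rand(1000)
@btime total_args($v...)      # splatting 1000 elements: allocates, slower
@btime total_coll($v)         # passing the vector directly: no splatting overhead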
For
fn(x, xs...) = (x, xs...)
you don't need to split a collection into the first element and the rest; Julia does that automatically.
In your example, calling fun1(xs...)
would work just fine (except for extra allocations).
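For example, reusing fun1 from the post above:
fun1(x, xs...) = (x, xs...)

fun1(1, 2, 3)        # (1, 2, 3): Julia binds x = 1 and xs = (2, 3) for you
fun1((1, 2, 3)...)   # (1, 2, 3): same thing, no manual first/rest split needed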
Got it!
Nice, I didn't know.
Thanks
Exactly this situation came up for me when making small inert polynomials by summing terms. The intention is that Terms get collected now, to be iterated later, and the consumer may choose the order in which to deal with them.
struct Term
    coefficient::Int64
    degree::Int64
end

struct PolyTerm{N}
    pt::NTuple{N,Term}
end

import Base: +
+(s::Term, t::Term...) = PolyTerm((s, t...))
I really want to write +(pt::Term...) = PolyTerm(pt), to avoid splitting and repacking my Tuple, but then I've accidentally gone and pirated +() to create a PolyTerm{0} instead of the MethodError people may have been counting on.
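To spell out the concern (a minimal sketch, reusing the Term/PolyTerm definitions above; the one-liner is the hypothetical method I would like to write):
import Base: +

# Tempting, but a Vararg{Term} method also matches *zero* arguments...
+(pt::Term...) = PolyTerm(pt)

+(Term(1, 2), Term(3, 4))   # PolyTerm{2} -- the intended use
+()                         # now returns PolyTerm{0}(()) instead of throwing a MethodError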
Why? I did not check, but ideally it should compile to the same code.
They don't, at least not at the @code_llvm level (is there another optimization step after that? Either way, there's either a difference in run time for the compiled code or in compile time between the two versions). Edited to add an MWE:
struct Goose i::Int64 ; j::Int64 end
struct Flock{N} g::NTuple{N,Goose} end
Waddle(g::Goose...) = Flock(g)
Swim(g::Goose,moregeese::Goose...) = Flock((g,moregeese...))
@code_llvm Waddle(Goose(1,2), Goose(2,3), Goose(3,4))
@code_llvm Swim(Goose(1,2), Goose(2,3), Goose(3,4))
gives (sorry, not sure how to nicely format output)
; @ REPL[3]:1 within `Waddle'
define void @julia_Waddle_170([1 x [3 x [2 x i64]]]* noalias nocapture sret, [2 x i64]* nocapture nonnull readonly dereferenceable(16), [2 x i64]* nocapture nonnull readonly dereferenceable(16), [2 x i64]* nocapture nonnull readonly dereferenceable(16)) {
top:
%4 = alloca [3 x [2 x i64]], align 8
%5 = bitcast [3 x [2 x i64]]* %4 to i8*
%6 = bitcast [2 x i64]* %1 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 %5, i8* nonnull align 1 %6, i64 16, i1 false)
%7 = getelementptr inbounds [3 x [2 x i64]], [3 x [2 x i64]]* %4, i64 0, i64 1
%8 = bitcast [2 x i64]* %7 to i8*
%9 = bitcast [2 x i64]* %2 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 %8, i8* nonnull align 1 %9, i64 16, i1 false)
%10 = getelementptr inbounds [3 x [2 x i64]], [3 x [2 x i64]]* %4, i64 0, i64 2
%11 = bitcast [2 x i64]* %10 to i8*
%12 = bitcast [2 x i64]* %3 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 %11, i8* nonnull align 1 %12, i64 16, i1 false)
%13 = bitcast [1 x [3 x [2 x i64]]]* %0 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 %13, i8* nonnull align 8 %5, i64 48, i1 false)
ret void
}
vs
; @ REPL[4]:1 within `Swim'
define void @julia_Swim_171([1 x [3 x [2 x i64]]]* noalias nocapture sret, [2 x i64]* nocapture nonnull readonly dereferenceable(16), [2 x i64]* nocapture nonnull readonly dereferenceable(16), [2 x i64]* nocapture nonnull readonly dereferenceable(16)) {
top:
%4 = alloca [2 x [2 x i64]], align 8
%5 = alloca [3 x [2 x i64]], align 8
%moregeese = alloca [2 x [2 x i64]], align 8
%6 = bitcast [2 x [2 x i64]]* %4 to i8*
%7 = bitcast [2 x i64]* %2 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 %6, i8* nonnull align 1 %7, i64 16, i1 false)
%8 = getelementptr inbounds [2 x [2 x i64]], [2 x [2 x i64]]* %4, i64 0, i64 1
%9 = bitcast [2 x i64]* %8 to i8*
%10 = bitcast [2 x i64]* %3 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 %9, i8* nonnull align 1 %10, i64 16, i1 false)
%11 = bitcast [2 x [2 x i64]]* %moregeese to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 %11, i8* nonnull align 8 %6, i64 32, i1 false)
%12 = getelementptr inbounds [2 x [2 x i64]], [2 x [2 x i64]]* %moregeese, i64 0, i64 1
%13 = bitcast [3 x [2 x i64]]* %5 to i8*
%14 = bitcast [2 x i64]* %1 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 %13, i8* nonnull align 1 %14, i64 16, i1 false)
%15 = getelementptr inbounds [3 x [2 x i64]], [3 x [2 x i64]]* %5, i64 0, i64 1
%16 = bitcast [2 x i64]* %15 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 %16, i8* nonnull align 8 %6, i64 16, i1 false)
%17 = getelementptr inbounds [3 x [2 x i64]], [3 x [2 x i64]]* %5, i64 0, i64 2
%18 = bitcast [2 x i64]* %17 to i8*
%19 = bitcast [2 x i64]* %12 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 %18, i8* nonnull align 8 %19, i64 16, i1 false)
%20 = bitcast [1 x [3 x [2 x i64]]]* %0 to i8*
; ┌ @ REPL[2]:1 within `Flock' @ REPL[2]:1
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 %20, i8* nonnull align 8 %13, i64 48, i1 false)
; └
ret void
}
The distinction disappears if Goose
has a single field.
Thanks, this is interesting. @btime
reports no performance difference, so I am not sure whether this is an issue that needs to be (or can be) addressed.
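(For reference, the comparison was along these lines, with the Goose/Flock/Waddle/Swim definitions above; both calls reported essentially the same time for me.)
using BenchmarkTools

g1, g2, g3 = Goose(1, 2), Goose(2, 3), Goose(3, 4)

@btime Waddle($g1, $g2, $g3)   # direct Vararg method
@btime Swim($g1, $g2, $g3)     # split-then-repack method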
My point was that, in theory, I don't see a compelling reason why the two forms should be effectively different.
That's a good example. As you observe, it only needs the "at least one" property in order to avoid piracy, and it happens to want to create a tuple internally, so the "at least one splat" would be really convenient here. If you were defining a function that you "owned", then the zero-args method would be fine, e.g.:
PolyTerm(args::Term...) = PolyTerm{length(args)}(args)
julia> PolyTerm()
PolyTerm{0}(())
julia> PolyTerm(Term(2,3))
PolyTerm{1}((Term(2, 3),))
Seems like a good spot to mention an issue I was looking into.
In some situations, a splatted/slurped tuple leads to extra allocations over a tuple passed directly, when the tuple is used within an unoptimized throw block.
It's a bit maddening because I can't see any obvious place in the LLVM where the additional allocation is occurring. And yet, all I need to do to eliminate the extra allocation is either 1) pass a complete tuple directly to the function, or 2) ensure the slurped tuple is not used within the unoptimized block.
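Schematically, the shape was something like this (illustrative names only, not the actual code):
# Pattern 1: tuple passed in directly -- the case with no extra allocation.
function check_direct(t::NTuple{3,Int})
    t[1] > 0 || throw(ArgumentError("bad input: $t"))
    return t[1]
end

# Pattern 2: the tuple is slurped and then used inside the throw block --
# the case that showed the extra allocation.
function check_slurped(xs::Int...)
    xs[1] > 0 || throw(ArgumentError("bad input: $xs"))
    return xs[1]
end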
So maybe itās rare, but sometimes they do compile differently.