Why do tuples exist in Julia?
Why do we need them if we already have vectors (arrays)?
They are fixed-length and cannot be modified. Is it advantageous?
Maybe they are useful for type inference. If you place objects of different types into an array, it will become Array{Any}, but not so for tuples:
julia> typeof(("a", 1, 1.0))
Tuple{String,Int64,Float64}
julia> typeof(["a", 1, 1.0])
Array{Any,1}
There are many reasons; here are three.
Julia has functions that take arguments, and the arguments to a function are tupled. Without tuples, n-ary functions become clumsy.
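A small sketch of what "the arguments are tupled" means in practice: a varargs method receives its trailing arguments as a single tuple, and splatting a tuple passes its elements back as separate arguments. The function name `count_args` is made up for illustration.

```julia
# A varargs method: every argument after `first` is collected into the tuple `rest`.
function count_args(first, rest...)
    return (typeof(rest), length(rest) + 1)
end

info = count_args(1, "two", 3.0)
# info[1] is the concrete tuple type of `rest`: Tuple{String, Float64}

# Splatting a tuple passes its elements as separate positional arguments.
args = (1, 2, 3)
total = +(args...)   # same as +(1, 2, 3)
```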
The fixed length of a tuple is a powerful tool for performance. The compiler knows how many elements there are, the type of each element in sequence, and that their values will not change (tuples are immutable), which opens up optimizations at compile time.
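One way to see what "fixed length, known types, immutable" buys you is to ask whether a value is an isbits type, i.e. whether its size and layout are fully determined by its type so it can be stored inline rather than as a separate heap object. This is a minimal sketch; the variable names are arbitrary.

```julia
# A tuple of plain numbers/chars: size and layout are fixed by its type.
t = (1, 2.0, 'c')
tuple_is_bits = isbitstype(typeof(t))    # true

# A Vector is a mutable, resizable heap object, so it is never isbits,
# even when its elements are.
v = [1, 2, 3]
vector_is_bits = isbitstype(typeof(v))   # false

# Tuples are immutable: there is no setindex! method for them.
mutation_rejected = try
    t[1] = 10
    false
catch err
    err isa MethodError
end
```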
Read up on NamedTuples. Nothing says “hey, use tuples” quite like
thread = (person = "Juan", question = "When to use tuples?")
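A short sketch of what a named tuple like `thread` gives you without defining any type: access by name, access by position, and a concrete type that records both the field names and each field's type.

```julia
thread = (person = "Juan", question = "When to use tuples?")

by_name = thread.person      # "Juan"
by_position = thread[2]      # "When to use tuples?"
names = keys(thread)         # (:person, :question)

# The type records the field names and the type of each field.
T = typeof(thread)           # NamedTuple{(:person, :question), Tuple{String, String}}
```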
Indeed, using a tuple tells the compiler to pay attention to the types of the individual elements and how many of them there are. Immutability is also beneficial. An array has a single common element type and a variable size, so the compiler won’t try to reason about the number or types of individual elements in an array.
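To make the contrast concrete, you can ask Julia what each type records statically: a tuple type lists one type per position (so its length is part of the type), while a mixed array only carries a single element type.

```julia
tup = ("a", 1, 1.0)
arr = ["a", 1, 1.0]

per_position = fieldtypes(typeof(tup))   # (String, Int64, Float64) on a 64-bit machine
n = fieldcount(typeof(tup))              # 3: the length is part of the type
common = eltype(arr)                     # Any: one element type for the whole array
```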
Aside from what others have said, tuples are also different from arrays in terms of their intended use. Arrays are typically for storing an arbitrary number of homogeneous items. Tuples are more like rows in a database, or like a poor man’s struct: a fixed number of (potentially) heterogeneous data types. Tuples are the simplest way to do data composition in Julia (though structs are going to scale more nicely in a large codebase).
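A minimal sketch of the "rows in a database" use: each tuple is one record with a fixed layout, and because every row has the same tuple type, a vector of them is still concretely typed. The field meanings (name, age, height in metres) are invented for this example.

```julia
# Each tuple plays the role of one row: (name, age, height_m).
rows = [("Ann", 34, 1.62), ("Bo", 28, 1.80), ("Cy", 41, 1.75)]

# Every row has the same tuple type, so the element type is concrete.
row_type = eltype(rows)          # Tuple{String, Int64, Float64}

# Destructure each row into named variables while iterating.
names_over_30 = [name for (name, age, _) in rows if age > 30]
```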
Are they like lists in other languages?
It depends on what that language means by “list”, but probably not.
Fortunately, NamedTuples are now taking over both of these use cases. Tuples may end up as an important building block that only intermediate Julia programmers need to use regularly.
Can’t lie: I’ll take a struct over a named tuple nine times out of ten because I can define methods on it without fear of collisions. Tuples and named tuples are more like syntactic sugar for your API surface; i.e. users normally shouldn’t have to instantiate my stupid structs, but, eh, making them write tuples, that’s fine.
I think you misunderstand: NamedTuples were not intended to replace composite types (though you can dispatch on them, this is technically type piracy if you don’t own the method, and in any case cumbersome because permuting the fields leads to a different type).
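A quick sketch of the "permuting the fields leads to a different type" point: two named tuples with the same fields and values but a different field order have different types, so a method written for one order does not apply to the other. `norm2` is a made-up function name.

```julia
a = (x = 1, y = 2.0)
b = (y = 2.0, x = 1)

same_values = a.x == b.x && a.y == b.y   # true
same_type = typeof(a) == typeof(b)       # false: field order is part of the type

# A method that dispatches on the field names (:x, :y), in that order.
norm2(p::NamedTuple{(:x, :y)}) = p.x^2 + p.y^2

result = norm2(a)                        # 5.0
matches_b = applicable(norm2, b)         # false: (:y, :x) is a different type
```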
Many APIs make or allow users to instantiate structs, e.g. Optim.jl. This is good practice.
Yeah, I would never dispatch on a tuple type outside of my own module. What I meant was that I’d rather have an interface like:
MyLib.do_the_thing(a, b, c)
rather than…
instance = MyLib.MyStruct(a, b)
MyLib.do_the_thing(instance, c)
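One way a library could offer both styles is to keep the struct-based method as the "full control" entry point and add a convenience method that builds the struct from plain values. `MyLib`, `MyStruct`, and `do_the_thing` are the hypothetical names from the example above, and the arithmetic in the body is only a placeholder.

```julia
module MyLib

# Hypothetical configuration object from the example above.
struct MyStruct
    a::Int
    b::Int
end

# "Full control" method that works on the struct (placeholder body).
do_the_thing(s::MyStruct, c) = s.a * s.b + c

# Convenience method: takes plain values and builds the struct internally.
do_the_thing(a, b, c) = do_the_thing(MyStruct(a, b), c)

end # module

simple = MyLib.do_the_thing(2, 3, 4)
explicit = MyLib.do_the_thing(MyLib.MyStruct(2, 3), 4)
```

Both calls go through the same struct-based method, so the convenience form adds no separate code path to maintain.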
Allowing users to create structs is fine if they need more control, but object setup and teardown is like Java busywork. My preference is to deal with the library I’m using as little as possible. If there are one or two functions that cover the majority of use cases for my module, all you should have to do is stick some built-in types in those functions and get the output you want.
This is what I meant to say about using tuples for API stuff: not that users should never have to use my structs directly, just that simple things should be simple. I’m sure you didn’t mean to suggest anything to the contrary, of course.
I am not sure I understand — how do you know where a tuple came from?
Again, I am confused. Immutable types have zero or near-zero overhead in most cases.
And what is “setup” and “teardown” in the context of Julia, and why would you worry about it?
I should have said I would never add a method that dispatches on a tuple type to a function that wasn’t in my module. I don’t care where the tuple comes from.
So, there are a couple of ways to iterate over lines in a file in Julia:
io = open("filename")
for line in eachline(io)
    dosomething(line)
end
close(io)
I’d call opening and closing the file setup and teardown. You’re creating an object to do something with it, and then you’re freeing the file descriptor afterwards. Most Julia objects will be garbage collected, so teardown isn’t needed for all of them, but there are still cases where you might have to build the object manually. In the case of files, Julia has a nice way to encapsulate the teardown step with a do block.
open("filename") do io
    for line in eachline(io)
        dosomething(line)
    end
end
This is better, but I still need to create a filehandle. That’s busywork. Furthermore, even though I don’t have to write close(io), I still have to care about it. That’s why I’m using a block. Julia gives you something better.
for line in eachline("filename")
    dosomething(line)
end
These three things do the job,* but the last one means I don’t have to care about file handles and freeing file descriptors and all that crap. I want to do one thing: iterate on lines. I call one function with a regular string as an argument and it gives me the lines I want.
There are times when I do want the filehandle, but the Julia standard library understands there are a lot of file operations where the handle itself is uninteresting to the programmer, so it provides methods that create and destroy it implicitly and lets me focus on the business logic.
Programmers can create my structs if that’s useful for them (like if I have a fancy collection type that lets them do amazing and wonderful things with data), but if all they want from me is a verb, I don’t want to hand them a noun.
Say I have some kind of geometry or plotting library. I probably have struct Point x::Int; y::Int end in there somewhere so I can have internal methods that dispatch on this type, but I’m not going to force my users to send me Point instances. Some methods will accept Tuple{Int,Int} instead and turn them into points behind the scenes (or better, just have x and y as distinct function parameters).
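A minimal sketch of that pattern, assuming a hypothetical `manhattan` distance function: the library dispatches internally on `Point`, while the public methods also accept a plain `Tuple{Int,Int}` or separate `x` and `y` arguments and convert behind the scenes.

```julia
# Internal type the library dispatches on.
struct Point
    x::Int
    y::Int
end

# Internal method that does the real work on a Point (placeholder logic).
manhattan(p::Point) = abs(p.x) + abs(p.y)

# Public conveniences: a plain tuple, or x and y as separate arguments.
manhattan(t::Tuple{Int,Int}) = manhattan(Point(t...))
manhattan(x::Int, y::Int) = manhattan(Point(x, y))

from_tuple = manhattan((3, -4))
from_args = manhattan(3, -4)
from_point = manhattan(Point(3, -4))
```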
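* Edit: They aren’t semantically equivalent. The second one will clean up the file descriptor even if the loop exits early, including via an exception. The first one skips close(io) if an exception is thrown inside the loop, and the third one only closes the file once it reaches the end; if you break out early, the descriptor stays open until the iterator is garbage collected. Something to keep in mind if you’re scanning a file until you find the info you’re looking for and breaking.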
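A small sketch of the do-block guarantee: the loop breaks as soon as it finds a match, and the stream is still closed afterwards because `open(f, path)` closes it once `f` returns. The temporary file and its contents are invented for the demo, and the `Ref` exists only so the stream can be inspected after the block.

```julia
path = tempname()                   # scratch file for this demo
write(path, "alpha\nbeta\ngamma\n")

handle = Ref{IOStream}()            # keeps a reference to inspect the stream afterwards

found = open(path) do io
    handle[] = io
    result = nothing
    for line in eachline(io)
        if startswith(line, "b")
            result = line
            break                   # stop scanning as soon as we find a match
        end
    end
    result
end

still_open = isopen(handle[])       # false: open(f, path) closed the stream after f returned
rm(path)
```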
Sorry, I don’t see what this has to do with tuples.
Also, tuples are the only covariant data type in Julia; all other parametric types are invariant. This is needed, e.g., for tuples to work as function arguments:
julia> (1,) isa Tuple{Integer}
true
julia> [1,] isa Vector{Integer}
false
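A slightly longer sketch of the same point: tuple types are covariant, `Vector` is invariant, and the usual way to accept "a vector of any Integer subtype" is a bounded parameter. `describe_pair` is a made-up function showing why covariance matters for argument tuples.

```julia
tuple_covariant = Tuple{Int} <: Tuple{Integer}        # true
vector_invariant = Vector{Int} <: Vector{Integer}     # false
vector_bounded = Vector{Int} <: Vector{<:Integer}     # true

# A signature with abstract tuple parameters matches concrete argument tuples.
describe_pair(p::Tuple{Integer,Real}) = "integer then real"
pair_result = describe_pair((1, 2.5))   # Tuple{Int64, Float64} <: Tuple{Integer, Real}
```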
It has to do with APIs that accept built-in types as arguments and don’t require users to deal with constructors, which is a place where you might want to use a tuple, which is what I’ve been trying to say all along.
Built-in types are not that special in Julia, and there is no reason to prefer them. In fact, clever punning and implied mappings to and from built-in types can look very attractive initially but can easily lead to confusing interfaces. Also, with multiple dispatch, method ambiguities.
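A small, hypothetical illustration of the ambiguity risk: one method puns a 2-tuple whose first element is an `Int` as one thing, another puns a 2-tuple whose second element is an `Int` as something else. A tuple of two `Int`s matches both equally well, and Julia refuses to pick. The function `interpret` and both meanings are invented for this sketch.

```julia
# Hypothetical punning: (Int, anything) is "a position plus a label",
# while (anything, Int) is "a label plus a count".
interpret(t::Tuple{Int,Any}) = "position $(t[1]) labelled $(t[2])"
interpret(t::Tuple{Any,Int}) = "$(t[1]) seen $(t[2]) times"

first_only = interpret((3, "a"))    # only the first method applies
second_only = interpret(("a", 3))   # only the second method applies

# (3, 4) matches both signatures equally well, so the call is ambiguous.
ambiguous = try
    interpret((3, 4))
    false
catch err
    err isa MethodError             # Julia reports ambiguities as a MethodError
end
```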
An Array has at least 32 bytes, one allocation and 2 pointer derefs of overhead, for homogeneous bitstypes; and for each bitstype stored in an inhomogeneous array, you get an extra indirection plus allocation plus 16(?) bytes of overhead, excepting small bitsunions. A small tuple of inferred bitstypes has zero overhead. For a small compile-time known number of elements, don’t use Array. Avoid storing bitstypes in inhomogeneous arrays, excepting things like Vector{Union{Float64, Nothing}}.
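A rough way to observe the overhead difference is to measure heap allocation for a function that builds a 3-element tuple versus one that builds a 3-element Vector. The helper names are made up for this sketch, and exact byte counts depend on the Julia version and platform, so only the comparison is meaningful.

```julia
make_tuple(x) = (x, 2x, 3x)     # fixed size, element types known: no heap object needed
make_vector(x) = [x, 2x, 3x]    # a new heap-allocated array on every call

function bytes_allocated(f, x)
    f(x)                        # call once so compilation is not measured
    return @allocated f(x)
end

tuple_bytes = bytes_allocated(make_tuple, 1)    # usually 0
vector_bytes = bytes_allocated(make_vector, 1)  # some positive number of bytes
```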
The alternative to tuples is not arrays, it is struct. Which one is best depends on what kind of dispatch and covariance you need.
Very often, dispatch and covariance are irrelevant: you know types at compile time, and don’t need to explicitly call any function that has more than a single method. Then, you can use (homogeneous or inhomogeneous) Tuple and NamedTuple as an “anonymous struct” with a couple of nice default behaviours: destructuring ((a,b,c) = tup), ordering (lexicographic), etc. All of them can be replicated by extending the relevant functions for an explicit new struct; your call what leads to more readable code.
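To make the "anonymous struct" comparison concrete, here is a sketch of the two default behaviours mentioned (destructuring and lexicographic ordering) for a plain tuple, and one way to replicate them for an explicit struct. `Version` is a made-up type, and extending `isless` and `iterate` is one reasonable approach, not the only one.

```julia
# Tuples: destructuring and lexicographic ordering work out of the box.
v1 = (1, 2)
v2 = (1, 10)
major, minor = v1                 # destructuring
tuple_order = v1 < v2             # true: compares 1 vs 1, then 2 vs 10

# An explicit struct gets neither for free; opt in by extending Base functions.
struct Version
    major::Int
    minor::Int
end

# Ordering: reuse the tuple ordering on the fields (`<` falls back to `isless`).
Base.isless(a::Version, b::Version) = isless((a.major, a.minor), (b.major, b.minor))

# Destructuring `x, y = v` calls `iterate`, so iterate over the two fields.
Base.iterate(v::Version, state = 1) = state > 2 ? nothing : (getfield(v, state), state + 1)

sorted = sort([Version(1, 10), Version(1, 2), Version(0, 5)])
maj, mnr = Version(3, 4)
```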
The more code you have that uses your “anon struct”, the more likely it is that the boilerplate pays off. Starting with Tuple code and eventually refactoring to struct can make sense.