My wishlist for the next version of Julia

I’m not going to be difficult to convince. I told my colleague about this (especially the part about arguments). He’s been writing simulation software for a couple of years, and a substantial part of the code base is parseargs and related functions. Not sure if those were tears of joy or pain.

1 Like

As mentioned previously, if there is a function like the one below in Matlab
[A, B, C] = f1(x, y, z, ...)

I have the option of writing the following to tell the function not to bother calculating B and C, so as to improve speed:
[A, ~, ~] = f1(x, y, z, ...)

How can I do the same thing in Julia?

On an unrelated note, why are real values replaced with an empty value in some functions? For example, with m = size(A): m[1] is the number of rows and m[2] the number of columns. But if A holds one-column data, m is (10,) instead of (10, 1), so indexing m[2] throws an error.

Right. In Matlab the loss of legibility is probably also due to the lack of a "return stuff" statement. The function just ends…

I am not sure that it does that.
I am pretty sure it still computes B and C; it just discards the values.
That is what this post says.
That functionality in Julia is written with underscores instead: A, _, _ = f1(x, y, z)

Occasionally the optimizer (at least I know Julia’s can) will avoid computing things that are not used.
But that happens regardless of whether the result is named _ or some actual name with letters.
Haskell (and, weirdly, TensorFlow 1.0) is the king of avoiding computing things that are not used, through lazy execution.
The rest of us have to rely on the optimizer slicing stuff away.
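
For example, a quick sketch (this f1 is just a made-up stand-in with three return values, not a function from the original post):

julia> f1(x, y, z) = (x + y, x * y, x * z);

julia> A, _, _ = f1(1, 2, 3);  # f1 still computes all three values; the last two are simply discarded

julia> A
3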

8 Likes

Julia is getting closer and closer to dead code elimination though.

6 Likes

Yep, it works:


julia> foo(x) = (2x, 3x, 4x)  # returns three values; bar below keeps only the first
foo (generic function with 1 method)

julia> function bar(x)
           y, _, _ = foo(x)
           return y
       end
bar (generic function with 1 method)

julia> @code_typed bar(10)
CodeInfo(
1 ─ %1 = Base.mul_int(2, x)::Int64
└──      return %1
) => Int64

See that the @code_typed output only includes the multiplication by 2, not by 3 or 4, because the optimizer removed those unused computations.
That is on Julia 1.7; I don’t know exactly when it was added.

Julia 1.0 for the same code gives

julia> @code_typed bar(10)
CodeInfo(
2 1 ─ %1 = (Base.mul_int)(2, x)::Int64                                                                                                        │╻╷ foo
  │        (Base.mul_int)(3, x)::Int64                                                                                                        ││┃  *
  │        (Base.mul_int)(4, x)::Int64                                                                                                        │││
3 └──      return %1                                                                                                                          │  
) => Int64

Though the LLVM optimizer catches that anyway

julia> @code_llvm bar(10)

; Function bar
; Location: REPL[3]:2
define i64 @julia_bar_35231(i64) {
top:
; Function foo; {
; Location: REPL[2]:1
; Function *; {
; Location: int.jl:54
  %1 = shl i64 %0, 1
;}}
; Location: REPL[3]:3
  ret i64 %1
}
7 Likes

size(A) will return a Tuple that has as many elements as A has dimensions. Examples:

julia> A = ones(3) # One-dimensional, one-column data
3-element Vector{Float64}:
 1.0
 1.0
 1.0

julia> size(A)
(3,)

julia> A = ones(3, 1) # Two-dimensional, one-column data
3×1 Matrix{Float64}:
 1.0
 1.0
 1.0

julia> size(A)
(3, 1)

julia> A = ones(3, 1, 1) # Three-dimensional, one-column data
3×1×1 Array{Float64, 3}:
[:, :, 1] =
 1.0
 1.0
 1.0

julia> size(A)
(3, 1, 1)

(Note that (10,) is the syntax for a Tuple that has just one element. The comma , is needed to disambiguate from (10), in which the parentheses are used for grouping operations. I.e., (1 + 1) == 2 != (2,), whereas (1 + 1,) == (2,) != 2.)
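
To see the difference at the REPL:

julia> (10)   # parentheses just group, so this is the integer 10
10

julia> (10,)  # the trailing comma makes it a one-element Tuple
(10,)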

5 Likes

No, in Matlab you really can skip the unnecessary computation. This is idiomatic in Matlab:

function [a, b] = myf(x, y)
    a = x+y;

    % Calculate b only if requested
    if nargout == 2
        b = x*y;
    end
end

Then calling v = myf(2, 3) will skip the computation of b, while calling [v, w] = myf(2, 3) will compute everything.

1 Like

That’s not what the tilde does though, is it?

That is a different feature

3 Likes

Ah, indeed! With the tilde it will compute both. But I think that was just a typo in @leon’s message; he probably meant to write

I have the option of doing the below to tell the function do not bother to calculate B and C, so as to improve speed:
A = f1(x, y, z, ...)

I was going to check this, but defining functions in Matlab is really painful. I miss Julia when I’m at work.

2 Likes

Same here… “ugh I have to create a file somewhere”

4 Likes

Well you can do something like this without writing a file:

>> f1 = @() deal(nargout);
>> a = f1()               
a =
     1
>> [a, ~] = f1()          
a =
     2

which shows that with [a, ~] Matlab sets nargout=2 so it computes everything.

I’m not going to argue it’s anything as nice as a function definition in Julia :slight_smile:

2 Likes

I guess that has to be independently documented for every function that may or may not return multiple values, right?

In that case it does not seem a great advantage relative to having two very similar names, or a kwarg.
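
To illustrate the kwarg alternative, here is a rough sketch in Julia (myf and compute_b are made-up names, mirroring the Matlab example above):

function myf(x, y; compute_b::Bool = false)
    a = x + y
    b = compute_b ? x * y : nothing  # only do the extra work when asked
    return a, b
end

Then myf(2, 3) returns (5, nothing), while myf(2, 3; compute_b = true) returns (5, 6).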

1 Like

Not really: when you call a function you can always store fewer results than the function offers. Documentation is only needed if the behavior of the function changes depending on the number of outputs, or if you want to make it explicit that computations can be avoided. If it’s not documented, it’s like in Julia: maybe the unneeded computation is skipped by the optimizer, maybe not :slight_smile:

In Matlab there are no vectors, only single-column matrices. There are arrays with higher numbers of dimensions, however, so Matlab treats two as a special number of dimensions. I get how they ended up there: early versions only had matrices, which gets you shockingly far, and only later did they add higher-dimensional arrays. However, that’s unsatisfying (why is two special?) and it causes problems.

Think of all the cases where Matlab does something special when a matrix happens to have only one column or row and something entirely different when it has more than one. This makes it very hard to write reliable code, because it’s very common for everything to work fine until you encounter data that happens to have only one column, and then :boom:, or worse still, you silently get garbage results. Example: if you do sort(A) in Matlab, it sorts each column of A, unless A happens to have a single row, in which case it sorts the row. There are many similar examples.

There are also no scalars, only 1x1 matrices. This can cause many problems too: since scalars are often treated specially, there’s a lot of Matlab code that breaks if you have a matrix that happens to have size 1x1.
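
For contrast, a quick Julia illustration of the sort case: sorting a matrix requires an explicit dims keyword, so a single-column input can never silently change the behavior.

julia> sort([3, 1, 2])  # a Vector: sorts the elements
3-element Vector{Int64}:
 1
 2
 3

julia> sort([3 1; 2 4]; dims = 1)  # a Matrix: you must say along which dimension
2×2 Matrix{Int64}:
 2  1
 3  4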

Julia, on the other hand, distinguishes all possible dimensions of arrays, including vectors and zero-dimensional arrays. When you do size(A), the number of values you get back is the number of dimensions of the array. What you’re asking for is that size(v) give back at least two dimensions even when v is a vector with only one. Which it could do, but again, why is two special? Why not always return three dimensions? Why not four? Maybe it should always return an infinite tuple object where all the trailing values are 1? (We could actually do that.) But instead, we return a finite tuple whose number of elements is the dimension of the array.

Where you’re doing m = size(v); m[2] and it errors, you could instead do size(v, 2), which will give you 1 even if v is a vector: you can ask for any dimension, and if it’s more than the number of dimensions of the argument array, you’ll get 1 as the answer. Some people find this behavior disconcerting, but it does allow writing code that treats vectors like matrices or even higher-dimensional arrays.
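
Concretely:

julia> v = ones(10);

julia> size(v)     # one entry per dimension
(10,)

julia> size(v, 2)  # dimensions beyond ndims(v) are reported as 1
1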

22 Likes

I love the fact that we could do that. DON’T DO THAT. But I love that it’s possible.

8 Likes

Yeah, it would be weird… but it’s a simple wrapper type around a normal tuple.
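
For illustration only, a minimal sketch of such a wrapper (PaddedSize and psize are made-up names; nothing like this exists in Base):

struct PaddedSize{N}
    dims::NTuple{N,Int}
end

# Indexing past the stored dimensions returns 1, like size(A, d) does for d > ndims(A)
Base.getindex(s::PaddedSize, i::Integer) = i <= length(s.dims) ? s.dims[i] : 1

psize(A::AbstractArray) = PaddedSize(size(A))

With this, psize(ones(10))[2] gives 1, whereas size(ones(10))[2] throws a BoundsError.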

3 Likes

But there’s nothing automatic about this: inside the function there must be an explicit check of nargout, and it must deliberately skip the computation. If this isn’t implemented, all outputs are computed no matter how you call the function.

3 Likes

Yes, of course, that is what I meant.