The expression solution

Hi

We all know, how good Julia solves the age-old ‘expression problem’ and I have a couple of questions to anybody, who uses both Julia, and some language like Haskell, F# or OCaml.

As you know, as a user of these languages, they do use disjoint unions to solve the expression problem.

Now, for any of you who use both, disjoint unions and multiple dispatch:

How do they differ? I experience both as very powerful, and ask myself if it would be conceivable to utilize both, or if there is only room for one of them, in a single language.

Could they cooperate? Do they work together?
How is that with the type systems?

Do you like the idea, of mixing them?
And what is your favorite?
Thanks

I’m not sure I’m familiar enough with this terminology. By disjoint unions, are you referring to pattern matching like e.g. this in Haskell (where the listed cases form a (not entirely disjoint) union of the set of all possible cases)?

take  0  _       =  []
take  _  []      =  []
take  n  (x:xs)  =  x : take (n-1) xs

Or are you referring to something different altogether?

2 Likes

I refer to sum types / discriminated unions / objects variants and so on.
And yes, you pattern match on them.

Although that itself is simply half of the solution.

Sum types and multiple dispatch are orthogonal concepts. There’s SumTypes.jl and some others, but @Mason knows the package landscape here better.

2 Likes

I know. There is also MLStyle.jl
But I would like to know, how they compare to the programmer, in practice.

Sum types make illegal state un-representable, and if used correctly, they can replace unit types.
So I am asking about the actual, practical differences, in every days life.

To me, the main difference is that sum types are closed, while subtyping and multiple dispatch is open.

Closed here means that if a programmer declares a sum type, the list of variants that subtype has is fixed and cannot be changed later on without editing the original source code. Additionally, all code which takes in a sum type, has to explicitly handle every possible variant it might have. This is really useful for forcing programmers to think about all outcomes. For example, people often forget that findfirst doesn’t just return Ints, it can also return nothing if no candidates are found. With sumtypes, you force anyone who uses findfirst to plan for the case where no candidate is found.

On the other hand, multiple dispatch and subtyping is all about genericness. They’re open in the sense that you write very general code which can work on the broadest list of possible types, and then you fill in specific methods that can benefit from special handling for specific types. One package can declare an abstract type and write functions on it, and then another package can then subtype that that type and write their own specialized methods on the functions from the other package. Someone writing multiple dispatch code specifically does not need or want to think about every possible subtype of an abstract type.

This all means that (as @Sukera said) they’re mostly just separate tools for solving separate problems and which is better for you depends on your use-case and preferences. They’re both just very fancy ways of writing if/else clauses, but the way they impact software and design is pretty different.

12 Likes

And do you think they do benefit a language, side by side, or is one to pick here over the other?

I am asking, since multiple dispatch seems to be central to Julia. Adding and using sum types, could delude that, or?

Would lead to less code reuse?

As I said, they’re different tools for doing different things. It’s like asking if there’s a benefit to owning both a saw and an axe. There’s no general answer to that question.

Sure, they’re both doing similar things (cutting or splitting things in two), but they have pretty different use-cases and niches. Some people can live their whole lives without owning a saw, but use an axe multiple times a week. Some people are the opposite. Some people find both tools really useful. It really just depends on what you’re doing and what your preferences and needs are.

2 Likes

I am more asking from the perspective of a language designer.

I think, if we would add sum types, that could lower the use of multiple dispatch.

They seem similar enough, so that people would opt out to use sum types at places, at which multiple dispatch could be used instead.

I am not sure, that their use cases are so far apart, that both at the same time would not delude the use of the other.

Otherwise, I would not know or understand, why Julia has no sum types.
They are very fundamental to all languages I used before.
So I am trying to understand the reasoning.

Is it for the focus on dynamic typing?
Appreciate the answers. :slight_smile:

People also tend to forget that not everything that’s indexable does that indexing with an Int - that’s only true for arrays :slight_smile: findfirst itself only stipulates that you can index the given collection with what findfirst returns (or it’s returning Nothing).

I don’t think sum types and multiple dispatch are mutually exclusive. Sum types are about handling what you as a function get returned from a function call in your body - multiple dispatch (as the name implies) is about how you as a function handle what you receive in your own signature, usually by defining multiple methods handling possible types. From that POV, the two approaches are complementary, and I’d love to have them both in the language - if only to have the compiler hit me with a “you forgot to define a method handling nothing here” (we can partially get this by running JET.jl, but it’s not enforced).

Of course, in reality you don’t want to define methods for your follow up code to handle edge cases like nothing specially, which is why I think some form of match (as is common in languages with sum types) to handle sum types efficiently would be neat. Destructured pattern matching is then just fancy sugar on top.

2 Likes

And why dont we have sum types, then?

We do have them, it’s already been linked: SumTypes.jl.

1 Like

I mean by default. Standard library, endorsed and described in the official documentation.

Compiler & type system development is hard, and there have been more pressing concerns (like compilation speed, compilation caching, …) since 1.0.

It’s also good to remember that retrofitting sum types into the existing language (e.g. by forcing users to take care of the return type of findfirst before their code compiles) is (in general) a breaking change - and Julia just isn’t at that stage of development anymore. Arguably, code that uses findfirst and ignores the nothing case may be broken - but it may also not be, if in the semantics & context surrounding the call the developer knows that this will never be nothing. That’s not something we can know though, so changing it and forcing users to handle that case explicitly (barring a perfectly knowledgeable compiler, which we don’t have) is not something that can be done right now (unfortunate as that is), because that would change currently working (& correct!) code into code that is now broken.

That’s not to say we’ll never get sum types (or something like it) - just that the path to getting there needs to take A LOT of existing code into account, which adds to the complexity/difficulty.

6 Likes

Isn’t 2.0 breaking things?

That’s just a “wish list” of things that would be cool but would be breaking. There are no plans to transition to 2.0 for the foreseeable future.

6 Likes

Or making it even more clear: it is not even certain if the Julia core developers will ever decide to do a 2.0 version. The idea is to put every interesting and breaking idea in this wishlist and if someday the pros overweight the cons do a 2.0 version. However, this time can never come (as the con of breaking a long-standing version gets heavier with time).

4 Likes

I see. Did you do an evaluation of open source code? As in, how widespread the usage of findfirst with a potentially breaking case is, and how many other issues are present?

And let me understand this in certainty: If I install any kind of sum type implementation, every code that utilizes eg findfirst with an unhandled nothing case could break?

No matter the implementation? How is this so strictly related?

No.

No, why would it? SumTypes.jl and others don’t change existing functions. They let you enforce those semantics in your code. It’s a package like any other.

What I’ve been talking about above is purely related to what would happen if Base were to require handling of the nothing case of findfirst. Let me give you an example:

function _foo(x::Vector{Int})
    isempty(x) && return nothing
    idx = findfirst(x) do el
        el isa Int
    end
    x[idx]
end

This will always have findfirst return a valid index, usually the first index into x. Now imagine we wouldn’t have to check isempty(x), because x is in our program that’s calling _foo semantically guaranteed to always contain at least one element. The compiler doesn’t necessarily know that - and thus, if we were to require users of findfirst to always check the nothing case, this (currently working) code, would break & throw a compiler error instead. That’s what we can’t do in a 1.x release (and may not even do in a 2.0 release, whenever/if that comes).

2 Likes

Just my two cents …
Basically, I would view generic functions with multiple dispatch as open, extensible protocols. In this regard, multi-parameter typeclasses are the closest analog in Haskell. Its standard typeclasses allow different implementations to depend on a single type, but are somewhat more flexible as standard OOP message passing as the type can appear in multiple places, i.e., as in Julia’s diagonal dispatch, or even just as the function output. Accordingly, multi-parameter typeclasses seem less used in Haskell. Another important difference is that Haskell usually requires static dispatch, i.e., like type-stable code in Julia, but wrapping an existential type allows to use them dynamically if needed.
Sum types, i.e., tagged unions, are quite different in that they don’t provide extensible protocols/interfaces, but instead describe data with a usually small closed set of variants. The Compiler then helps/forces you to cover all these variants whenever using a sum type. In dynamic languages such as Julia, Python or Common Lisp, you won’t get such help anyways and thus, untagged unions are more common, i.e., findfirst simply returns a valid index or nothing and could morally be typed as Union{Int, Missing}. Thus, the different variants are not tagged as different types at compile time, but can simply be checked at runtime. Those checks might be forgotten though which appears to be a major concern of people coming from statically typed languages. Yet, handling it at every call site might not be desired in every case or require a major restructuring of your code (monads I look at you). Sometimes, bubbling up a value is just simpler, e.g., as commonly done with NaN. In the end, its probably a matter of personal preference to choose between tagged or untagged unions (with statically typed languages leaning towards tagged unions while dynamically typed ones often prefer the untagged ones).

3 Likes