Is adding a type parameter to a struct a breaking change?

Suppose a registered package has so far exported a type MyDateTime:

struct MyDateTime
  instant::Int64
end

Would adding a type parameter be a breaking change under semantic versioning?

struct MyDateTime{T}
  instant::T
end

This type is returned to users, and code like this could break:

dt = function_returning_MyDateTime()
@assert typeof(dt) == MyDateTime

And should be replaced by:

@assert typeof(dt) <: MyDateTime

It can also have performance implications if the user has a structure like this, whose field would become non-concrete (but it would not break, as far as I know):

struct UserStruct
  dt::MyDateTime
end
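
A minimal sketch of the concreteness point, using hypothetical names (ParamDateTime, LooseHolder, TightHolder) rather than anything from CFTime.jl. It only checks whether the field type is concrete and does not measure performance:

```julia
# Hypothetical parametric type standing in for the new MyDateTime{T}.
struct ParamDateTime{T}
    instant::T
end

# User struct written against the old, non-parametric name.
# The field type is now the UnionAll ParamDateTime, which is not concrete.
struct LooseHolder
    dt::ParamDateTime
end

# A parametric user struct keeps the field concrete for each instance.
struct TightHolder{D<:ParamDateTime}
    dt::D
end

loose_is_concrete = isconcretetype(fieldtype(LooseHolder, :dt))
tight_is_concrete = isconcretetype(fieldtype(TightHolder{ParamDateTime{Int64}}, :dt))

# Construction still works in both cases; only the field's concreteness differs.
loose = LooseHolder(ParamDateTime(Int64(1)))
tight = TightHolder(ParamDateTime(Int64(1)))

println((loose_is_concrete, tight_is_concrete))  # (false, true)
```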

Do you see other problems that could occur for the users?

In the package CFTime.jl I implemented various types for the time coordinates. Recently I added the possibility to use a time precision finer than milliseconds, a variable time origin, and variable types for the instant containing the duration since the time origin (all using different type parameters). The only change in the old test suite (at 97% coverage, but I cannot anticipate all possible user code) was that I needed to change calls similar to typeof(dt) == MyDateTime. But I think it is unlikely that users make such checks.

Another package, NCDatasets.jl, uses CFTime.jl and passes the CFTime.jl types to users in some rare cases (in most cases the CFTime types are converted automatically to Julia’s Dates.DateTime, except when, for example, a NetCDF file uses a fictitious calendar that assumes all months are 30 days long). Would the additional type parameter in CFTime.jl also be a breaking change for NCDatasets?

This depends on whether the type and its parameters were exposed as API explicitly.

The first is done with export MyDateTime, and is usually not a very good idea; the second is done by documenting that users can and should access the type's fields for whatever purpose, and is usually a very bad idea.

Or maybe

dt isa MyDateTime

which is invariant to the changes you just made, and is thus better style. But if you need to test for something being something, provide a function in the API.
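
A small sketch of the difference, with hypothetical stand-ins (OldDateTime, NewDateTime) for the type before and after the change:

```julia
# Hypothetical stand-ins for the type before and after adding the parameter.
struct OldDateTime
    instant::Int64
end

struct NewDateTime{T}
    instant::T
end

dt_old = OldDateTime(0)
dt_new = NewDateTime(Int64(0))

# Exact type equality only holds for the non-parametric definition:
# typeof(dt_new) is NewDateTime{Int64}, while NewDateTime alone is a UnionAll.
eq_old = typeof(dt_old) == OldDateTime
eq_new = typeof(dt_new) == NewDateTime

# isa and <: give the same answer for both definitions.
isa_old = dt_old isa OldDateTime
isa_new = dt_new isa NewDateTime
sub_new = typeof(dt_new) <: NewDateTime

println((eq_old, eq_new, isa_old, isa_new, sub_new))  # (true, false, true, true, true)
```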

Frankly, I would just release, announce to the users (most dependencies seem to be in the JuliaGeo/JuliaOcean groups), and let unit tests catch problems. Code that is broken by this change should be refactored anyway.

Try to redesign the API so that it is less dependent on particular types and their details, focus on functionality independent of types.

Thanks for your comment. It is indeed an exported type. I clarified this in my original post.

If MyDateTime is already exposed as a type and its constructor method, maybe this could be preferable. That said, it may not help if your API made more behavior public.

# minor revision; if OurDateTime is also public, consider discouraging MyDateTime
struct OurDateTime{T}
  instant::T
end

const MyDateTime = OurDateTime{Int} # still a concrete DataType

#= Now many things still work as usual =#

struct UserStruct
  dt::MyDateTime # still concrete
end

dt() = MyDateTime(3)
@assert typeof(dt()) == MyDateTime # still passes

userfoo(::MyDateTime) = true
userfoo(dt()) # still works

This topic is also approaching the question of whether changing the results of reflection is a breaking change, and I don’t think it is, usually. For example, the number and names of fields in a public type, like DataType, are often internal, so fieldnames is expected to change. APIs usually don’t make any promises on that.

Your approach of declaring a type alias MyDateTime = OurDateTime{Int} is indeed quite interesting. But I am afraid that having all these aliases would be a bit confusing for users (CFTime already has 6 time types, which would require 6 aliases).

You are right that in my case, all fields of the types are considered private (and thus not documented). All information is only accessible via accessor functions (like Dates.year, …). All previous constructor functions (like DateTimeStandard(2024,3,6)) still work after the change.

Why would it be a bad idea to export a type which exported functions will return? How are users supposed to specialize methods on those types or detect a value as a member of the types? On a surface level the answer is “import the name or say MyModule.MyDateTime” but… why? What makes this good practice, and exporting it bad practice?

In a nutshell, exposing types and their details as API makes it fairly difficult to refactor your code (as this topic demonstrates). Types should be used for organizing dispatch inside a package, and should be exposed as API only when that is absolutely necessary.

Generally, subtyping from another package is a code smell. The API should be organized so that it exposes traits for testing if something is the correct type. Another reason for this is that Julia has single inheritance, so if MyType <: OtherPackage.Supertype, you have to fit everything else into that type hierarchy.
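
One common way to express such a trait is the "Holy trait" pattern. The sketch below is only an illustration under assumed names (TimeLikeness, timelikeness, istimelike, describe and PkgDateTime are all hypothetical, not part of any package discussed here):

```julia
# Hypothetical trait: does a value behave like a calendar date-time?
abstract type TimeLikeness end
struct IsTimeLike <: TimeLikeness end
struct NotTimeLike <: TimeLikeness end

# Default: types are not time-like unless they opt in.
timelikeness(::Type) = NotTimeLike()

# Hypothetical package type; it opts in without subtyping anything.
struct PkgDateTime{T}
    instant::T
end
timelikeness(::Type{<:PkgDateTime}) = IsTimeLike()

# Public predicate that user code can call instead of inspecting types.
istimelike(x) = timelikeness(typeof(x)) isa IsTimeLike

# Dispatching on the trait value rather than on the concrete type.
describe(x) = describe(timelikeness(typeof(x)), x)
describe(::IsTimeLike, x) = "time-like"
describe(::NotTimeLike, x) = "not time-like"

println(describe(PkgDateTime(1)))  # time-like
println(describe(42))              # not time-like
```

With this layout the package can add or change type parameters of PkgDateTime while istimelike and describe keep working for callers.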

Note that this was realized only when Julia was used at a larger scale, and many early packages (and Base) expose types. This is historical baggage we will have to deal with at some point as the language evolves.

My earlier suggestion to expose MyDateTime = OurDateTime{Int} is not a change that should be done routinely; even v0 APIs should be stable to a degree. Deprecations should be extremely rare and just as decisive as additions to the API.

Ideally, a user would only need to provide files and base types to the package’s methods to compute the desired results. That said, if the methods return custom types for the user to manipulate, I agree it only makes sense to include them in the API. MyDateTime certainly seems intended for that.

If instances of a type are being returned by exported functions, then the type is part of the API, by any sensible interpretation of the term. The Interface for Programming Applications using the package includes the type, because instances of that type cross the package interface into user code.

Ok but… a trait is just a type!

The package naming guidelines also suggest a plural name for packages which primarily export a type. I don’t see a way to reconcile any of this with the idea that exporting a type is “usually not a … good idea”.

Not adding implementation details to the public API, whether functions or types, sure, good advice. But a type where instances end up in userspace is no longer an implementation detail.

How would anyone use Base if it didn’t export any types?? Julia has this excellent (if underdocumented) generic quality, and a huge part of what makes this work is some thoughtfully-chosen abstract supertypes, and a nice collection of concrete types to go with them.

I want to at least try and stick to the topic here, however, because it’s an interesting question: @Alexander-Barth has API functions which return a type; that makes the type part of the public interface, and that’s ok. The question here is what aspects of the type are implicitly part of the public interface. SemVer doesn’t answer this question, and the upcoming public keyword only works on names, so it can’t help us out.

It seems broadly agreed that struct fields are an implementation detail unless documented to be otherwise. It doesn’t seem debatable that adding a type parameter is a change to the public API either, but of course the topic is whether it’s a breaking change.

This:

Seems like a good basis for an answer to your question. The style guide specifically advises against using == for type membership, and I see this as giving you leeway to add the parameter. The release notes should clearly document the change, how it might break code, why that code was already a bad idea, and how to fix it, but it otherwise is on solid ground.

Julia provides a function for this, isa. Inventing a new function for a user type, which does the same thing as a function provided by the Julia distribution, isn’t such great advice. Multiple dispatch and a type system which leads to easy genericity are the most novel and powerful features of Julia, one should work with that whenever possible.
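
As a sketch of how dispatch behaves after the change (GenDateTime and label are hypothetical names): a method annotated with the bare type name matches every parameterization, so user methods written as f(::MyDateTime) against the old definition would be expected to keep working.

```julia
# Hypothetical parametric type standing in for the new MyDateTime{T}.
struct GenDateTime{T}
    instant::T
end

# A signature using the bare name matches GenDateTime{T} for every T.
label(::GenDateTime) = "any GenDateTime"

# Users can still specialize on one parameterization if they need to.
label(::GenDateTime{Int64}) = "Int64-backed"

# Fallback for everything else.
label(::Any) = "other"

println(label(GenDateTime(Int64(1))))  # Int64-backed
println(label(GenDateTime(1.5)))       # any GenDateTime
println(label("2024-03-06"))           # other
```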

Most type parameters are implementation details and should not be documented in the docstring. Here’s a toy example:

"""
    Foo(x::Integer)

Make a `Foo` object.
"""
struct Foo{T <: Integer}
    x::T
end

If a user wants to wrap a Foo in their own type, then they should probably make the wrapper type parametric in order to avoid performance issues with abstractly typed fields:

# User code.
struct Wrapper{S}
    foo::S
end

Of course, some type parameters are public API, especially for some container types. For example, the Dict{K, V} type documents that K is the key type and V is the value type.
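
For instance, the documented parameters of Dict can be relied on directly, or queried through the keytype and valtype functions from Base:

```julia
d = Dict("a" => 1.0, "b" => 2.0)

# K and V are documented, so spelling out the full type is fine here.
same_type = typeof(d) == Dict{String, Float64}

# The same information is available through accessor functions.
k = keytype(d)
v = valtype(d)

println((same_type, k, v))  # (true, String, Float64)
```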

This comment is a bit meta, but still topical:

This illustrates a bad equilibrium in Julia’s community of practice. That community takes SemVer quite seriously, which is good, but Julia doesn’t provide fine-grained access control primitives. Nor should it necessarily, but it leaves the critical “what is public API” part of semantic versioning underdefined.

“If it’s documented then it’s API” looks like the most popular approach, but this creates bad incentives. More documentation is almost always a good thing, the only maintenance burden a docstring should impose is the need to keep its contents up to date.

If I’m using someone else’s struct, I would much prefer to know the parameters, types, and fields. It would be better if there were some mechanism to signal what’s considered stable and what isn’t, even if it was a docstring convention, that would help.

Any ideas I have for changing this are very much half-baked, but it would be good to find a better equilibrium, where maintainers of stable packages are never reluctant to document something.

Not at all. The API may just describe the interface that the result conforms to, without committing to a specific type. This is the best approach; e.g., see eachslice, which gave you a Generator in 1.8 and a dedicated Slices object from 1.9, without breaking SemVer.
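
A sketch of user code that relies only on the iteration interface of eachslice, and so should behave the same whichever object is returned:

```julia
A = [1 2 3; 4 5 6]

# Only iteration is used here; the concrete type returned by eachslice is never named.
col_sums = [sum(col) for col in eachslice(A; dims=2)]

# first() also works on any iterable.
first_row = first(eachslice(A; dims=1))

println(col_sums)            # [5, 7, 9]
println(collect(first_row))  # [1, 2, 3]
```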

Sure, but it is used in itself and can be documented as such, and you can refactor your code much more easily.

Well, that is up to you, but you will be using Julia non-idiomatically. Consider eg

julia> fieldnames(Dict)
(:slots, :keys, :vals, :ndel, :count, :age, :idxfloor, :maxprobe)

Do you really need to know these to use dictionaries? Why?

This is a good counterexample, but I’m not convinced it generalizes. Iterators are specifically used for their results, calling them by hand is onerous because of the state variable. In for var in custom_iter(instance) one can probably get away with swapping out the invisible return value of calling custom_iter, but the type of var, I would argue, is definitely part of the API.

The case we’re considering appears to be an example of the latter.

Wanting to know something doesn’t amount to using Julia non-idiomatically. I don’t see how it could, actually, since it isn’t writing code.

A good example of why it’s better for things to be fully documented is if I uncover a bug in some package I’m using, or behavior I think the package should have, and clone a fork of it locally to work on a PR. This is rarely possible without using internals, and of course, since I’m now doing work on the package, using the internals is perfectly appropriate.

A possible solution would be to add semantics for an # Internals section that would work the same as # Extended Help, but would convey that the documentation under the fold is not part of the public interface. However it might work, the idea that things shouldn’t be documented because it means you can’t change them needs to go.
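
A sketch of what such a docstring might look like. The "# Extended help" heading is the existing convention from the Julia manual; the "# Internals" heading is purely hypothetical and has no special meaning to the documentation system today, and Stamp is a made-up type:

```julia
"""
    Stamp(x::Integer)

Make a `Stamp` object.

# Extended help

Longer usage notes that are still part of the public interface.

# Internals

Hypothetical section: everything below is documented for contributors only
and may change in any release. The value is stored in the field `x`, whose
type is the type parameter `T`.
"""
struct Stamp{T<:Integer}
    x::T
end

s = Stamp(3)
println(s.x)  # 3
```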

Sure, there is even API to retrieve it (eltype(itr)), but the point is that generally you don’t need to know details about the type of itr, just the interface.
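
For example (collect_typed is a hypothetical helper), code can ask an iterator for its element type without knowing what kind of iterator it was given:

```julia
# Build a typed vector from any iterator, using only eltype and iteration.
function collect_typed(itr)
    out = Vector{eltype(itr)}()
    for v in itr
        push!(out, v)
    end
    return out
end

pairs_itr = enumerate(["a", "b"])

elem_type = eltype(pairs_itr)        # element type, queried through the API
collected = collect_typed(pairs_itr)

println(elem_type)   # Tuple{Int64, String} on a 64-bit system
println(collected)
```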

In that case, sure, dig into the internals and fix the bug.

But as a user of some code, generally you want to separate the interface from the implementation. Usually the details of a particular type are part of the latter, though of course there are exceptions.

You can of course code the way you like and make types, their parameters, and fields part of your API if you really prefer. I am just advising against it based on experience and the history of the language, but it is fine if you want to do it.

Note, however, that a lot of types are really hairy because they have parameters computed by the constructor. E.g.

julia> typeof(@view zeros(3,3,3)[3, :, 1])
SubArray{Float64, 1, Array{Float64, 3}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}, Int64}, true}
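
User code rarely needs to spell out such a type. A minimal sketch (total is a hypothetical function) that accepts the view through the AbstractVector interface instead:

```julia
v = @view zeros(3, 3, 3)[3, :, 1]

# The view satisfies the AbstractVector interface, whatever its full type is.
is_vector = v isa AbstractVector{Float64}

# Annotating with the abstract interface keeps this method independent
# of the SubArray parameters.
total(x::AbstractVector{<:Real}) = sum(x)

println((is_vector, length(v), total(v)))  # (true, 3, 0.0)
```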