Is OffsetArrays.jl a poison pill?

So, my intention was to ask if you saw this as a binary choice. Thanks for answering.

2 Likes

I think part of the problem is that the ecosystem is currently in a “frustrated” state: on the one hand, the flexibility of the language makes it possible to define all sorts of new array types that come with non-standard (at least compared to the old 1-based standard) indexing, but on the other hand, the type hierarchy / type system hasn’t kept up in a way that lets a package precisely express which types it supports.

This then leads to the by now well-discussed problems. Right now the ecosystem is sitting at an unstable point and could go either way: descend into chaos, or evolve the type system and move to a better place that is not only very generic over array types but also correct/safe.

2 Likes

Or alternatively,

for t in axes(x,1)[begin+3:end], i in axes(x,2)[begin+1:end]

The axes may be indexed as arrays as well.
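For instance (a quick REPL sketch with a placeholder array):

julia> x = rand(6, 4);

julia> axes(x, 1)[begin+3:end]   # an axis is itself an array, so it can be indexed
4:6

julia> axes(x, 2)[begin+1:end]
2:4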

1 Like

Tuples and NamedTuples are the two I want to be seen as instances of an abstraction.

Here’s an example: finding the middle index of an odd-length range, or the middle 2 indices of an even-length one. This is the (afaik) correct way:

julia> function middle(x::UnitRange{Int})
           # divide a distance, make room for reference point
           step = (length(x)+1)÷2 - 1
           first(x) + step, last(x) - step
       end
middle (generic function with 1 method)

julia> middle.((1:10, 0:9, -1:8, 1:9, 0:8, -1:7))
((5, 6), (4, 5), (3, 4), (5, 5), (4, 4), (3, 3))

When we’re not using generic methods, length(x) and last(x) would both look like n for 1-based indexing; the returned value would look like 1+((n+1)÷2-1), n-((n+1)÷2-1). So what happens if we mix them up?

julia> function wrongmiddle(x::UnitRange{Int})
           # dividing a reference point makes no sense
           step = (last(x)+1)÷2 - 1
           first(x) + step, length(x) - step
       end
wrongmiddle (generic function with 1 method)

julia> wrongmiddle.((1:10, 0:9, -1:8, 1:9, 0:8, -1:7))
((5, 6), (4, 6), (2, 7), (5, 5), (3, 6), (2, 6))

Incidentally, I kinda dislike begin/end in indexing brackets; something derived from first/length/last would fit the corresponding methods’ names better.

I opened a GitHub issue for this a while back but haven’t had time to come back to it. Hopefully someone who understands the nuances of all these iteration options can explain them in a comprehensive doc page.

2 Likes

Is Quaternions.jl a poison pill? How about Unitful.jl?

Not all code operating on Number types is correctly structured to handle non-commutative numbers like quaternions. And not all code operating on Real types is correctly structured to handle dimensionful types à la Unitful. Does that mean that all generic numeric code in Julia is broken, or that all numeric code should be restricted to concrete, tested types like Float64? The latter restriction would be a huge blow to the flexibility of the Julia ecosystem as new number types, from DoubleDouble to Measurement and Interval, come along.
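To make the failure modes concrete, here is a small sketch (the constructors are from Quaternions.jl and Unitful.jl as I understand them, so treat the details as assumptions):

using Quaternions, Unitful

# Generic code that silently reorders factors assumes commutativity:
q1 = Quaternion(0.0, 1.0, 0.0, 0.0)
q2 = Quaternion(0.0, 0.0, 1.0, 0.0)
q1 * q2 == q2 * q1            # false: quaternion multiplication does not commute

# Generic code that adds a bare literal assumes dimensionless numbers:
x = 1.0u"m"
# x + 1                       # would throw a DimensionError for a Unitful quantity
x + 1.0u"m"                   # fine: the units match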

Whenever you define a new subtype that stretches the conception of the parent type, it’s a challenge to generic code and it’s likely that many combinations will not work. But it’s also an opportunity to extend the generality of the ecosystem and improve its robustness. For example, the LinearAlgebra stdlib has been slowly extended over the years to have more support for non-commutative and dimensionful types, though this is by no means complete.

Nor is it terrible, in my opinion, to tell people to test things when they combine independent packages defining unusual new types for fundamental things like containers and numbers, and to expect that exotic combinations won’t always work (in which cases they should file issues and PRs).

44 Likes

Good point. It is probably impossible to write generic-enough code to cover every possible way a type may be extended in the future, but at the same time, many extensions probably will just work. Maybe this sort of correctness verification can be covered in documentation rather than with explicit type union function signatures. Just a simple page that lists all external types that package unit tests have been written for. If you use other types with this package, then you are responsible for writing your own unit tests, or better yet, submitting PRs to the package.

I would also point out that there’s no way anyone could have imagined all the things that can be done with arrays and numbers. These abstractions are hard to get right even with experience and impossible to fully anticipate in advance. The approach Julia has taken is to let people explore and organically react when things don’t quite fit together. We are getting to a point where we now have some better notions of what it means to be an array or a number. The interface of arrays is documented here. There is no interface for numbers—the concept is too general: there are no methods one must implement for all things that are numbers. What then is the point of Julia’s Number type? It simply serves as a way to opt into a bunch of generic fallback method definitions, such as “automatic” promotion for arithmetic operations like + and * (without requiring you to define those) and also some definitions that assume that numbers are value types, rather than containers.
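For instance, a minimal sketch of opting into those promotion fallbacks (the MyNum type and its methods are made up for illustration):

# A new Number subtype gets + and * through the generic promotion fallbacks
# once it declares how to promote and convert; it never defines + or * itself.
struct MyNum <: Number
    x::Float64
end
Base.promote_rule(::Type{MyNum}, ::Type{T}) where {T<:Real} = Float64
Base.Float64(a::MyNum) = a.x

MyNum(2.0) + 3        # 5.0, via promotion to Float64
MyNum(2.0) * 3        # 6.0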

It may be useful to allow interfaces to be formalized and checked automatically, but if we had tried to do that from the beginning with something like arrays, we would have gotten it wrong, and the explosive growth we’ve seen of useful and strange array types would have been stifled before it ever began. I would also note that it’s often quite useful to partially implement an interface: I may want to implement something array-like, but I don’t need all of the functionality that some arrays provide. How do I know if my implementation is complete? If my code works, then it’s complete. Of course, that’s not fully satisfying when writing code that will be used by many other people who may want more features, but partial interface implementations are very useful for exploration.
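A tiny sketch of such a partial implementation, in the spirit of the examples in the Julia interfaces docs: define only size and getindex, and a surprising amount of generic functionality already works, while mutation simply stays unimplemented.

# Read-only "array-like" type: only size and getindex are defined.
struct Squares <: AbstractVector{Int}
    n::Int
end
Base.size(s::Squares) = (s.n,)
Base.getindex(s::Squares, i::Int) = i^2

sum(Squares(4))        # 30: generic reductions work
collect(Squares(4))    # [1, 4, 9, 16]
# Squares(4)[1] = 5    # errors: setindex! was never implemented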

25 Likes

I’d instead say, if you’re going to do it, commit to it. A lot of people point to SciML as why they “go generic”, but they don’t adopt our practices. Just look at a recent PR:

There are 40 test groups that take on average 30 minutes each. That’s about 20 hours of tests that are run. Do we support Unitful?

Do we support abstract arrays that don’t have indexing defined?

How are big floats doing in terms of numerical convergence?

And the list just keeps going. With SciML we commit to it: there’s huge test coverage, and we consider anything that goes wrong with any generic handling an issue. What I see as the issue is the lack of commitment: there are groups that will only test on Array but ship generic code, and there are groups that don’t answer issues about generic code within a day. That shouldn’t be done.

And anyone who really commits to generic coding will have to use ArrayInterface.jl period.

There’s just so many details that you cannot query from the Base interface:

https://juliaarrays.github.io/ArrayInterface.jl/dev/api/

Nothing wrong with the Base interface, though: this had to be learned over time in a way that could quickly evolve with the growing AbstractArray ecosystem. But if you aren’t using that package, then either there are issues with your generic code that are dead obvious and easy to identify, or there’s a re-implementation of it in the package with a whole lot of Requires.jl (I only know of one case of the latter). Yes, that’s a strong statement, but those primitives were all made for a reason, and I can give you counterexamples from all over the ecosystem. For example, how many of you can name off the top of your head a commonly used array type for which eltype(x) !== typeof(x[1])?
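(The question is rhetorical, but as a trivial illustration of why eltype(x) and typeof(x[1]) can differ at all: eltype is only an upper bound on the element types, not the concrete type of any particular element. The example below is not the commonly used type the question alludes to.)

julia> x = Any[1, 2.0];

julia> eltype(x)
Any

julia> typeof(x[1])
Int64

julia> eltype(x) === typeof(x[1])
false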

We need a similar effort for Numbers, but we just haven’t gotten there yet because there’s still a lot to do with AbstractArrays.

5 Likes

Though the vision is promising, I think ArrayInterface needs to fix issues like ismutable wrong for FillArrays · Issue #77 · JuliaArrays/ArrayInterface.jl · GitHub first before many libraries can make good use of it. Otherwise one is forced into the same Requires dance as before, just with another level of indirection (e.g. replacing @require XYArrays... with @require ArrayInterfaceXYArrays...).
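For concreteness, a sketch of the pre-subpackage version of that dance (XYArrays, its UUID, and supports_inplace are all placeholders; with the subpackage approach, the glue that gets loaded becomes ArrayInterfaceXYArrays instead):

module MyPackage

using Requires

# Conservative default when nothing is known about the array type
# (supports_inplace is a made-up name for illustration):
supports_inplace(::AbstractArray) = false

function __init__()
    # The "Requires dance": opt a specific array package in only when the user loads it.
    @require XYArrays="01234567-89ab-cdef-0123-456789abcdef" begin
        supports_inplace(::XYArrays.XYArray) = true
    end
end

end # module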

Well, yeah, figuring out the full interface is hard and it keeps evolving. But it at least gets many, many more of these cases right than someone trying to roll it themselves, because there have been hundreds of these edge cases over the years.

Yeah, this is an important point. That advice is poorly written; the phrase “should be used chiefly for dispatch” doesn’t explain much and is somewhat tautological.

If I had to shoot for a better articulation:

I don’t think dispatch constraints “on their own” should be relied on as the de facto implementation mechanism for enforcing an interface boundary, even though it seems they are commonly applied this way in the Julia world. As a result, principles like Style Guide · JuMP crop up to combat the effects (which I don’t disagree with as a design principle for certain packages/situations, but I think it may be overkill in the other direction).

It’d be nice if I had an alternative catch-all recommendation for what to do instead, but I don’t :grin:

It’s almost too easy sometimes to pun poorly on Julia’s subtyping algorithm to try to enforce behavioral constraints. I think you CAN successfully do that in situations where a type’s/package’s design is amenable to it (see below), but I’m not sure that’s the case in all situations, and in some of them it’s a rather leaky mechanism for this purpose.

For example, I do think it’s okay to enforce interface boundaries via dispatch constraints in situations where you’re defining behaviors atop a more strictly defined compositional interface (e.g. a wrapper type A(x) that explicitly surfaces the “A-like” behaviors atop data x). That case probably meets the poorly phrased definition of being “used chiefly for dispatch”. IMO method dispatch makes it quite pleasant to implement these cases, as long as you don’t run into promotion-related woes for n-ary methods, which sometimes happens in a multiple-dispatch-driven system. Usually if I hit that point I end up wishing there were some Haskell-y norms/capabilities I could lean on to resolve things to the intended behavior, versus hitting method ambiguities.
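To make that concrete, here is a hedged sketch of the wrapper pattern described, with all names made up: a thin type that surfaces “sorted-vector-like” behavior atop the data it wraps, so dispatching on the wrapper genuinely enforces the behavioral constraint.

# Wrapper that guarantees its data is sorted; the invariant is checked at construction.
struct Sorted{T,V<:AbstractVector{T}} <: AbstractVector{T}
    data::V
    function Sorted(v::AbstractVector)
        issorted(v) || throw(ArgumentError("data must be sorted"))
        new{eltype(v),typeof(v)}(v)
    end
end
Base.size(s::Sorted) = size(s.data)
Base.getindex(s::Sorted, i::Int) = s.data[i]

# Methods can now dispatch on the wrapper to rely on its invariant:
fastsearch(s::Sorted, x) = searchsortedfirst(s.data, x)

fastsearch(Sorted([1, 3, 5, 7]), 5)   # 3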

Of course, and there has been a lot of thought and effort put into getting those cases right. My point was that, as a package author who wants to be a good citizen of the ecosystem and put that into practice, here is how things break down:

  1. I need to know whether an array type can be accumulated into in-place, and I don’t want to use Requires on the half dozen or so types that I know won’t support this.
  2. Oh cool, ArrayInterface advertises a function for this (see the sketch after this list).
  3. Wait, this function doesn’t work for most of the types I care about. No matter, we can file an issue.
  4. Somebody already filed an issue 2 years ago, but it’s not clear there is consensus on whether it’s considered a problem, let alone how to fix it.
  5. So I’m back to needing Requires, except now if I want to use ArrayInterface I have to pirate its own methods.
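Roughly what step 2 refers to, as far as I understand ArrayInterface’s API (treat the exact names and results as assumptions):

using ArrayInterface

# Query, from the type alone, whether in-place accumulation via setindex! is possible:
ArrayInterface.can_setindex(Vector{Float64})   # true
ArrayInterface.can_setindex(UnitRange{Int})    # false: ranges are immutable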

Perhaps the broader point here is that interfaces need buy-in, and if they aren’t getting that it is worth analyzing why. How much of it is technical issues like the example above vs. concerns around maintenance timelines when something breaks vs. other non-technical factors? Is it just a matter of getting the word out or is some negotiation required? etc.

This applies to the other (implementer’s) side of the interface boundary too: when the reaction to an interface ranges from “this seems unstable” (I know it’s out of date, but there has been no follow-up) to “can’t you handle this?” to unresponsive to non-existent, what incentive is there to consume this interface when we’ve been told repeatedly that Requires is a no-go?

1 Like

I expanded my prior low-effort attempt via an edit.

julia> foo(x::BaseAndCoreArrays) = x[1:length(x)]
foo (generic function with 1 method)

julia> foo([1,2,3])
3-element Vector{Int64}:
 1
 2
 3
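(BaseAndCoreArrays here is the union alias from the earlier, edited post; for readers of just this excerpt, a hypothetical sketch of what such an alias might look like:)

# Hypothetical sketch only; the real definition lives in the earlier post.
# The idea: restrict dispatch to Base/Core array types known to be 1-based.
const BaseAndCoreArrays = Union{Array, SubArray, Base.ReshapedArray,
                                BitArray, UnitRange, StepRange}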

“You probably want” is the crux of the issue here. There might be more than one type of Julia user. I would rather someone restrict their dispatch and be correct than accept an AbstractArray and make incorrect assumptions.

Is this the best way to restrict dispatch? Probably not, but it is fairly simple. Traits are probably the way to go.

I definitely disagree that Julia 2.0 should have special loops for certain kinds of arrays or indexing. I think we could create a Base method to obtain a one-based view of an AbstractArray; that could be done before Julia 2.0.
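A sketch of what such a method could look like; note that OffsetArrays.no_offset_view already provides this behavior today, and the onebased name below is made up:

using OffsetArrays   # only to construct an offset array for the demo

a = OffsetArray([10, 20, 30], -1:1)      # axes(a, 1) == -1:1
b = OffsetArrays.no_offset_view(a)       # 1-based view of the same data
@assert firstindex(b) == 1 && b[1] == 10

# A Base-only sketch of the proposed method (hypothetical name):
onebased(x::AbstractArray) = view(x, UnitRange.(axes(x))...)
@assert axes(onebased(a), 1) == 1:3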

I agree that avoiding “restrict[ing] functions to the tested surface area” is the community preference, especially when applied to library code. However, I do think we should give users tools to either test their code broadly or restrict their dispatch to “known” types if they so wish. Perhaps some users do not really want to write methods for AbstractArray{T,N} but are reaching for it because they just wanted to support UnitRange and SubArray?

A practical option here would be to insert OneBasedArray <: AbstractArray (or Abstract1Array?) in the type hierarchy. That would be considered non-breaking according to ColPrac guidelines and would allow people to program to abstract arrays assuming 1-based indexing. But I’m not sure it’s really worth it: if you’re writing code that’s generic enough to apply to all kinds of different arrays, then using begin instead of 1 seems like not that big a deal.
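A minimal illustration of that last point (OffsetArrays is used only to produce a non-1-based example):

using OffsetArrays

# Written with a hard-coded 1, this would be wrong for an offset vector;
# written with `begin`, it is generic over where the axes start.
firstelem(a::AbstractVector) = a[begin]

@assert firstelem([10, 20, 30]) == 10
@assert firstelem(OffsetArray([10, 20, 30], -1:1)) == 10   # its first index is -1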

12 Likes

Let me know when the numbers are enough to get attention.

It is not that big a deal, and would matter even less had it not been such kindling in the userverse.
This may be more of a pragmatic decision: it would allow any member of the community to respond “Not any longer, all our Arrays subtype OneBasedArrays.”
There is value in a pithy response that makes sense to most everyone.
And it would circumvent some future silliness when arrays become sentient.

1 Like

ArrayInterface.jl moves fast. Like:

No, we already solved this. See

https://github.com/JuliaArrays/ArrayInterface.jl/tree/master/lib

Of course, it’s not a nice solution; we’d prefer that StaticArrays.jl define the right functions itself, but there is ArrayInterfaceStaticArrays.jl, and no Requires.jl is required.

Solved. It doesn’t use Requires.jl.

Look, it’s not even in the Project.toml.

That was on a personal repo where absolutely no ArrayInterface.jl devs were pinged. If this is needed, we can make an ArrayInterfaceBlockArrays.jl and hold it until it gets upstreamed. No Requires required.

Sure, but all of those issues were handled, right? No Requires.jl, load time in the low microseconds (so less than 1% of the package), etc. Seems like that’s all done? We can re-bump that.

In the meantime, the working code lives in https://github.com/JuliaArrays/ArrayInterface.jl/tree/master/lib/ArrayInterfaceOffsetArrays, which is a registered package, so you can use it today.

That is merged and completed. StaticArraysCore, ArrayInterfaceStaticArrays, and ArrayInterfaceStaticArraysCore.

Point by point, ArrayInterface already handles all of those cases by subpackaging. Is that nice? No, it would be nice if all AbstractArrays actually defined their interface. But SciML needs this in order for generic code to work, so we’re shouldering the effort for now.

It can start anytime, but from this post you can see why ArrayInterface has been such a big project: we’re implementing the interface for every AbstractArray we can find :sweat_smile: . That’s also why we can tell you about all of the weird edge cases though.

I should end this by saying I know there is a caveat here: almost by definition, ArrayInterface.jl is still not super stable and is a fast-moving package, which is a terrible thing to have as a low-level interface that everyone depends on. So while I wish this would go into every array type, I would also agree that we’re probably at least a year away from really being stable enough for that. And actually, I think this is the kind of thing that needs to stabilize and head into Base. Once this interface is more set in stone, I want to do a PR to Base that adds “these are the 20 traits that we know help generic code”, and then add that to the AbstractArray page. I don’t think we can say it’s a good part of the language until it’s all the way up there.

Until then, it’s the band-aid that SciML needs to maintain, so everyone else can rest assured that things won’t break for that reason :sweat_smile:.

6 Likes

I think I was imprecise. ArrayInterface removing Requires from its own dependencies was a massive effort and very much appreciated. What I was referring to is that one still needs Requires to import ArrayInterfaceStaticArrays etc. to avoid taking a dependency on the underlying array package.

One alternative is just adding all the ArrayInterface* bits one needs as direct deps. That’s fine for one or two types, but it quickly gets out of hand especially when you’re only calling a subset of the interface. The other alternative is making users import the right subpackages themselves, but that is unlikely to fly.

That’s totally understandable. My ask (with ismutable and the issue “ismutable wrong for FillArrays · Issue #77 · JuliaArrays/ArrayInterface.jl · GitHub” as the motivating example) is that interfaces favour being conservative, perhaps asking users to install a subpackage, instead of generating false positives about what they support (with or without subpackages). There will always be new array packages that haven’t yet opted into the interface, and this approach would allow us as interface consumers to avoid pirating both ArrayInterface and the array packages whenever the former falls back to a too-optimistic code path and blows up.

1 Like