Would it help to have a tool that automatically determines which issues should be closed?

I’m looking at big packages, like Plots.jl, GR.jl or DataFrames.jl, which all have 100+ open issues. And from my humble experience, issues get lost there very quickly if not resolved immediately. So I wonder if it’s possible to improve the process with some technical solution. It should be relatively straightforward to build an algorithm with a UI that groups similar issues together, so one can process them in batches. The same similarity metrics could be used to match open issues with closed issues and PRs, so we could try to predict which of them are already resolved and which are out of date.
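
To make the grouping idea concrete, here is a rough sketch in Python. It assumes plain token overlap (Jaccard similarity) on issue titles, which is just one possible metric; the function names, the 0.5 threshold and the sample titles are made up for illustration and are not taken from any real tracker.

```python
import re

def tokens(text):
    """Lowercase word tokens of an issue title."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_similar(issues, threshold=0.5):
    """Greedy grouping: an issue joins the first group whose first member
    is similar enough, otherwise it starts a new group."""
    groups = []
    for number, title in issues:
        toks = tokens(title)
        for group in groups:
            if jaccard(toks, group["tokens"]) >= threshold:
                group["issues"].append(number)
                break
        else:
            groups.append({"tokens": toks, "issues": [number]})
    return [g["issues"] for g in groups]

# Hypothetical issue titles, for illustration only.
issues = [
    (1, "Legend overlaps plot when using GR backend"),
    (2, "legend overlaps the plot with GR backend"),
    (3, "Crash on empty DataFrame join"),
]
groups = group_similar(issues)
print(groups)
```

The same `jaccard` score could be computed between an open issue and the titles of closed issues or merged PRs to flag candidates that may already be resolved; a real tool would probably need a better text representation than raw title tokens.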

I’m working on a kinda similar project, so drafting this tool shouldn’t take long. Do you think that it would help? Or what is the bottleneck with issue management? I have never maintained such big packages, only a few small repos, so please be kind if I miss the big picture :slight_smile:

P.S. Not sure if it should be in tooling, but don’t know where to put it otherwise.

Anyone can help with problems like this by going through old issues and trying to reproduce the bug (i.e., updating to the latest version of every package, potentially updating the code in the bug report to the latest syntax). If it’s reproducible, you can post the latest MWE. If it’s no longer reproducible, you can leave a comment asking that it be closed.

To pick a random example: https://github.com/JuliaPlots/Plots.jl/issues/535#issuecomment-592936130

Maybe. But I hate projects where, from time to time, you are pinged with “Hey, we noticed that there hasn’t been any activity on this issue for a while. Is the issue solved?” and, if you forget to respond, you get a “Thank you, the issue is now closed” :-/

If there really is a simple technical solution, I think it could be a huge help. Julia itself would also be a good source of examples since there are >3k issues, many of which are surely stale or duplicates. But I wonder if it’s really so easy; if so, wouldn’t GitHub have done this?

imho: a reversed time order … is a good start.

I remember a thread not all that long ago where there was some discussion of a GitHub bot that would ping owners of stale PRs. So I’d imagine a technical solution exists. Issues probably fall much lower than PRs on the priority list though…

Thanks everyone for the answers!

@odow, the whole point is how to make the process more scalable, as it seems that it is not completely optimized.

@tbeason, yep, focusing on PRs makes total sense.

@tim.holy, there are many things that GitHub hasn’t done… Given limited free time, I should be able to have some draft in 1-2 weeks. Would you maybe be interested in checking it when ready? It would be terrific if you could say if it’s something you’d find useful for yourself.

I’ll be happy to take a look, but also let others give it a whirl—you’re more likely to get useful feedback that way.

I don’t think this is something that can be solved without human interaction. Either the maintainers spend a considerable amount of time trying to reproduce old issues or the owner has to do it by pinging and reminding them. I suppose some sort of BugFest will be needed.

I use the GitHub “stale bot” - if an issue isn’t marked as a potential enhancement, it eventually gets closed. It’s useful because people often open issues but then “move on” (or lose interest), and the stale bot eventually comes along and jogs people’s memory to do something. Everyone gets notified of imminent closure, and if nobody responds, the issue eventually gets closed. It can always be re-opened.

I find it works quite well, to keep things tidy. Open issues prey on the mind. :joy:

We use it also in the General registry to automatically close pull requests that have been inactive for more than 30 days. Users can always resubmit the package/version at any time, after fixing the issues that prevented automerging, but at least we can keep the list of open pull requests shorter.

As a user reporting bugs, I find “stale bots” quite frustrating.
I spend time and effort composing a good bug report for a problem in some piece of software, and if I don’t hit one of the few packages where open-source maintainers are not swamped in backlogs, I then have to leave meaningless “Up” comments every N days just so some machine does not mindlessly close the issue (with the problem still present!).

Personally, the Ubuntu bug tracker was the worst for me - I reported a sizeable number of bugs over the years, and I think all but a handful were closed “due to inactivity”, just because the maintainers didn’t find the time to deal with them (or even triage them). At some point I just stopped bothering to report bugs…

Also, I wonder what this practice does to your rate of duplicate reports…

In my opinion (and experience managing an OSS issue tracker with a high three-digit number of open issues), a better strategy can be to:

  • Set up a process to ensure that most/all incoming issues get triaged (labeled, closed as inappropriate, classified, assigned, milestoned,…)
  • Dedicate time/resources to look at old issues, and reproduce or close them. The GitHub sort-by-least-recently-updated view already mentioned works well for this
  • If possible, plan for an issue resolution timeline: label as needs-analysis for devs to look at, needs-info from reporter, assign a milestone target (could be next-minor, next-major, “future” :wink: ). This way you also know when your milestones are getting overfull.
  • I think only issues that remain at “needs-info”, waiting for reporter feedback for a long period, should be eligible for autoclosing, as this more reliably indicates “lost interest” by the reporter
  • In the end, take a look at your project priorities. Maybe the project should spend more effort on maintenance/fixing bugs than on shiny new features. But of course, this is often hard to sell (unfortunately)
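
As a rough illustration of the rules above, here is a small Python sketch. The `Issue` class, the 90-day waiting period and the sample data are assumptions made up for the example; only the label names `needs-info` and `needs-analysis` come from the list. It also uses the last update time as a stand-in for "last reporter feedback", which is a simplification.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class Issue:
    number: int
    updated_at: datetime                     # last activity on the issue
    labels: set = field(default_factory=set)

def review_queue(issues):
    """Open issues ordered least recently updated first."""
    return sorted(issues, key=lambda issue: issue.updated_at)

def autoclose_candidates(issues, now, max_wait=timedelta(days=90)):
    """Only issues waiting on the reporter ("needs-info") for longer than
    max_wait qualify; old issues without that label are never auto-closed."""
    return [
        issue.number
        for issue in issues
        if "needs-info" in issue.labels and now - issue.updated_at > max_wait
    ]

# Hypothetical data, for illustration only.
now = datetime(2021, 3, 1)
issues = [
    Issue(12, datetime(2021, 2, 20), {"needs-info"}),
    Issue(10, datetime(2020, 1, 5), {"needs-analysis"}),
    Issue(11, datetime(2020, 6, 1), {"needs-info"}),
]
queue = [issue.number for issue in review_queue(issues)]
candidates = autoclose_candidates(issues, now)
print(queue, candidates)
```

Issue 10 is the oldest, so it comes first in the review queue, but it is not an autoclose candidate because nobody is waiting on the reporter; it still needs a maintainer to look at it.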

Let me ask a provocative question: If open issues prey on your mind, why are you bothering with a public issue tracker (or users)? Sure, it feels better to have few open issues in your tracker, but if you achieved that by just autoclosing issues nobody finds the time to fix, is that metric still meaningful?

If users report an issue, and the maintainers do not react, what remains for them other than “moving on”? What would you like them to do instead?
Mind that many people don’t have the technical competence to fix bugs themselves, otherwise you’d probably have gotten a PR instead.

Finally, let me say that I definitely know how it feels to not have enough time/energy to fix all bugs in the projects I help maintain. This means that issues stay open (sometimes for years), and that I try to focus on fixing bugs, not adding new features, but it does not mean I just close reports because I don’t find the time to resolve them.

Good question. I shall ponder it.

In a few packages I keep the issue list short by fixing bugs and then the really hard ones get tagged in a way that reminds me why they aren’t fixed yet. I find that settles my unease at having open issues. Example: https://github.com/timholy/Revise.jl/issues

I can speak for DataFrames.jl. Issues are not lost there. They are classified and assigned a priority. The only issue is that, given the number of things to be added, things that core contributors do not consider important may not get handled for a very long time, as other things are more pressing. In particular, bugs are mostly fixed immediately, and what is left open are feature requests.

Prompted by this thread, I decided to resolve an issue from 2014 :smile:.
See
https://github.com/JuliaData/DataFrames.jl/issues/659
and
https://github.com/JuliaData/DataFrames.jl/pull/2649
