Reliability of AI coding tools

It is unfortunate that this package has been developed by an “AI” tool.
The code produced in this fashion is inherently unreliable.

Could you please elaborate on what exactly you find “inherently unreliable” in the current codebase? It would be helpful if you could point to specific parts of the implementation where you see issues or risks.
For transparency: the package is not generated entirely by an AI tool. The initial skeleton was drafted with assistance, but the design decisions, logic, and a considerable amount of the code have been written, refactored, and reviewed manually. A few of us are also in the process of integrating a clearer multiple-dispatch design for question types, which should further improve maintainability and clarity.
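To give a rough idea of the direction (a minimal sketch only; the type and function names below are hypothetical and not the package's actual API), a multiple-dispatch design lets each question type carry its own checking logic:

```julia
# Hypothetical sketch of a multiple-dispatch design for question types.
# Names (AbstractQuestion, MultipleChoice, ExactMatch, check_answer)
# are illustrative, not the package's actual API.

abstract type AbstractQuestion end

struct MultipleChoice <: AbstractQuestion
    prompt::String
    options::Vector{String}
    answer::Int          # index of the correct option
end

struct ExactMatch <: AbstractQuestion
    prompt::String
    answer::String
end

# Each question type gets its own method; new types can be added
# without touching existing code.
check_answer(q::MultipleChoice, response::Integer) = response == q.answer
check_answer(q::ExactMatch, response::AbstractString) = strip(response) == q.answer
```

Adding a new question type would then mean defining a new struct and a corresponding `check_answer` method, rather than extending a central if/else chain.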
Constructive, specific feedback is always appreciated. General statements without technical details, however, can easily create a misleading impression.

8 Likes

It is a well-known fact that code and specifications hallucinated by a neural net fail in unpredictable and sometimes catastrophic ways. This discussion is just one example among many others, too numerous to mention.

To quote the author of the Reddit post:

When it comes to Engineering, stick to your fundamentals. Don’t take AI’s information at face value. It can literally kill someone, significantly damage your company’s reputation, significantly hurt your career, etc.

You can be more concrete. An issue, or even better a PR, is always welcome. Talk is cheap, show me the code!

5 Likes

But there are also plenty of examples showing that it can work. Claude sucks, but it is still useful.

So Claude sucks. It can't solve any hard problem.
But… people really underestimate how much open-source maintenance work does not involve hard problems.

And I don’t think it can be used for anything harder.

Here it has been used to design a new package. I think your example rather confirms my point.

No, I don't think so; it's not a particularly complicated project, so it's a good use case for AI.

2 Likes

In our case, it works beautifully. I’ve always written my own code for data analyses, but there are needs that are better addressed through a dedicated Julia package that I simply don’t have the time to fully develop myself. Over the last three months, I’ve used Claude to help prepare three separate packages (two public and one private). These are packages that most users can benefit from, and they do not carry any “high-risk” implications. On the contrary, they make Julia more attractive and approachable for a wider audience. For example, through TextAssociations.jl, I’ve been able to bring many people from the humanities, especially corpus linguistics, closer to Julia.

I see no reason why anyone should avoid using Claude or any other coding assistant. These tools are extremely valuable. The code they produce can always be reviewed, improved, monitored, and refined. Avoiding, or even worse, dismissing them does not help new users adopt the language for their use cases, and it also reflects an older era and mindset.

4 Likes

As for the phrase “anything harder,” I genuinely don’t understand what it is meant to imply. What one person considers “harder” may be entirely unremarkable or irrelevant to someone else. In my opinion, there is nothing harder than modelling the inherent vagueness of natural language. Would you agree or do you even consider the ‘hardness’ of it to matter in the current context?

And I was under the impression that Julia Discourse is a space where developers and users interact constructively. If it were intended only for developers working on highly specialised, niche features of the language, I wouldn’t have a reason to participate here in the first place.

5 Likes

That question is answered at the very beginning of the blog post referenced by @langestefan:

Claude can only solve simple problems that a first year undergrad can do, it can’t do anything more, it’s pretty bad.

This is referring to numerical / scientific problems, not to coding in general.

Doesn’t sound like it to me:

For people who can use it for more, it’s probably some standard Javascript or Android app that is the 20,000th version of the same thing, and yes it probably is copying code.

Swirl.jl is the Julia version of swirl for R. So yeah, it's pretty accurate. I'd be curious whether you have any concrete points of improvement. If you just hate AI on principle, then this discussion isn't very productive and I will stop responding :slight_smile:

3 Likes

And even this claim is not true. I have used GPT-5 to solve a very nasty bug in a graph isomorphism problem, and it worked like a charm. LLMs now "understand" numerical problems far more deeply than many people realize. If you use them correctly (I mean, give them enough guidance), they can make almost anything a human can do possible.

2 Likes

I don’t think that the dispute regarding the value of “AI” code assistants is as settled as you claim, and dismissing contrary opinions by ad-hominem insinuations does not help your case much.

Thanks for the clarification. And just to be very clear from my side, there was absolutely no ad-hominem intent. My comment was about the general practice of discouraging or dismissing AI-assisted scaffolding, not about you personally.
I also agree that the broader debate about AI-assisted coding is far from settled and I mean that in the sense of improving these tools and their outputs, not in the sense of dismissing AI coding agents altogether. There is room for progress, but progress requires engagement, not blanket rejection.
Precisely for that reason, what actually helps is specific, actionable critique. But without such concrete issues, there is no meaningful basis for further discussion; we’re just debating abstractions. I’m fully open to detailed critique, but I won’t continue the discussion unless there is an actual, substantive point to address.

4 Likes

So it was an ad-hominem directed at everyone opposed to the use of “AI” tools, rather than just me? That’s good to know.

A number of highly valued open-source projects ban “AI” coding altogether, and the debate is about wider adoption of this practice, rather than about “improving” code assistants.

Which ones have banned AI code?

Gentoo, NetBSD, QEMU

That's all? Meh, I'd expect the Linux guys to be a bit more cautious. QEMU's concerns seem to be only about legal aspects. Given that practically all devs are already using AI tools to assist in coding, I doubt they have the means to enforce any of the bans.