Reliability of AI coding tools

It is unfortunate that this package has been developed by an “AI” tool.
The code produced in this fashion is inherently unreliable.

Could you please elaborate on what exactly you find “inherently unreliable” in the current codebase? It would be helpful if you could point to specific parts of the implementation where you see issues or risks.
For transparency: the package is not generated entirely by an AI tool. The initial skeleton was drafted with assistance, but the design decisions, logic, and a considerable amount of the code have been written, refactored, and reviewed manually. A few of us are also in the process of integrating a clearer multiple-dispatch design for question types, which should further improve maintainability and clarity.
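To give a rough idea of the direction (a minimal sketch only; the type and function names below are hypothetical and not the package's actual API), a multiple-dispatch design lets each question type carry its own checking logic:

```julia
# Hypothetical sketch of a multiple-dispatch design for question types.
# Names (AbstractQuestion, MultipleChoice, ExactMatch, check_answer)
# are illustrative, not the package's actual API.

abstract type AbstractQuestion end

struct MultipleChoice <: AbstractQuestion
    prompt::String
    options::Vector{String}
    answer::Int          # index of the correct option
end

struct ExactMatch <: AbstractQuestion
    prompt::String
    answer::String
end

# Each question type gets its own method; new types can be added
# without touching existing code.
check_answer(q::MultipleChoice, response::Integer) = response == q.answer
check_answer(q::ExactMatch, response::AbstractString) = strip(response) == q.answer
```

Adding a new question type would then mean defining a new struct and a corresponding `check_answer` method, rather than extending a central if/else chain.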
Constructive, specific feedback is always appreciated. General statements without technical details, however, can easily create a misleading impression.

8 Likes

It is a well-known fact that code and specifications hallucinated by a neural net fail in unpredictable and sometimes catastrophic ways. This discussion is just one example among many others, too numerous to mention.

To quote the author of the Reddit post:

When it comes to Engineering, stick to your fundamentals. Don’t take AI’s information at face value. It can literally kill someone, significantly damage your company’s reputation, significantly hurt your career, etc.

You can be more concrete. An issue, or even better a PR, is always welcome. Talk is cheap, show me the code!

5 Likes

But there are also plenty of examples showing that it can work. Claude sucks, but it is still useful.

So Claude sucks. It can't solve any hard problem.
But… people really underestimate how much open-source maintenance work does not involve hard problems.

And I don’t think it can be used for anything harder.

Here it has been used to design a new package. I think your example rather confirms my point.

No, I don't think so; it's not a particularly complicated project, so it's a good use case for AI.

2 Likes

In our case, it works beautifully. I’ve always written my own code for data analyses, but there are needs that are better addressed through a dedicated Julia package that I simply don’t have the time to fully develop myself. Over the last three months, I’ve used Claude to help prepare three separate packages (two public and one private). These are packages that most users can benefit from, and they do not carry any “high-risk” implications. On the contrary, they make Julia more attractive and approachable for a wider audience. For example, through TextAssociations.jl, I’ve been able to bring many people from the humanities, especially corpus linguistics, closer to Julia.

I see no reason why anyone should avoid using Claude or any other coding assistant. These tools are extremely valuable. The code they produce can always be reviewed, improved, monitored, and refined. Avoiding, or even worse, dismissing them does not help new users adopt the language for their use cases, and it also reflects an older era and mindset.

4 Likes

As for the phrase “anything harder,” I genuinely don’t understand what it is meant to imply. What one person considers “harder” may be entirely unremarkable or irrelevant to someone else. In my opinion, there is nothing harder than modelling the inherent vagueness of natural language. Would you agree or do you even consider the ‘hardness’ of it to matter in the current context?

And I was under the impression that Julia Discourse is a space where developers and users interact constructively. If it were intended only for developers working on highly specialised, niche features of the language, I wouldn’t have a reason to participate here in the first place.

5 Likes

That question is answered at the very beginning of the blog post referenced by @langestefan:

Claude can only solve simple problems that a first year undergrad can do, it can’t do anything more, it’s pretty bad.

This is referring to numerical / scientific problems, not to coding in general.

Doesn’t sound like it to me:

For people who can use it for more, it’s probably some standard Javascript or Android app that is the 20,000th version of the same thing, and yes it probably is copying code.

Swirl.jl is the Julia version of swirl for R. So yeah, it's pretty accurate. I'd be curious whether you have any concrete points of improvement. If you just hate AI on principle, then this discussion isn't very productive and I will stop responding :slight_smile:

3 Likes

And even this claim is not true. I have used GPT-5 to solve a very nasty bug in a graph isomorphism problem, and it worked like a charm. LLMs now "understand" numerical problems far more deeply than many people realize. If you use them correctly (I mean, give them enough guidance), they can make almost anything a human can do possible.

2 Likes

I don’t think that the dispute regarding the value of “AI” code assistants is as settled as you claim, and dismissing contrary opinions by ad-hominem insinuations does not help your case much.

Thanks for the clarification. And just to be very clear from my side, there was absolutely no ad-hominem intent. My comment was about the general practice of discouraging or dismissing AI-assisted scaffolding, not about you personally.
I also agree that the broader debate about AI-assisted coding is far from settled and I mean that in the sense of improving these tools and their outputs, not in the sense of dismissing AI coding agents altogether. There is room for progress, but progress requires engagement, not blanket rejection.
Precisely for that reason, what actually helps is specific, actionable critique. But without such concrete issues, there is no meaningful basis for further discussion; we’re just debating abstractions. I’m fully open to detailed critique, but I won’t continue the discussion unless there is an actual, substantive point to address.

4 Likes

So it was an ad-hominem directed at everyone opposed to the use of “AI” tools, rather than just me? That’s good to know.

A number of highly valued open-source projects ban “AI” coding altogether, and the debate is about wider adoption of this practice, rather than about “improving” code assistants.

Which ones have banned AI code?

Gentoo, NetBSD, QEMU

That's all? Meh, I'd expect the Linux guys to be a bit more cautious. QEMU's concerns seem to be only about legal aspects. Given that practically all devs are already using AI tools to assist in coding, I doubt they have the means to enforce any of the bans.