The use of Claude Code in SciML repos

YMMV, as I am on the AI Pro plan and just learned* that Antigravity quotas depend on your Google plan.

* Presumably because it was just introduced.

I was just curious. Has there been any update since Opus 4.5/4.6 were released? Do the models now perform better on the more difficult problems? It would be very interesting to track performance over time.


Yeah, I don’t think the models are substantially better :sweat_smile: but the prompting systems and how we use them have improved. We set up a bunch of preset prompts for common tasks that Claude then uses, and giving it protocols has made it better. But we can see that if we remove all of this context the gains go away, so it seems to be more of that than the models improving.

I did write a blog post that shows some of the prompts that I use, if people are curious about the details:

And then there’s a video I did on the SciML bot system which should go live in a week or so… I should check on that.


I’ve pretty much switched to Claude Code-driven development for all SciML work. By that I mean planning out in detail what I want to do iteratively with CC, sometimes having Codex review that plan too, having CC implement it, and then going through several rounds of reviewing and commenting on the code myself. I’ve basically replaced writing code with planning and reviewing, but I do feel it has increased my productivity and not led to lower quality results than what I’d generate on my own. However, this is quite different from just having the AI vibe code everything, and it still takes a lot of time on my part (just in a different role than before, and on the whole less time per task).

Edit: I do think there was a big jump in the models as of the December releases. They became much more competent, perhaps just in using tools but this still made a huge difference in their net abilities to get stuff done.


I am curious about the experience of others with Claude’s (and other AI agents’) “understanding” of numerical issues. To give a recent example, I have spent time tracking down a numerical issue that amounted to a

logsubexp(logcdf(F, a), logcdf(F, b))

which should have been a

logdiff(F, a, b)

(using Distributions and LogExpFunctions). The original code was written using Claude and led to a hard-to-reproduce numerical issue that I could not get it to understand (the above is just an MWE; various values were precalculated, so the flow was more complicated), so I debugged and fixed it myself.
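For readers who have not run into this class of bug: the post above does not spell out the exact failure, so what follows is only a generic sketch of why log-differences of CDFs are delicate, not a reconstruction of the actual issue. It uses a hand-written unit-rate exponential distribution so that it needs no packages. The helper names (`cdf_exp`, `logccdf_exp`, `log_sub_exp`) are made up for this illustration and are not part of Distributions or LogExpFunctions.

```julia
# Hypothetical stand-in for a distribution: Exponential with rate 1,
# written by hand so the sketch runs with Base Julia only.
cdf_exp(x) = -expm1(-x)      # 1 - exp(-x)
logccdf_exp(x) = -x          # log of the upper tail, log(exp(-x))

# log(exp(u) - exp(v)) for u > v, computed without leaving log space.
log_sub_exp(u, v) = u + log(-expm1(v - u))

a, b = 50.0, 40.0            # both far in the upper tail, a > b

# Naive: both CDF values round to 1.0 in Float64, so the difference
# is exactly 0.0 and its log is -Inf.
naive = log(cdf_exp(a) - cdf_exp(b))

# Tail-aware: P(b < X <= a) = ccdf(b) - ccdf(a), evaluated in log space.
stable = log_sub_exp(logccdf_exp(b), logccdf_exp(a))

println("naive  = ", naive)
println("stable = ", stable)
```

The point is only that two mathematically equivalent expressions can behave very differently in floating point, and which one is safe depends on where in the distribution the arguments sit. That is the kind of context an agent tends not to infer unless it is told.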

I am wondering if I am not (yet) good enough at talking to these agents, or if they are just not that good at numerical issues. (Or a combination of both :wink:)


They aren’t so great at it, so it’s one of the things I normally have to mention to it, along with using functions like exp10 and things like that.


SciML Claude skill?
