YMMV - as I'm on the AI Pro plan, and just learned* that Antigravity quotas do depend on your Google plan.
- presumably because it was just introduced
I was just curious. Has there been any update since Opus 4.5/4.6 was released? Do the models perform better now on the more difficult problems? It would be very interesting to track performance over time.
Yeah, I don't think the models are substantially better, but the prompting systems and how we use them have improved. We set up a bunch of preset prompts for common tasks, and Claude uses these; giving it protocols has made it better. But we can see that if we remove all of this context, the gains go away, so it seems to be more that than the models improving.
I did write a blog post that shows some of the prompts that I use if people were curious about details:
And then there's a video I did on the SciML bot system which should go live in a week or so… I should check on that.
I've pretty much switched to Claude-Code-driven development for all SciML work. By that I mean planning out in detail what I want to do iteratively with CC, sometimes having Codex review that plan too, having CC implement it, and then having several rounds of my reviewing and commenting on the code. I've basically replaced writing code with planning and reviewing, but I do feel it has increased my productivity and not led to lower-quality results than what I'd generate on my own. However, this is quite different from just having the AI vibe code everything, and it's still a lot of time on my part (just in a different role than before, and on the whole less time per task).
Edit: I do think there was a big jump in the models as of the December releases. They became much more competent, perhaps just in using tools but this still made a huge difference in their net abilities to get stuff done.
I am curious about the experience of others with Claude's (and other AI agents') "understanding" of numerical issues. To give a recent example, I spent a while tracking down a numerical issue that amounted to a
```julia
logsubexp(logcdf(F, a), logcdf(F, b))
```
which should have been a
```julia
logdiffcdf(F, a, b)
```
(using Distributions, LogExpFunctions). The original code was written using Claude, and it led to a hard-to-reproduce numerical issue that I could not convince it to understand (the above is just an MWE; various values were precalculated, so the flow was more complicated), so I debugged and fixed it myself.
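For anyone curious about the failure mode here: in the upper tail the cdf values saturate at 1.0 in floating point, so subtracting them (or anything derived from them) throws away all of the tail information before the log is ever taken, while survival-function-based formulations keep the tail mass at full relative precision. A minimal sketch of the same pitfall, in Python/SciPy rather than Julia purely so it's self-contained; the function names are SciPy's, not from the original code:

```python
import numpy as np
from scipy.stats import norm

a, b = 10.0, 9.0  # deep upper tail: cdf(a) and cdf(b) both round to 1.0

# Naive: log(cdf(a) - cdf(b)). The subtraction happens after the tail
# information has been rounded away, so the difference is exactly 0.
with np.errstate(divide="ignore"):
    naive = np.log(norm.cdf(a) - norm.cdf(b))  # -> -inf

# Stable: P(b < X <= a) = sf(b) - sf(a). Survival functions represent
# the tail probabilities directly as small numbers, so no cancellation.
stable = np.log(norm.sf(b) - norm.sf(a))  # approx -43.6
```

This kind of rearrangement is exactly what a dedicated log-space difference helper can handle internally, which is why the one-function fix above works where the hand-assembled version silently fails.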
I am wondering if I am not (yet) good enough at talking to these agents, or if they are not so good with numerical issues. (Or a combination of both.)
They aren't so great at it, so it's one of the things I normally have to mention to it explicitly, along with using things like exp10 and so on.
SciML claude skill?