I wonder how this actually works. ChatGPT wrote some pretty amazing Matlab code for me, though with some syntax errors (missing multiplication signs, etc.), which it fixed when I mentioned them. It also used some undefined variables, and when I said they were undefined, it explained what they meant and defined them in the code.
The kicker, though, was when I asked it to re-write the code without relying on the Image Processing Toolbox, which it promptly did, by tweaking the original code (same variable names etc.)
So, obviously, I asked where it found the code, and got the reply that it was from nowhere in particular, but based on the theoretical formulas defining the quantity I asked for.
It’s kind of headspinning, and quite possibly untrue…
IMV the best way to think about these models is as super-advanced keyboard prediction engines. They just chain together statistically likely sequences of words. By adding more words to the sequence in a “chat session”, you steer the output. It doesn’t reason about where it found the code; rather, it has observed that texts discussing where code came from often mention theoretical formulas and the like.
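The “keyboard prediction engine” idea can be sketched in a few lines. This is a deliberately tiny illustration, not how ChatGPT actually works: a bigram model that “trains” by counting which word follows which, then predicts the most frequent continuation. The corpus and word choices below are made up for the example.

```python
from collections import Counter, defaultdict

# Toy "keyboard prediction engine": pick the statistically most
# likely next word given the previous one (a bigram model).
corpus = ("the code came from theoretical formulas "
          "the code came from the toolbox "
          "the formulas define the quantity").split()

# "Training" here is just counting which word follows which.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict(prev):
    # Return the most frequent continuation seen after `prev`.
    return following[prev].most_common(1)[0][0]

print(predict("code"))  # -> "came", the most common word after "code"
```

A real model replaces the counts with a neural net conditioned on the whole preceding context, but the steering effect is the same: each word you add changes which continuations are statistically likely.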
Ah, A16z again. They are almost everywhere. I mean that only positively. But hey, do you think similar results can also be achieved on this thing currently being cooked by Mr. @StefanKarpinski and Mr. @ChrisRackauckas, I mean JuliaHub? Or is JuliaHub a very different thing? Asking seriously. I have been in touch with a few people from Paperspace before, but I have never used JuliaHub nor Anyscale so far. Also, I wonder what could be the next big thing after ChatGPT. I was thinking maybe “quantum reinforcement learning via policy iteration”, or perhaps some intersection of “reinforcement learning” and “quantum topological neural networks”, “quantum machine learning with subspace states”, “quantum orthogonal neural networks”, or “quantum neural networks based on compound matrices”? Not sure if this is of interest to you at all, but since we are having this discussion, what would be your take, @findmyway, if I may ask? Just pure curiosity.
A LOT. How big do you think the corpus of text is that it’s trained on? How much did it cost just to accumulate that? How much does it cost to curate that corpus? It’s probably a huge datacenter full of drives just for the corpus. And of course with that many drives they fail daily and require someone continuously pulling and replacing them.
Running ChatGPT costs $100k a day. I promise you it used more compute training than running. It probably trained for months continuously on thousands of GPUs. So, let’s say tens of millions of dollars to train.
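The “tens of millions” figure is easy to sanity-check with back-of-envelope arithmetic. All the numbers below are my own rough assumptions (GPU count, duration, cloud rate), not reported figures:

```python
# Back-of-envelope check of "tens of millions of dollars to train".
# Every number here is an assumption for illustration only.
gpus = 5_000                 # "thousands of GPUs"
days = 90                    # "trained for months continuously"
dollars_per_gpu_hour = 2.0   # assumed rate for a datacenter GPU

training_cost = gpus * days * 24 * dollars_per_gpu_hour
print(f"${training_cost:,.0f}")  # -> $21,600,000
```

Even with these rough inputs, the estimate lands squarely in the tens of millions, which is consistent with the guess above.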
My random guess is they slurped every website they could get their hands on similar to a google crawl. So let’s just guess maybe thousands of terabytes of text, a few racks full of storage servers. Not to mention hundreds of days of crawling the web and shoving stuff onto those storage servers.
I have little experience here. I don’t do large language models or anything like that. I just imagine that training a large language model is not very different from running a search engine… slurp text from every source you can find and “index” it, except that indexing here is more like training a neural net.
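The search-engine analogy can be made concrete. In this sketch (toy documents, unigram counts standing in for neural-net weights, all invented for illustration), both systems ingest the same corpus, but an index maps words back to documents while a model compresses the corpus into statistics:

```python
from collections import Counter, defaultdict

docs = ["slurp text from every source",
        "index the text like a search engine",
        "training is a different kind of index"]

# A search engine builds an inverted index: word -> documents containing it,
# so the original documents can be looked up again.
inverted = defaultdict(set)
for i, doc in enumerate(docs):
    for word in doc.split():
        inverted[word].add(i)

# A language model instead distills the corpus into statistics; here,
# plain word counts stand in for learned neural-net weights.
counts = Counter(w for doc in docs for w in doc.split())

print(sorted(inverted["text"]))  # -> [0, 1]: documents mentioning "text"
print(counts["index"])           # -> 2: how often the "model" saw "index"
```

The key difference the analogy glosses over: an index can point you back to its sources, whereas the trained statistics can’t, which is why ChatGPT’s answer about where its code “came from” is reconstruction rather than retrieval.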
Yes, it’s important to understand what has already been achieved, especially since it looks pretty remarkable. My point is that, at the same time, it’s important to look into the future to pinpoint the thing that could become Julia’s own edge, as remarkable as the previous one, if not more so.