How does ChatGPT work?

I wonder how this actually works. ChatGPT wrote some pretty amazing Matlab code for me, though with some syntax errors (missing multiplication signs, etc.) which it fixed when I mentioned it. It also used some undefined variables, and when I said they were undefined, it explained what they meant and defined them in the code.

The kicker, though, was when I asked it to re-write the code without relying on the Image Processing Toolbox, which it promptly did, by tweaking the original code (same variable names etc.)

So, obviously, I asked where it found the code, and got the reply that it was from nowhere in particular, but based on the theoretical formulas defining the quantity I asked for.

It’s kind of headspinning, and quite possibly untrue…


It’s true. I had the same experience when playing around… still playing, actually. As I said, quite awesome. Of course one has to stay alert!

IMV the best way to think about these models is as super-advanced keyboard prediction engines. They just chain together statistically likely sequences of words. By adding more words to the sequence in a “chat session” you steer the output. It doesn’t reason about where it found the code; rather, it has observed that text discussing where code came from often mentions theoretical formulas and such.
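A toy sketch of that “keyboard prediction” idea. This is not how ChatGPT actually works internally (real models use transformer networks trained on terabytes of text, not word-pair counts), but the loop of “append the statistically likely next word” is the same in spirit; the tiny corpus below is made up for illustration:

```python
from collections import Counter, defaultdict

# Tiny made-up "training corpus" -- a real model sees terabytes of text.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count, for each word, which words tend to follow it (a bigram model).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word."""
    return follows[word].most_common(1)[0][0]

# Generate text by repeatedly appending the most likely continuation.
text = ["the"]
for _ in range(4):
    text.append(predict_next(text[-1]))
print(" ".join(text))
```

Steering the output by adding more words to the sequence is exactly what a chat session does, just with a vastly richer model of which continuations are likely.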

I appreciated this preprint: [2212.03551] Talking About Large Language Models


It can do Julia too! I used it to help me with the initial layout of a small particle simulation. As you mentioned, there were quite a few bugs, but by talking to it I could get it to fix some of them for me.

In the end, would I have been able to write the program without it? Probably. Was it more fun having a “companion” along the way? Definitely!

Kind regards


If you want to understand the underlying technology, this video explains the concept of Reinforcement Learning from Human Feedback (RLHF) pretty well.


I am wondering, what language is ChatGPT written in? I guess it is probably not Julia. It would be really cool to have a project with so much publicity and interaction written in it.

I heard it was trained with Ray.


Ah, A16z again. They are almost everywhere. I mean that only positively. But hey, do you think similar results could also be achieved on this thing currently being cooked by Mr. @StefanKarpinski and Mr. @ChrisRackauckas, I mean JuliaHub? Or is JuliaHub a very different thing? Asking seriously. I have been in touch with a few people from Paperspace before, but I have never used JuliaHub or Anyscale so far. Also wondering, what could be the next big thing after ChatGPT? I was thinking maybe “quantum reinforcement learning via policy iteration”, or maybe some intersection of “reinforcement learning” and “quantum topological neural networks”, “quantum machine learning with subspace states”, “quantum orthogonal neural networks”, or “quantum neural networks based on compound matrices”? Not sure if this is of interest to you at all, but since we are having this discussion, what would be your take, @findmyway, if I may ask? Just pure curiosity.

Also wondering, how large is it? I think I read somewhere that it takes more than one core to provide an answer, but what about the training? Would you have any estimate?

A LOT. How big do you think the corpus of text is that it’s trained on? How much did it cost just to accumulate that? How much does it cost to curate that corpus? It’s probably a huge datacenter full of drives just for the corpus. And of course with that many drives they fail daily and require someone continuously pulling and replacing them.

Running ChatGPT costs $100k a day. I promise you it used more CPU training than running. Probably trained for months continuously on thousands of GPUs. So, let’s say tens of millions of dollars to train.
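A back-of-envelope sketch of that “tens of millions” guess. Every number below is an assumption pulled from this thread (“thousands of GPUs”, “months continuously”) plus an assumed cloud rental price; none are published figures:

```python
# Rough training-cost estimate. All numbers are assumptions from the
# discussion above, not published figures.
gpus = 5_000               # "thousands of GPUs" -- assume five thousand
days = 90                  # "months continuously" -- assume three months
usd_per_gpu_hour = 2.0     # assumed cloud rental price per GPU-hour

training_cost = gpus * days * 24 * usd_per_gpu_hour
print(f"~${training_cost / 1e6:.0f} million")  # ~$22 million
```

Even with these rough inputs, the estimate lands in the “tens of millions” range; halving or doubling any single assumption keeps it far above the daily running cost.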


How big do you think the corpus of text is that it’s trained on?

I have no idea. I have never done anything with language models, hence the question.

Running ChatGPT costs $100k a day.

With millions of users and the assumption that it takes more than a core to provide an answer, that would be quite similar to my estimate; it probably scales in line with the number of concurrent users…

[…] I promise you it used more CPU training than running. […]

I recall it took me about 7 days to train a basic AlphaZero model on a premium Ice Lake.

Based on your experience, what would be the next big thing @dlakelan? :- )

My random guess is they slurped every website they could get their hands on similar to a google crawl. So let’s just guess maybe thousands of terabytes of text, a few racks full of storage servers. Not to mention hundreds of days of crawling the web and shoving stuff onto those storage servers.

I have little experience here. I don’t do large language models or anything like that. I just imagine that training a large language model is not very different from running a search engine: slurp text from every source you can find and “index” it, except that indexing here is more like training a neural net.
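The analogy above can be sketched in a few lines: a search engine and a language model start from the same crawled text, and only the “index” step differs. The two pages below are made-up stand-ins for a crawl:

```python
from collections import Counter, defaultdict

# Stand-in for crawled documents (hypothetical data).
pages = {
    "a.html": "julia is fast and julia is fun",
    "b.html": "python is popular",
}

# Search-engine style: an inverted index from word -> pages containing it.
inverted = defaultdict(set)
for url, text in pages.items():
    for word in text.split():
        inverted[word].add(url)

# Language-model style: "index" the same text as next-word statistics
# (a real model would train a neural net here instead of counting).
bigrams = Counter()
for text in pages.values():
    words = text.split()
    bigrams.update(zip(words, words[1:]))

print(sorted(inverted["is"]))    # which pages contain "is"
print(bigrams[("julia", "is")])  # how often "is" follows "julia"
```

Same ingestion pipeline in both cases; the difference is whether the crawl feeds a lookup structure or a training loop.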

Yes, it’s important to understand what has already been achieved, especially since it looks pretty remarkable. My point is that, at the same time, it’s important to look into the future in order to pinpoint the thing that could become Julia’s own edge, as remarkable as the previous one, if not more so.