AI bubble: time to panic? Perhaps not yet... maybe now

Is that a typo? Or how does that work?

Not a typo. For a simple example, consider Llama 2, which has roughly half as many parameters as GPT-3 (published three years earlier) but slightly better results. The models that are SOTA today will, in three years, be outclassed by the smaller, cheaper versions of the SOTA models of that time. While SOTA FLOPs go up with time, fixed-accuracy FLOPs go down with time.

If a better architecture gives the same answer quality for less flops, isn’t that flop per answer quality decreasing and answer quality per flop increasing?


Yes. Division is hard.


Out of curiosity: Have you tried asking GitHub copilot similar questions?
What about Google Gemini? Is ChatGPT distinctly better?

I’ve been using Copilot for years now, since the early beta. As an OSS developer, I get it for free from Microsoft. I guess this can apply to quite a few package contributors in the Julia ecosystem.

Copilot can be quite impressive sometimes when generating boilerplate code (think: you type the function name and it fills in the body). The quality of the codegen depends on the availability of training data, so Julia codegen is not as good as Python or JavaScript, but it’s still quite good. Knowledge of recent Julia packages can be an issue, due to the knowledge cutoffs. With Genie apps, it used to hallucinate badly about a year ago, but now it’s much better, and RAG can go a long way.

Not sure how LLMs could disappear. As a kid I was very puzzled by the story of Pandora’s box: I couldn’t understand what was so hard about not opening it. Now I get it… Suffice it to say that the box has been opened and LLMs are not going back in. LLMs go beyond hosted APIs: it’s super simple to run Ollama or Llamafiles locally and set up your own LLM. Even if OpenAI proves unsustainable, existing OSS models are already extremely powerful. Also, fine-tuning these models is pretty trivial and quite cheap these days, so one can refine them “forever”. So, IMO, adopting LLMs as productivity tools is a sure bet. They’re here to stay.

Bubble? Possibly, if you look at product launches like “Devin”. But even if we fall short of that level of AI (and I think we will), this generation of models was a pretty good leap. One needs to look at the history of AI to see that it has been a cycle of boom and bust (AI winters). But overall great progress was made. Panic? Can’t see why.


I don’t know. But if I were heavily invested in AI, I would be getting antsy. As an example, I know of several universities that have instituted AI-related programs (BSc and MSc). Should an AI winter come, there will be some pain. Another example: if I had a cushy AI job, I would start looking for another job just about now.


If fine-tuning is that easy and cheap, I would love to have as a coding buddy an LLM fine-tuned for Julia on all the officially sanctioned library and package source code and manuals, freshly re-tuned with every new Julia release.


Winter is NOT coming.

Yes, it’s entirely possible that there’s a bubble that’s going to burst, a number of high profile companies might drown and some investors will lose a lot of money.

It doesn’t change the fact that there’s a ton of companies without hype-level investments using low-resource (both training and inference) AI to solve problems and develop products that just weren’t possible 10-15 years ago.

And don’t believe the claims that the tooling will go away. Sure, tools like PyTorch and TensorFlow could plausibly stagnate if the funding dries up but there’s still lots that can be done with them in the current state and no shortage of alternative tools that don’t depend on high levels of funding.

The current advances in AI are here to stay, the same way the internet was.


There is a feature on JuliaHub called AskAI (built on ChatGPT) that seems to be a Julia-focused code assistant. I have not really tried it out though.


Famous last words…

Another aspect of the shaky foundations of AI is the theft of copyrighted content. Yet another reason not to be optimistic about the winter turning out to be just one cloudy day…
Generative AI is a marvel. Is it also built on theft?

But are they really? You can already run very good models on your local GPU, and that’s with e.g. 4-bit quantization (already mainstream in open-source models), which is itself becoming outdated: floats are no longer needed for the weights, and 2-bit or less is coming, radically simplifying hardware and lowering energy use for running/inference and for training. That’s practical for 3B+ models (at least Transformers), so basically all mainstream models until recently (except maybe on mobile phones, and even those can run some models).
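To make the 4-bit idea concrete, here is a minimal sketch (in Python/NumPy, since most quantization tooling lives there) of symmetric grouped 4-bit weight quantization. The group size and rounding scheme are deliberate simplifications of what libraries like llama.cpp or bitsandbytes actually implement:

```python
import numpy as np

def quantize_4bit(weights, group_size=32):
    """Symmetric 4-bit quantization: each group of weights shares one
    float scale; values are rounded to integers in [-7, 7]."""
    w = weights.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0                      # avoid division by zero
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float weights from 4-bit codes + scales."""
    return (q * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096).astype(np.float32)  # toy weight tensor
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)
err = float(np.abs(w - w_hat).max())  # bounded by half the group scale
```

Storing 4 bits per weight (plus one scale per group) instead of 32-bit floats is roughly an 8x size reduction, at the cost of a small per-group rounding error; real formats differ mainly in how they pick groups and scales.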

The Chinchilla scaling laws are outdated: they only take training cost into account, not inference. According to a later paper, if you also take running, i.e. inference, into account, then smaller models turn out to be the more cost-effective ones. [Also, smaller AND better models have already been made for Julia; see below.] That paper on cost-optimal LLMs, however, (probably) assumes Transformers (still mainstream, but on the way out). Since both papers predate Transformers being replaced as we speak, it will be very interesting to see an updated paper; at the least, running (and training) is getting far less expensive, i.e. tokens per second going up with better algorithms (not just better hardware like Groq’s).
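For reference, the Chinchilla trade-off mentioned above can be sketched numerically. The C ≈ 6·N·D FLOPs approximation and the ~20 tokens-per-parameter ratio are the commonly quoted rules of thumb distilled from the paper, not its exact fitted law:

```python
def train_flops(n_params, n_tokens):
    """Standard approximation: ~6 FLOPs per parameter per training token."""
    return 6 * n_params * n_tokens

def chinchilla_optimal_tokens(n_params):
    """Chinchilla rule of thumb: ~20 training tokens per parameter."""
    return 20 * n_params

n = 7e9                                   # a 7B-parameter model
d_opt = chinchilla_optimal_tokens(n)      # ~140B tokens "compute-optimal"
c_opt = train_flops(n, d_opt)             # ~5.9e21 training FLOPs

# Llama 2 7B was actually trained on ~2T tokens, ~14x the Chinchilla
# budget, precisely because over-training a small model makes every
# later inference call cheaper, which Chinchilla does not account for.
c_llama2 = train_flops(n, 2e12)
```

This is the point the later "cost-optimal" papers formalize: once inference volume dominates, the optimum shifts to smaller models trained on far more tokens than Chinchilla prescribes.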

The cost to train GPT-4 was more than $100 million, CEO Sam Altman stated (likely not only compute, though compute seems to be over 70% of that cost). By now you can train a good model (not just fine-tune one) for under $0.1 million [likely way less, actually] according to the MIT-IBM Watson AI Lab (and other people, e.g. from Princeton), the lead authors on the paper below (probably counting only compute, and likely still too high, since better, more efficient models are already out). So this AI report does not seem to be keeping pace with the latest, less costly developments:

3. Frontier models get way more expensive.
[…] while Google’s Gemini Ultra cost $191 million for compute […]
7. The data is in: AI makes workers more productive and leads to higher quality work. […]

Public Sentiment Dips Negative […]

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

This report introduces JetMoE-8B, a new LLM trained with less than $0.1 million, using 1.25T tokens from carefully mixed open-source corpora and 30,000 H100 GPU hours. Despite its low cost, the JetMoE-8B demonstrates impressive performance, with JetMoE-8B outperforming the Llama2-7B model and JetMoE-8B-Chat surpassing the Llama2-13B-Chat model. These results suggest that LLM training can be much more cost-effective than generally thought.
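A quick sanity check on the quoted JetMoE numbers. The implied rental rate this yields is roughly in line with cloud H100 spot prices at the time, which supports the "under $0.1 million" claim being compute-only:

```python
budget_usd = 100_000    # "less than $0.1 million" from the JetMoE report
gpu_hours = 30_000      # H100 GPU-hours from the same report

# Dollars per H100-hour the budget implies: 100,000 / 30,000 ≈ $3.33
implied_rate = budget_usd / gpu_hours
```

If H100 time were costed at on-demand list prices instead of spot/committed rates, the same 30,000 hours would come out somewhat higher, so the headline figure is sensitive to how GPU time is priced.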

And the cost is going to plummet further than the roughly 1000x it seemingly already has.

Transformers are on the way out as too costly, replaced fully or largely in some of the hybrid models I’m excited about, based on Mamba:

[…] we have implemented, we end up with a powerful model that fits in a single 80GB GPU. Built at large scale, Jamba provides high throughput and small memory footprint compared to vanilla Transformers, and at the same time state-of-the-art performance on standard language model benchmarks and long-context evaluations. Remarkably, the model presents strong results for up to 256K tokens context length.

The researchers from Microsoft introduced SiMBA, a new architecture that uses Einstein FFT (EinFFT) for channel modeling.

I don’t have costs for these, or for this one (which I just found; it’s only days old):

One of the most impressive achievements of **Zamba-7B** is its remarkable training efficiency. The model was developed by a team of just seven researchers over a period of 30 days, using 128 H100 GPUs. The team trained the model on approximately 1 trillion tokens extracted from open web datasets. The training process involved two phases, beginning with lower-quality web data and then transitioning to higher-quality datasets. This strategy not only enhances the model’s performance but also reduces overall computational demands.

In comparative benchmarks, Zamba-7B performs better than LLaMA-2 7B and OLMo-7B. It achieves near-parity with larger models like Mistral-7B and Gemma-7B while using fewer data tokens, demonstrating its design efficacy.

30 days × 128 GPUs = 3,840 GPU-days, i.e. about 92,160 GPU-hours? So Zamba actually used roughly 3× more H100 GPU-hours than JetMoE’s 30,000, though still about half the GPU-hours reported for training Llama 2 7B?
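Redoing the arithmetic on the quoted figures (the Llama 2 7B number of 184,320 GPU-hours is from Meta's own Llama 2 paper, though note it counts A100-hours, not H100-hours, so the ratios are only indicative):

```python
# 128 H100s running for 30 days, per the Zamba-7B description
zamba_gpu_hours = 128 * 30 * 24      # = 92,160 GPU-hours

jetmoe_gpu_hours = 30_000            # from the JetMoE report
llama2_7b_gpu_hours = 184_320        # A100-hours, per Meta's Llama 2 paper

# Zamba used ~3x MORE GPU-hours than JetMoE (not less), but
# about half of Llama 2 7B's budget, hardware differences aside.
vs_jetmoe = zamba_gpu_hours / jetmoe_gpu_hours      # ≈ 3.07
vs_llama2 = llama2_7b_gpu_hours / zamba_gpu_hours   # = 2.0
```

The common slip is treating 30 days × 128 GPUs as 3,840 GPU-hours; multiplying by 24 hours per day gives the actual 92,160.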

Out today, claimed very good:

This one might also be interesting for Julia:

Vezora/Mistral-22B-v0.1 · Hugging Face
WizardLM 2
mistralai/Mixtral-8x22B-Instruct-v0.1 · Hugging Face
alpindale/WizardLM-2-8x22B · Hugging Face
mistral-community/Mixtral-8x22B-v0.1 · Hugging Face

Code generation benchmarks: HumanEval, MBPP, BabelCode (C++, C#, Go, Java, JavaScript, Kotlin, Python, Rust)

Julia isn’t named there, but Julia is part of Google’s BabelCode at least, and is very plausibly also part of Google’s CodeGemma training data, to at least a useful degree:

The paper on BabelCode:
Measuring The Impact Of Programming Language Distribution

Current benchmarks for evaluating neural code models focus on only a small subset of programming languages, excluding many popular languages such as Go or Rust. To ameliorate this issue, we present the BabelCode framework for execution-based evaluation of any benchmark in any language
We replace Python-specific terms with their equivalent names in the target language. For tasks formulated as code-completion, we support formatting the problem description as a native docstring […] For example, in Julia, the ”$” in docstrings will raise errors if not properly escaped. Thus, we implement methods to automatically handle such cases and ensure correctness. […]
We further consider the bottom 7 languages to be low-resource (LR): Dart, Lua, Rust, C#, R, Julia, and Haskell […]
We also observe that performance on languages do not scale with respect to their resource level nor the model’s size. C#, Dart, Julia, and Haskell have significantly higher gains when scaling to 4B model size when compared to the other languages. While this may be due to the increased number of training tokens, it is not consistent across all LR language

Julia has the highest gain, seemingly, despite being the second-lowest-resource language (Haskell is lowest): 0.03% of the training data vs. 36.95% for Java and 16.80% for Python. See Tables 4 and 5 (Python missing there?!) and Figure 6: “Mean relative difference of pass@k for each of the models trained on the different Unimax distributions compared to the pass@k of the same sized model trained on the Natural distribution […]”

Julia gains from a larger model, up to a point at least, i.e. up to the 4B size: 72 questions passed at 4B vs. only 5 for the 8B (also an improvement over the 62B model), but that seems to be not only because of model size; the larger models are the older PaLM and PaLM Coder. See Table 14. I count that as a 14.4x improvement (though perhaps not the best metric, and not what they highlight in other tables), with a half-size model… unlike most other languages, though R and Haskell also show good gains.
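For anyone unfamiliar with the pass@k numbers quoted above: benchmarks like BabelCode and HumanEval typically compute it with the unbiased estimator from the Codex paper (Chen et al., 2021), which estimates the probability that at least one of k samples out of n generations solves the task:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total generated samples per problem
    c: number of those samples that pass the tests
    k: samples the user is "allowed" to draw
    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        return 1.0  # fewer failures than draws: success is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 200 samples per problem, 10 of them correct:
p1 = pass_at_k(200, 10, 1)   # pass@1 = 1 - 190/200 = 0.05
```

Counting "questions passed", as the table does, is the c component of this; pass@k then normalizes it by sampling budget so models with different numbers of generations stay comparable.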


How is that an example? Anyway, I am sure the people working at Nvidia, OpenAI etc are dying to get out of there :rofl:.


Not Nvidia (which is the only company making money so far), but I think OpenAI will feel the pinch once Microsoft starts hating the negative cash flow. (Unless Altman manages to prostitute his company to the petrostates.)

From the point of view of Nvidia, the whole enterprise may be really referred to as BI instead of AI, as from their viewpoint it certainly is “business intelligence”.

Trying to make a short-term profit now would be among the most stupid things a company doing AI research could do. Anyway, we can come back here in, say, one year and see how OpenAI is doing.


There sure is something in that.

I don’t know how this wave of LLMs will end. But I have been using ChatGPT for writing LaTeX, and it has been tremendously helpful. Now I’m not afraid of making sophisticated pictures directly with TikZ. Before, I always drew them with other tools and imported them.


Yeah, this is the big one. New tech is always a loss leader – the drive is towards cheaper and more optimized. Take Groq for example – they make specialized hardware (basically for language models) that makes eval much quicker and cheaper. The standard GPUs in use now are essentially very expensive general-purpose hardware, which you really don’t need for most transformer architectures. I would not be surprised if costs plummet in the next few years.

In general I don’t really think that bubble speculation is usually worth people’s time and energy. Are you interested in AI stuff? Cool. Do you want to see what happens to it? Also fine. Do you want to invest or put money into it? Up to you.

I personally think this is sticky. The way I approach my work is fundamentally different now, and in a huge way. I simply do not write code the same way. I’m a Cursor user (and a copilot user), and here’s a few things I notice about myself when writing code:

  1. I notice immediately when the language server is down and autocomplete is not available. It feels like my editor is broken.
  2. I have a much broader approach to writing software. I tend to think more in terms of architecture than in specific files/functions/implementations, in large part because I can ask for a model to write an entire file for me to debug, or autocomplete will simply add a function I wanted.
  3. My commenting game is amazing now. I write extremely informative comments now because it is part of me telling my editor how to help me out.

Code generation alone is huge.

I think it’s weird to me that people have any ability to think of language models as “just a bubble”. Cory’s article was in this vein:

But no one is asking, “What will we do if” – when – “the AI bubble pops and most of this stuff disappears overnight?”

That just strikes me as kind of ignorant, even if you restrict AI to only code generation purposes, where it has an obvious, demonstrable use.

For general purpose language, the problem is harder and slower. People still don’t really know what it looks like. Support chatbots? Probably a long ways from being worthwhile. AI assistants? Kind of a pain in the ass to work with, slow, and it is irritating when they even slightly do not understand what a human would. That kind of thing will take time.

But it’s not nothing. This is a serious and incredible advance in human technology. One of the key differentiators of humans from basically all other forms of life is our ability to communicate through language – it is how we went from primates in trees with sticks to well-fed people sitting in front of computers (!).

Right now it’s not all amazing, but it has a lot of promise to go that way. If you’re scared of the bubble, don’t buy in. I’m aware that bubbles tend to rise on top of fundamentally good technology and that bubbles are not purely irrational. But I think it’ll be interesting to see what happens and I hope everyone manages to be well throughout the coming years.


I’d be interested to find out more. Any suggestions? (Ta.)