AI tools to write (Julia) code (best/worse experience), e.g. ChatGPT, GPT 3.5

I noticed on Reddit ChatGPT works for Julia (but I’m interested to write all kinds of code, not just AI/Flux code):

What other tools work for Julia (or other languages)? Which is best? Codex is older, I believe ChatGPT might be better, though it’s not only for programming or Julia. See the example here where it explains non-Julia code:

I believe such tools will be very important in the future, if not already. People will not just choose a programming language based on its inherent merits, rather tool support, at least AI related. What’s your best (or worst) experience with such tools, even for other languages such as Python? I find it very likely it’s currently better supported or other mainstream languages, but I see no objection to Julia also working well, maybe better. It’s just a matter of training data.

Python, JavaScript and other languages of course have huge package systems to learn from.

So far, larger training data has helped large language models (LLMs), for general language, and I suppose programming languages too. But the latest research shows quality over quanity is better for training data, and state-of-the-art LLMs can be trained with one GPU (hopefully translating well to Julia ecosystem, and e.g. JavaScript’s large not the advantage you might think it is, not sure quality there an issue):

Cicero is also an interesting new SOTA AI, for Diplomacy game, i.e. because of the human language component of that game. Yannic also has a video on it, I like his videos and analysis. I regularly watch him, not as much others but open to good suggestions, if others have better analysis.


I tried ChatGPT, and I have mixed impressions. When ChatGPT is right --in an encyclopedic type of information-- it is incredibly accurate and fluid. However, in “full certainty,” ChatGPT also makes mistakes, but because of the “authoritative” style, it might get hard to uncover them.

To answer your question, in the hands of an expert, I believe ChatGPT can be helpful most of the time but also unpredictable: does it take more time to write something from scratch or debug somebody else’s code?

Here is an example dialog that I had yesterday at the ChatGPT Playground. I selected a theme that anybody can follow and appreciate its nuances.

Human: What is the theorem that every even natural number is the sum of two prime numbers?
AI: The theorem you are looking for is called the Goldbach Conjecture. It states that every even natural number greater than 2 can be expressed as the sum of two prime numbers.
Human: How is it proven?
AI: The Goldbach Conjecture has yet to be proven, though it has been tested by mathematicians over the centuries and there is strong evidence that it is indeed true.
Human: List some examples please.
AI: Examples of the Goldbach Conjecture include: 4 = 2 + 2, 6 = 3 + 3, 8 = 3 + 5, 10 = 3 + 7, 12 = 5 + 7, 14 = 3 + 11 and 16 = 3 + 13.
Human: How do I find the two prime numbers that sum up to 100?
AI: The two prime numbers that sum up to 100 are 97 + 3, 89 + 11, 83 + 17, 79 + 21, 73 + 27, 71 + 29, 67 + 33, 61 + 39 and 59 + 41.
Human: How did you do that?
AI: I used a technique called trial division to find all possible combinations of two prime numbers that sum up to 100. I then filtered out the combinations that do not add up to 100, leaving only the valid solutions.
Human: What is the complexity to find them for an arbitrary natural number n?
AI: The complexity of finding the prime numbers that sum up to an arbitrary natural number n is O(\sqrt{n}). This means that the time required to find the prime numbers that sum up to n increases as the square root of n grows.
Human: Please provide a function in Julia that does this.
AI: Sure, here is a function written in Julia that will find the prime numbers that sum up to an arbitrary natural number n:

function find_primes(n) 
   primes = Array{Int64,1}()
   for i in 2:sqrt(n) 
       if isprime(i) && isprime(n - i) 
           push!(primes, (i, n - i)) 
   return primes

Besides the problematic push!, the code above appears correct. But it has a severe bug, the loop bound is wrong!

Human: What is the complexity of this function?
AI: The complexity of this function is O(\sqrt{n}). This means that the time required to run this function increases as the square root of n grows.
Human: What about the complexity of isprime?
AI: The complexity of the isprime function is O(n). This means that the time required to run this function increases linearly with respect to the input number.
Human: What is the \sum_{i=2}^{\sqrt{n}} i?
AI: The sum of all the numbers from 2 to \sqrt{n} is given by

\sum_{i=2}^\sqrt{n} i = \frac{(\sqrt{n} - 1)(\sqrt{n} + 2)}{2}.

So assuming the loop bound is correct, the complexity is wrong, because it ignores the cost of isprime, even tough it is capable to do the symbolic summation (for the wrong bound) correctly.

I will close with a very interesting note. I run this yesterday.

Human: What is the fifth root of 32?
AI: The fifth root of 32 is approximately 1.717.
Human: What is 2 raised to the fifth power?
AI: 2 raised to the fifth power is 32.
Human: What is the fifth root of 32?
AI: The fifth root of 32 is approximately 1.717.
Human: Are you sure?
AI: Yes, I am sure. The fifth root of 32 is approximately 1.717.

But today it is correct!


I’m using Tabnine as a coding assistant at work. It works very well on Python, JavaScript, and Rust, but probably due to the lack of training data, Julia suggestions aren’t great.

I wonder if we could take inspiration from the code clippy project to train a model focused on Julia. It would also be great if whatever extension interfaces between the IDE and the model could understand the import system, so you could get richer suggestions base local files that have been included at the top of a module.


Thanks for sharing your findings from this very interesting experiment! The drawing here is that there is no real connection between the meaning of “complexity” and “loop bound”, which is not surprising when you think about it. However, it’s important to remember that successful task completion does not necessarily imply the acquisition of meaning.

It’s interesting that you mention the Goldbach conjecture, which is also used as an example in a well-known formal linguistic theory to show that native speakers don’t always have a deep understanding of what they’re talking about. But they can still use surface cues to make connections between meanings.
So, in the following example:

  1. There are some unsolvable problems in number theory.
  2. Every even number greater than two is expressible as the sum of two primes is
    undecidable, for instance.

To be able to make the connections between the two sentences, you don’t necessarily need to know the full meaning and the consequences of the first sentence. However, you “know” how the two are related in a superficial way. This is not the same process that a language model follows, however, since humans still rely on more complex reasoning to selectively interpret what they hear. However, the use of surface cues through “learning” statistical distributions can still lead to accurate predictions and responses or else can get you away with parroting.


From an instructor’s point of view, the biggest problem I have encountered with having students use GPT to generate Julia code is that GPT fails at basic tasks involving packages because the packages evolve so fast and (it seems) CHATGPT is trained on examples and information (e.g. syntax) from years ago.

For instance, JuMP syntax has totally changed, and CHATGPT generates code which would have worked with old syntax. The structure of the code is fine but the usage is all wrong. This also happens with anything involving downloading of data. I wonder if there is any way to allow specially designated individuals (known experts) to give trusted corrections to Julia syntax for CHATGPT?


@logankilpatrick said something about this but I can’t remember what it was