Some Julia growth/usage stats

There have been a couple of posts recently by certain people bemoaning Julia’s lack of popularity/usage/growth, which I find confusing: where is this impression of lack of usage coming from? It’s certainly not the impression I get at all. From where I’m standing, there seems to be steady, consistent growth in usage, and Julia seems to be transitioning from an emerging technology to a fairly mature one that is widely used, although certainly not the most widely used. So I made a few graphs of various stats that people might find interesting and did a little armchair data analysis that people may or may not find compelling.

Package server stats

These are stats from the package servers that Julia’s Pkg client has connected to by default since Julia v1.5 or so. We don’t track individual end-users, so we can only make inferences, but this is a good proxy for how many people are actually using Julia and can give us a sense of growth if not clear-cut absolute numbers.

Monthly requests

This is how many client requests the Pkg client makes to package servers, with CI requests and other bots filtered out, so it’s only actual end-users.

[Chart: monthly request count]

Summary: Steady linear growth from 10 million requests/month at the end of 2021 to 20 million requests/month today. Is it exponential growth? No, but that’s not really something I would expect for a programming language except for very early on. There are only so many programmers on the planet and only so many of them are interested in numerical computing. It sure seems like a decent fraction of them must be using Julia if we’re seeing 20 million pkg requests/month.
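One rough way to check "linear, not exponential" on a monthly series is to fit both a straight line and an exponential (a straight line through the log of the counts), then compare how closely each tracks the raw data. Below is a Python sketch. Plain least squares is our choice of method, not necessarily how these graphs were analyzed, and the example series is synthetic, not the actual Pkg server numbers.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def sse(ys, preds):
    """Sum of squared errors between observed and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds))

def compare_growth(counts):
    """Fit linear and exponential models to monthly counts; report raw-scale errors."""
    t = list(range(len(counts)))
    a, b = fit_line(t, counts)                              # linear: y = a + b*t
    la, lb = fit_line(t, [math.log(c) for c in counts])     # exponential: log y = la + lb*t
    return {
        "linear_sse": sse(counts, [a + b * x for x in t]),
        "exponential_sse": sse(counts, [math.exp(la + lb * x) for x in t]),
        "linear_slope": b,                       # added requests per month
        "monthly_growth_factor": math.exp(lb),   # multiplicative growth per month
    }

# Synthetic series: 10M to 20M requests/month over 30 months, constant increment.
synthetic = [10_000_000 + i * (10_000_000 / 30) for i in range(31)]
result = compare_growth(synthetic)
print(result["linear_sse"] < result["exponential_sse"])
```

Over a window of only a few years, a modest exponential rate can look almost linear. Noise, and dips from lost logs, would also need handling before a comparison like this says much about the real data.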

What does this metric tell us about the number of users? It’s hard to estimate because we don’t know how many package requests the average user makes per month. But if we guess 200 requests per person per month on average, 20 million requests/month works out to about 100,000 active users. Who knows if that’s right, but I suspect it’s in the right ballpark. There are probably ten times that many (on the order of a million) who use Julia occasionally, and around 100k who use it a lot.
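The estimate is just total requests divided by a guessed per-user rate, so it scales inversely with that guess. Here is a tiny Python sketch. Only the 20 million figure and the 200 requests/month guess come from the post; the other rates are arbitrary alternatives chosen to show the sensitivity.

```python
def estimate_active_users(monthly_requests, requests_per_user_per_month):
    """Back-of-the-envelope user count: total requests / assumed per-user rate."""
    return monthly_requests / requests_per_user_per_month

MONTHLY_REQUESTS = 20_000_000  # from the summary above

print(estimate_active_users(MONTHLY_REQUESTS, 200))  # 100000.0

# The per-user rate is a guess, so look at a range of values rather than one.
for rate in (100, 200, 400):
    users = estimate_active_users(MONTHLY_REQUESTS, rate)
    print(f"{rate:>4} requests/user/month -> {users:>9,.0f} users")
```

Halving or doubling the guessed rate doubles or halves the user estimate (200k or 50k). That is why it's only a ballpark figure.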

There are a couple of big dips in there. I’m not 100% certain about them, but we have had a couple of incidents where package server logs weren’t getting uploaded for a while. (We don’t have anyone whose full-time job is maintaining this infrastructure.) My best guess is that the dips happened when we lost server logs.

Monthly unique client IPs

This is how many unique IP addresses the package servers see per month. (Estimated in a way that preserves end-user privacy using HyperLogLog hashing.) I was going to start with this metric, but I suspect it’s misleadingly flat.
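HyperLogLog estimates the number of distinct items in three steps:

- Hash each item.
- Use the top bits of the hash to choose a register.
- Keep, in each register, only the maximum "rank" (the position of the first 1-bit in the remaining hash bits) seen so far.

The registers never hold the items themselves. Below is a minimal, illustrative Python sketch. The register count (2^14), the BLAKE2b hash and the bias constant are standard textbook choices we picked; they are not details of the package server's implementation.

```python
import hashlib
import math

class HyperLogLog:
    """Minimal HyperLogLog distinct-count estimator (illustrative, not production)."""

    def __init__(self, p=14):
        self.p = p                       # number of hash bits used to pick a register
        self.m = 1 << p                  # number of registers (2**p)
        self.registers = [0] * self.m    # max rank seen per register
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias-correction constant for m >= 128

    @staticmethod
    def _hash64(item):
        """64-bit hash of a string via BLAKE2b."""
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, item):
        """Record one item; adding the same item again never changes the registers."""
        h = self._hash64(item)
        width = 64 - self.p
        idx = h >> width                       # top p bits choose the register
        rest = h & ((1 << width) - 1)          # remaining bits
        rank = width - rest.bit_length() + 1   # 1-based position of the first 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        """Estimate the number of distinct items added so far."""
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros > 0:
            estimate = self.m * math.log(self.m / zeros)  # small-range (linear counting) correction
        return estimate

# Example: 75,000 distinct made-up addresses from the private 10.0.0.0/8 range, each seen twice.
hll = HyperLogLog()
for i in range(75_000):
    ip = f"10.{(i >> 16) & 255}.{(i >> 8) & 255}.{i & 255}"
    hll.add(ip)
    hll.add(ip)
print(round(hll.count()))  # close to 75,000; typical error is about 1.04 / sqrt(2**14), i.e. ~0.8%
```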

[Chart: monthly unique client IPs]

Summary: ~75k unique IP addresses per month, roughly steady since 2021.

Why do I think this is misleading? Because we know that requests have gone up over that time and we have no reason to believe that the number of pkg server requests each Julia client makes per month would have doubled in that time. What I suspect is happening here is that we’ve basically saturated the number of public IP addresses that you’ll see requests from when running an internet service: ISPs share and reuse IP addresses using a combination of NATs and DHCP, so at some point you’ve seen basically all the IP addresses you’re going to see even if your user base grows.
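A crude way to see why distinct IPs can saturate is a balls-into-bins model. Suppose each of k client-months shows up from one of N shared public addresses, chosen uniformly at random. The expected number of distinct addresses seen is then N(1 - (1 - 1/N)^k), which flattens out toward N as k grows. Below is a Python sketch. The uniform model and the pool size are assumptions for illustration only, not estimates of the real address pool.

```python
def expected_distinct(pool_size, draws):
    """Expected distinct values seen after `draws` independent uniform picks
    from a pool of `pool_size` equally likely values (balls into bins)."""
    return pool_size * (1.0 - (1.0 - 1.0 / pool_size) ** draws)

POOL = 80_000  # hypothetical number of shared public addresses
for clients in (50_000, 100_000, 200_000, 400_000):
    seen = expected_distinct(POOL, clients)
    print(f"{clients:>7} client-months -> {seen:>9,.0f} distinct IPs")
```

Under these made-up numbers, doubling k from 100,000 to 200,000 raises the expected distinct count by only about 29%. Real IP assignment is not uniform, and dynamic reassignment pushes the other way, so this only shows the shape of the effect.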

Discourse stats

Monthly page views

This is how many impressions our Discourse forum has per month. Views need not be from logged-in users or even from people with accounts. If someone googles a Julia question, the answer is a Discourse post, and they view it without logging in, that’s counted here.

Summary: This has been growing steadily from 500k/month in 2018 to 2 million/month today. (I picked 2018 as the starting point since I think that’s the year we started using Discourse, and it’s the year we released Julia 1.0.) Again, growth is linear, not exponential. But like I said, exponential growth isn’t a thing except very early on, because there just aren’t that many people.

What’s with the big dip? Yeah, so we pay Discourse to host our instance and at some point—without telling us—they decided to start blocking web crawler traffic to our instance. As a result, Julia’s Discourse stopped showing up in search engine results and our page views tanked, which is how we noticed. We had to yell at them and get them to stop doing that. Our page views still haven’t entirely recovered, but they’re getting there.

Monthly active users

This is the number of logged-in users who visit Discourse each month. This metric only counts users with accounts who are logged in when they visit Discourse.

Summary: Steady growth from 10k/month in 2018 up to 33k/month in 2021-2023, then a massive dip due to web crawler blocking, followed by partial recovery but only back up to 28k/month.

First: why does this plateau from 2021-2023? I suspect that no matter how many users a language has, there are only so many people who will create an account on the language forums and regularly visit them. I’ve used many programming languages quite a lot without ever creating an account on their discussion forums. So I don’t really expect this graph to keep growing—beyond a certain point, I think the new users are less and less likely to engage in that way, and that’s ok. Having users who have never been to your forums is a sign of language maturity and success.

Then there’s the big dip, of course. This was likely caused by the search engine crawler blocking. But why haven’t active user visits recovered like page views have? I’m not really sure. Maybe logged-in users recover more slowly: people got logged out and haven’t bothered to log back in? Or maybe this graph was going to peak and decline anyway, and the dip just showed up at a confusing time.

Why would active forum users peak and then go down if usage is going up? It’s possible that as a language matures, there’s less need to visit its forum to get answers to things. There are now many high-quality Julia books coming at it from various angles. Again, I’ve used many languages without ever creating an account. If my question is already answered somewhere, I’m not going to bother making an account.

There may also be more colleagues who can answer questions. It’s also possible that some of those colleagues have names like ChatGPT and Copilot: the web crawler dip happens to coincide with LLMs getting good enough to start displacing sites like StackOverflow and Discourse. We know that StackOverflow visits are down since ChatGPT; maybe we’re seeing the same phenomenon. If so, that makes the recovery in page views all the more impressive. But this is just speculation.

Conclusions

This data isn’t conclusive, but it certainly suggests steady ongoing growth to me. That’s also the impression I get from talking to people: more and more of them are just using Julia and finding that it works for them without needing help. And far more people have heard of it and are considering using it than even a couple of years ago. I don’t think usage is growing exponentially, and Julia isn’t going to unseat Python or C++ any time soon, but that’s ok. Frankly, Rust and Go aren’t going to either, and that’s also ok! Julia is quite successful and popular in some areas of numerical computing and is growing steadily in both popularity and quality. Is it growing exponentially the way it felt like it was in the heady early days? No. But that’s to be expected and totally fine. Steady, linear growth in popularity and usage, along with steady improvements in quality and maturity, is a very good place for a language to be at the 10ish year mark.

What should we, as Julia developers, dedicated users, and fans do? Probably stop worrying about these metrics and focus on making things better. In the words of Steve Martin, “Be so good they can’t ignore you.”

Thanks, Stefan.

Have there been any technical changes that could explain the increased Pkg requests, such as the new package extensions or the splitting out of stdlibs?

I wonder what the Discourse numbers look like for other languages.

Each person using more packages on average would do it, which could be happening. Really hard to say without adding more telemetry, which has been highly contentious in the past. I would consider generating persistent HyperLogLog hashes on the client since those are only 17 bits and can’t uniquely identify individual end-users, but I’m not sure if people would be ok with that.
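A 17-bit value can be enough because HyperLogLog only needs two things from each hashed item: which register it lands in, and the rank of the first 1-bit in the rest of the hash. Below is a Python sketch under stated assumptions. Using 2^11 registers plus a 6-bit rank field is one way to get exactly 17 bits, but the post doesn't say how the bits would be split. `make_client_id`, `hll_token` and `unpack` are hypothetical helpers, not part of Pkg.

```python
import hashlib
import secrets

INDEX_BITS = 11          # assumed: 2**11 = 2048 HyperLogLog registers
RANK_BITS = 6            # enough to hold any rank up to 63
WIDTH = 64 - INDEX_BITS  # hash bits left after taking the register index

def make_client_id():
    """Hypothetical: a random ID generated once and stored locally by the client."""
    return secrets.token_hex(16)

def hll_token(client_id):
    """Pack (register index, rank) of a 64-bit hash of the ID into a 17-bit integer."""
    h = int.from_bytes(hashlib.blake2b(client_id.encode(), digest_size=8).digest(), "big")
    index = h >> WIDTH
    rest = h & ((1 << WIDTH) - 1)
    rank = WIDTH - rest.bit_length() + 1       # 1-based position of the first 1-bit, at most 54
    return (index << RANK_BITS) | rank

def unpack(token):
    """Server side: recover (register index, rank) to update an HLL register."""
    return token >> RANK_BITS, token & ((1 << RANK_BITS) - 1)

tokens = [hll_token(make_client_id()) for _ in range(10_000)]
print(len(tokens), "clients,", len(set(tokens)), "distinct 17-bit tokens")
```

Only 2^17 = 131,072 token values exist, and most clients land on low ranks, so each token is shared by many clients. The server can still rebuild the HLL registers with `unpack`, but it can't tell individual users apart.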

Idk if 75k IP addresses really means “seeing all the IP addresses there are to see”. I know NAT and even double NAT are common in countries like China, where per-capita IPv4 allocations were very limited. But especially in the US and Europe, I think it’s never been an issue, and most residential broadband clients have their own public IP. Not to mention those IPs are dynamic, so you should see more than one unique IP from a user after a while.

I think the number of monthly requests to the Pkg server can also be misleading. Maybe companies or individuals have deployed Julia in automated setups but forgot to signal that they’re bots. (I know this is done in CI, but I can imagine companies manually deploying Julia may not know to do it.)

In fact, the number of active users on Discourse corroborates the story the unique IP addresses are telling: since the beginning of 2022, there hasn’t been a significant increase in active/pro users in the wild.


Alternatively, maybe the story the number of Pkg server requests is telling is the real one, and there are other explanations for why Discourse visits didn’t go up as much (e.g. Zulip and Slack diverting visits, communities like Makie having their own Discord server, etc.). In that case, the lack of an uptick in unique IP addresses might be because new users mainly came from developing countries, or are clustered behind institutional IP addresses (universities, national labs).

Does the creation of Pkg extensions, and the general ecosystem trend of splitting things up, affect the statistics here? Or would they be lumped into one Pkg request, so it doesn’t matter?

Why not just count how many times Julia has been downloaded? We know the date each version was released, so we can use that data to plot the exponential growth of Julia.

[Chart: Julia versions 1.0 through 1.10]

You can compare the /about pages, but of course community buy-in varies. Some off the top of my head:

There was a long (and inevitably somewhat meandering) thread on Zulip a while ago:

https://julialang.zulipchat.com/#narrow/stream/137791-general/topic/Evidence.20of.20Julia’s.20growth

Some stats that people considered there:

  • New packages/versions in the registry
  • Zulip and Slack usage
  • PyPL rating
  • Julia downloads
  • Julia YouTube channel views
  • JuliaLang.org visitors
  • GitHub stars

We also had a discussion on Slack recently about this ranking of languages by number of GitHub contributors: Programming languages | GitHub Innovation Graph

Maybe someone should just slap together hasjuliahappenedyet.com on a Heroku instance or similar, pulling all of these numbers monthly, to make these recurring threads simpler…

FWIW I think when you look at everything in the round, the best conclusion is probably steady linear growth from a still low base relative to other languages. Personally, I don’t think it’s very useful to ruminate too much over this. As long as the ecosystem doesn’t collapse, the language is actively developed, and I can register new versions of my packages, I don’t care so much about the overall popularity of the language. As Chris says in the linked Zulip thread, the bus factor for much of the stats ecosystem is 0, so if anything there’s been negative growth in my field since I started out around 0.3. But that hasn’t stopped me from doing some excellent work (if I may say so myself…) in Julia over the past 10+ years.

Interesting to note that the Rust landing page calls their Discourse “the Users Forum”. It’s clearly the only place explicitly mentioned. But the Discourse surprisingly has relatively little traffic. The landing page also has an icon for Discord, which seems to have more traffic.

Yep. I didn’t pull any numbers out or draw any top-line conclusions because

Looks like Rust has a separate users forum and internals forum.

Curious if you have any data on the relative usage between 32-bit and 64-bit Julia?

I think people overthink Julia’s usage. It has thriving Discourse and Slack forums, and it is quite “unified” compared to, for example, Python, which has forums all over the place. (When I looked into it a few years ago, I never found a good place to ask Python questions.) I see a lot of the same faces, but also new faces popping in.

As long as the language is developing, people are sticking around, and it is moving rather than standing still, all it takes for major public “buy-in” is, as @StefanKarpinski mentioned, a package so good for some widely used application that it cannot be ignored.

Of course, Julia is not going to be a top language in a commercial sense, since most managers working with software probably didn’t get to try it when they were learning the basics. If Julia keeps doing what it is doing now, I can’t imagine that it won’t break through as managers with Julia experience, direct or indirect, start wanting to use it. In my view, it is a lot of the time more ‘politics’/‘history’ that drives language decisions.

Here are a couple more graphs. These are pkg download stats for just the URL /registries (again, only for client type “user”, excluding CI and bots). This URL is something that would be requested once per Julia session by each client, so it should be unaffected by the number of packages each person uses growing.
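The /registries numbers boil down to a filter-and-group over request logs. Below is a minimal Python sketch. It assumes a hypothetical log record with month, path, client_type and ip fields; the real package server log schema isn't described here. Exact sets are used for unique IPs purely for clarity; the published unique-IP numbers are estimated with HyperLogLog, as described earlier in the thread.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class LogRecord:
    """Hypothetical shape of one package-server request log entry."""
    month: str        # e.g. "2024-03"
    path: str         # requested URL path
    client_type: str  # e.g. "user", "ci", "bot"
    ip: str

def registries_stats(records):
    """Per month: (total /registries requests, unique client IPs), user traffic only."""
    totals = defaultdict(int)
    ips = defaultdict(set)
    for r in records:
        if r.path == "/registries" and r.client_type == "user":
            totals[r.month] += 1
            ips[r.month].add(r.ip)
    return {month: (totals[month], len(ips[month])) for month in sorted(totals)}

records = [
    LogRecord("2024-01", "/registries", "user", "192.0.2.1"),
    LogRecord("2024-01", "/registries", "user", "192.0.2.1"),  # second session, same IP
    LogRecord("2024-01", "/registries", "ci", "192.0.2.2"),    # CI traffic is excluded
    LogRecord("2024-02", "/registries", "user", "192.0.2.3"),
]
for month, (total, unique) in registries_stats(records).items():
    print(month, total, unique, f"{total / unique:.1f} sessions per IP")
```

Dividing the total by the unique count gives a rough sessions-per-IP figure. That ratio might be one way to look at the spike and the dip in the two charts together.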

[Chart: /registries monthly total requests]

Monthly total requests has a weird spike at the beginning of 2022 that I really don’t know the reason for. Maybe there was some change in the logic of how often per Pkg session the /registries page is requested? I don’t recall that, but it’s hard to think of what else this was. Aside from that and the two dips for lost logs, this looks like steady, not quite linear growth. It’s a bit too noisy to really declare that a clear trend, but it’s going up and to the right in any case.

[Chart: /registries monthly unique client IPs]

Interestingly, monthly unique IPs for the /registries endpoint has a shallow dip in exactly the time period that total requests has a big spike. Again, I don’t really know what happened here. Main guess is some kind of change in client behavior.

It’s a little strange that fewer unique IPs were requesting /registries than overall in 2021, but by 2024 the gap has closed and both graphs are at about 9k. The Pkg client logic is generally to request /registries in a session before doing anything else, so why were fewer clients requesting /registries back then? I’ll have to dig into it a bit more.


Aside: I’d really like to make aggregate pkg request data accessible to more people to analyze. The queries I ran for this are not derivable from what we currently publish. I don’t really have the time, and other people in our community are surely better at this kind of thing than I am. But I’m not fully comfortable making individual request data available without a sufficient level of aggregation. I’ll have to think a bit more about whether we can aggregate it in ways that would allow people to do useful analysis without revealing individual requests.
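One common way to publish log data without revealing individual requests is to release only grouped counts and suppress any cell below a minimum size. Below is a minimal Python sketch of that idea. The field names and the threshold of 50 are hypothetical, and this isn't a description of anything the package servers currently do.

```python
from collections import Counter

MIN_CELL = 50  # hypothetical threshold; cells smaller than this are withheld

def aggregate_for_publication(records, keys, min_cell=MIN_CELL):
    """Count records per combination of `keys`, dropping cells below `min_cell`.

    `records` is an iterable of dicts; only grouped counts are returned,
    never individual rows.
    """
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {cell: n for cell, n in counts.items() if n >= min_cell}

rows = [{"month": "2024-01", "country": "A"}] * 60 + [{"month": "2024-01", "country": "B"}] * 3
print(aggregate_for_publication(rows, ["month", "country"]))
```

Small-cell suppression on its own is not a formal guarantee. Releasing several overlapping tables can let someone recover suppressed cells by subtraction, which is one reason approaches like differential privacy add calibrated noise instead.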

Some of it is a somewhat inarticulate way to address this:

and this:

First: I’m not a bemoaner. The language is, from my vantage point, maturing, and work is being put into the right places to further improve it. I find it a pleasure to work with, and my use of it isn’t grounded in numerics, which is unusual.

It’s good that it has a niche. It’s clear that scientific/numeric computing was the chief focus in designing Julia, and that community of practice deserves continued attention and success.

Julia is an excellent language for numerics, and, Julia is a good language for general-purpose computing, with the potential for excellence there as well. That’s a much larger audience.

When the general developer hears about a scientific/numeric computing language, they think of R, Matlab, and Fortran. If they bucket Julia into that category, they conclude that it’s a language you use only if you have to. No one would suggest writing most things in R, or Matlab, or Fortran. They’re all good at their jobs (Julia is imho better), and bad for anything else.

Most of what keeps Julia in the “merely good” category for general-purpose programming is that the package ecosystem is, relatively speaking, thin. That’s a chicken-and-egg problem, and it naturally gets better over time. But there’s also a marketing/branding challenge there. From outside the community of scientific/numeric programming, Julia is typecast by association. This is a conversation I’ve had several times at this point, “why would you use Julia, isn’t that for number crunching?”.

There are a few aspects of the language itself which could admit of improvement, especially for programming-in-the-large. But even if that never happened, a Julia which was recognized as a good choice for general-purpose programming, and used that way, would naturally grow the expanded package ecosystem which would make it excellent for that purpose.

If I had good suggestions for what that sort of outreach would look like, I’d make them, but I don’t. I’m hoping to explain the disconnect between the reality you’ve illustrated, which shows steady growth and a respectable user base, and the sort of complaint which prompted it.

To pull a number from the nether regions, fifty times as many people could be using Julia as currently do so. Fifty times as many people should be using Julia as currently do so. So, while I’m not a bemoaner, I would venture that the bemoaners are wondering why that much larger potential user base isn’t flocking to the language.

That’s a path to take. It’d be inspiring to say that if Julia is the most efficient tool on which other tools can be built in the long run, then go for it.

It’s another story when you or your team have to compete against another team with effectively infinite money and hundreds if not thousands of times as many programmers, and emerge victorious.

It’s not an easy path. It’s a path of struggle.

Deep down I want to go down that path too, but it’s not easy, even for people like me who can trivially do many things people struggle so much for.

Constantly dwelling on how hard and unfair things are doesn’t really help though, does it? Frankly, it bums everyone out and drains their energy and doesn’t accomplish anything useful. It’s also just not how people who get things done approach the world. Rather than focusing on how hard and unfair things are, pick a thing that you can do something about, and do it. Repeat and make small but measurable progress regularly. You’ll be amazed by how far you get over time. I also just don’t think things are as stacked against us as you seem to be convinced that they are.

Yeah, in my experience, I don’t really need exponential growth of Julia (even if that’d be great!). Julia’s main value proposition to me is how productive it makes me and how little I need to rely on someone else to make packages tailored to my exact niche.

It’s always great when an expert comes along and makes a package that does what I need, but Julia has such a solid base for my sort of work that I typically don’t need much more than what Base provides.

So long as the language works and people keep plugging away at interesting problems, I think Julia has a bright future.

I found this article about WebAssembly adoption interesting (published yesterday) and I think it could provoke some thought around Julia adoption: https://thenewstack.io/webassembly-adoption-is-slow-and-steady-winning-the-race/

I love this framing. I’m just going to keep using Julia, loving it, and going on with my day. IMO being a passionate user is one of the better things I can do.
