What steps should the Julia community take to bring Julia to the next level of popularity?

Companies fund Python with money and engineers’ time.

As part of our $150K financial sponsorship of the PSF, we will be focusing our funds to the Packaging Working Group to help with development costs for further improvements to PyPI and the packaging ecosystem.

Microsoft also has several developers across the company contributing to the development of the Python language. We have, at the time of writing, five core developers who contribute part-time to the development of CPython.

If PEP 703 is accepted, Meta can commit to support in the form of three engineer-years (from engineers experienced working in CPython internals) between the acceptance of PEP 703 and the end of 2025, to collaborate with the core dev team on landing the PEP 703 implementation smoothly in CPython and on ongoing improvements to the compatibility and performance of nogil CPython.

3 Likes

Thanks for the list. I do agree with some of them, like @view, but not all. The worst part isn’t just the namespacing, but the array construction syntax, where you need to wrap [] in np.array.

Here I disagree completely, a[i, j] is far nicer, and numpy’s conflation of matrix and vector-of-vector is terrible.

I actually like linear indexing.

This is just OOP syntax, not really anything to do with numpy. Also, suddenly you get a function that is not a method, like len (I think), and then this pattern breaks down in an inconsistent way.

This is nice, but rarely used.

Safety or not, what I see here mostly is the lack of nice concatenation syntax: [[1,2,3]; 20]. That’s what I meant by “missing syntax support”.

Not sure if I agree or not on the reduction issue, but the indexing dimensionality in Julia is very pleasing.
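To make the syntax comparison concrete, here is what the constructs mentioned above look like in plain base Julia (no packages; the values are just illustrative):

```julia
A = [1 2 3; 4 5 6]       # 2x3 matrix literal, no np.array wrapper needed

A[2, 3]                  # multi-dimensional indexing, gives 6
A[4]                     # linear indexing (column-major), gives 5

v = [[1, 2, 3]; 20]      # vertical concatenation, gives [1, 2, 3, 20]
```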

8 Likes

I also agree with some points, but others may reflect closer familiarity with numpy rather than its actual advantages.

Very minor, but if anything — here Julia really looks cleaner and more consistent.

You mention EllipsisNotation as an alternative to the (rarely used) ellipsis indexing in numpy, but don’t mention solutions for piping here. Function piping, as implemented in Julia packages, is applicable to all functions in a uniform way, unlike Python, where you write a.max() but np.fft.fft(a).
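Even without packages, base Julia’s |> operator already pipes through any function uniformly; packages like Chain.jl build on the same idea. A minimal sketch:

```julia
a = [3.0, 1.0, 2.0]

maximum(a)           # every function is called the same way, method or not
a |> maximum         # base-Julia pipe, works for any function, gives 3.0
a |> sort |> first   # chains uniformly, gives 1.0
```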

In Julia, you can apply an arbitrary reduction to arbitrary dimensions in exactly the same way, like sum.(eachslice(...)). Unlike numpy’s apply_over_axes/apply_along_axis, it’s also efficient.
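For instance, reductions along dimensions and over slices share the same vocabulary (plain base Julia; the array is just an example):

```julia
A = reshape(1:24, 2, 3, 4)      # lazy 2x3x4 array holding 1:24

sum(A; dims=3)                  # reduce along one dimension, 2x3x1 result
sum.(eachslice(A; dims=1))      # arbitrary reduction per slice, 2-element vector
maximum.(eachslice(A; dims=1))  # swap in any other reduction the same way
```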

3 Likes

Opinions on each of the listed items may differ, but as far as the topic question (“What steps?”) is concerned, changing Julia syntax is definitely not an answer here, at least in most cases, as it would be breaking.

1 Like

For me it’s the missing strong AD engine. And a lot of people in science request differentiable forward models these days.
Yes, Zygote works and there is also Enzyme, ForwardDiff, etc. But combining those is not easy.
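To be fair, a single engine on its own is often pleasant to use; a minimal gradient with ForwardDiff, for example (assuming ForwardDiff.jl is installed), is just:

```julia
using ForwardDiff

f(x) = sum(abs2, x)              # a simple scalar-valued loss
x = [1.0, 2.0, 3.0]

g = ForwardDiff.gradient(f, x)   # forward-mode gradient, [2.0, 4.0, 6.0]
```

The friction described here shows up when mixing engines, custom rules, and GPU arrays in one model, not in small self-contained cases like this.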

As a user, it’s much easier just using PyTorch or JAX. Everything is differentiable and works.
Performance is one thing, but quite often my Julia code is not differentiable at all, or causes obscure errors in combination with CUDA. I don’t like the general usage of Python + PyTorch, but for prototyping I find it more convenient.

So happy to see progress happening in Diffractor.jl

8 Likes

I was one of the people in charge of Meta’s investment in Python :slight_smile:

This is why I’m so confused by your implicit theory of language development – I don’t see why you believe corporate funding is a foundational cause of language success rather than an effect.

It’s also why I asked a question you didn’t respond to – which specific corporate funding teams have you personally worked with in your professional life? I don’t mean that question to browbeat you, but rather to nudge you to introspect – what specific sources of data are you using to build your theory of language development?

11 Likes

I agree with you: getting corporate funding is the result of a project’s success, not the cause. Companies sponsor projects that help them solve their problems.

While I think the features of a programming language partly mark its success, I think the ecosystem and tools are increasingly important, and corporate funding can help with that. For example, the best LSPs are funded by companies, and a poor LSP can harm the popularity of a programming language (Rust vs Zig).

1 Like

I guess there are examples of languages where it went the other way around, like Go, but in any case it doesn’t seem a helpful perspective from which to consider the issue of this thread (unless one wants to be fatalistic): starting a petition for large corporation XYZ to invest heavily in Julia is probably not a great step to bring Julia to the next level of popularity.

Given that this discussion has been going for a while, with everyone throwing in their pet theory of what’s really going to draw people in, I’ll share my own hobby horse: synthetic control models are an increasingly popular methodology in causal inference. At a few courses and conferences recently, I was surprised by the number of data scientists at big corporations using them in their day-to-day work, relying on a plethora of R, Python, and Stata packages, none of which are, in my view, amazing.

I don’t think it’d be too hard to get a good package off the ground that’s more ergonomic than what’s currently out there, implements a wider range of models (currently many packages are written by authors of different synthetic control methodologies and therefore only implement one method) and beats others in terms of runtimes.

I’ve made a start here (nilshg/SynthControl.jl on GitHub) but don’t have much time to move forward (although an upcoming project might give me a few weeks part-time), so if you really want Julia to be the next Python, consider contributing!

7 Likes

I agree with you broadly. A critical mass of “Killer package for new cutting-edge tool”, or “Killer package that finally does new-ish tool correctly” seems like an effective route.

Thinking of recent econometrics packages I’ve used, though, some of them benefit from the broader python/R ML ecosystem. ZAM Influence uses torch bindings to do autodiff. GenericML uses a variety of learners from mlr. Though it looks like did is all in R.

But a takeaway is that their reliance on the broader ML ecosystem is super basic. Autodiff is something Julia will soon be a leader in, if it isn’t already. The learners in GenericML are also super basic.

But if some applied economist is like “crap, I gotta download Julia to use this new tool to respond to a referee comment”, that’s a win. But currently those new tools are in R.

3 Likes

Yes there’s certainly fancier stuff out there, but what solidified my opinion on this recently (look at the age of my SynthControl package, obviously I’ve believed in this for a while) was a post on Scott Cunningham’s substack that had a link to a compendium of R, Stata and Julia packages implementing “new” DiD/SC causal inference methods:

The ecosystems are quite fragmented. I feel something along the lines of what I’ve been trying to do with TreatmentPanels would be quite powerful: you basically have a table with meta-information about who was treated and when encoded in the type, so that you can write estimators that take, e.g., a BalancedPanel{MultiUnitSimultaneousTreatment, ContinuousTreatment} and then find all the models that can estimate causal effects for that treatment type.
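A rough sketch of what encoding treatment structure in the type could look like. All names below are illustrative, not the actual TreatmentPanels.jl API:

```julia
# Illustrative sketch only, not the real TreatmentPanels.jl types.
abstract type TreatmentAssignment end
struct MultiUnitSimultaneousTreatment <: TreatmentAssignment end

abstract type TreatmentKind end
struct ContinuousTreatment <: TreatmentKind end
struct BinaryTreatment <: TreatmentKind end

struct BalancedPanel{A<:TreatmentAssignment, K<:TreatmentKind}
    outcomes::Matrix{Float64}   # units x periods outcome matrix
    treated::Vector{Int}        # indices of treated units
    treatment_period::Int       # period in which treatment starts
end

# Estimators can then dispatch on the treatment structure:
compatible_models(::BalancedPanel{MultiUnitSimultaneousTreatment, ContinuousTreatment}) =
    [:synthetic_control, :synthetic_did]
```

With this design, asking “which models can handle my panel?” becomes a method lookup rather than reading each package’s docs.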

I haven’t really looked into the DiD stuff (although I’ve implemented a starter for ten on synthetic DiD, as you can see in the readme, pretty untested though), as I thought the comparative advantage of Stata in those more “classical” regression-model-type approaches was larger, but there are some efforts in that direction as well, like https://github.com/JuliaDiffinDiffs/DiffinDiffs.jl/tree/master/lib/InteractionWeightedDIDs.

2 Likes

Excellent point! I apologize for my brevity and the delay in my response.

Overall, the documentation of Julia packages has improved a lot compared to when I started using the language. Most packages have lots of examples and clearly describe functions in terms of argument types, processes, and outputs. Thus my comment is likely about rare instances rather than the norm. To give concrete examples: on one occasion I had to dig into the source code to figure out the argument types of a function. In another, I struggled to find a reference describing the math used underneath the code. In both cases, the developers very kindly responded to my questions and helped me out.

These are, based on my experience as a user, rare events. As the Julia community continues to grow, it might be hard to keep track of all the checklists required for a robustly documented toolbox. It might be good to think of a way to do this programmatically, ensuring that docs are consistent across packages and contain all the necessary info for seamless usage. Perhaps folks at JuliaHub are already thinking of doing so. Perhaps these matters have been broadly resolved and are now non-issues. :slight_smile:

3 Likes

Does anyone know if it would be feasible to add a “Report Issue” button in Documenter.jl generated docs that, when clicked, opens the New Issue form on GitHub/Gitlab for the corresponding repo with some boilerplate prompting the user for

  1. what information they were trying to seek
  2. what information they were able to find
  3. what additional documentation they need to get their answer
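GitHub’s new-issue form accepts title and body query parameters, so the button could be a plain prefilled link. A rough sketch of building such a URL (the function name and template text are made up for illustration):

```julia
# Hypothetical helper: build a prefilled GitHub new-issue URL for a docs page.
function report_issue_url(repo::AbstractString, page::AbstractString)
    body = """
        Docs page: $page
        1. What information were you trying to find?
        2. What information were you able to find?
        3. What additional documentation would answer your question?
        """
    # minimal percent-encoding for the characters used in the template above
    enc(s) = replace(s, " " => "%20", "\n" => "%0A", "?" => "%3F")
    return "https://github.com/$repo/issues/new?title=$(enc("Docs issue report"))&body=$(enc(body))"
end

report_issue_url("JuliaDocs/Documenter.jl", "man/guide/")
```

Documenter already knows the repo for its “Edit on GitHub” links, so the same configuration could feed a button like this.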

Putting our efforts to continually improve docs at the forefront, and lowering the friction for reporting doc-related bugs, might be really helpful.

Edit: Doesn’t look terribly difficult.

10 Likes

I mostly do probabilistic record linkage or PRL (also known as entity resolution, fuzzy merging, merging messy datasets, etc) at work, using either the R package fastLink or the Python package splink.

I would like to see a Julia package for PRL because it could likely be much faster. There is no Julia package available for this that I am aware of. Some development in 2020 was discussed at https://discourse.julialang.org/t/entity-resolution-duplicate-data-in-julia/33860. There, I mentioned the two packages SpineBasedRecordLinkage.jl, which only does deterministic record linkage, and BayesianRecordLinkage.jl, which is unfortunately no longer being developed because the developer graduated and moved on to other things.

I am not aware of any progress in Julia but there is plenty of development in R (fastLink) and Python (splink). PRL is a big use case, which is relatively easy in R and Python but seemingly difficult in Julia (and in popular commercial stat software such as Stata and SAS). A PRL package in Julia could be a “killer” package for Julia.

2 Likes

I believe adding a “Prerequisites” section (or something similar) to the beginning of many docs would be very helpful, even for “end users” closer to academia.
Sometimes what is missing is just a warning: “before you continue, make sure you read/see/understand the following material”. From my experience (coming from Biology), having a more clearly defined path toward understanding the material surrounding a package could mean a lot.

7 Likes

Here’s an idea: how about a Julia SciML summer school for high school students and another one for grad students?

The school could be divided into cohorts based on research interests. For high school students, it could be an intro to scientific computing that goes into basic intro physics problems using calculus.

For graduate students, it could be an opportunity to learn about more principled approaches for solving quantitative problems in their area of research.

8 Likes

I’m curious. Does Julia have what it takes to outperform python in ML? Does it actually stand a chance? Or are we just believers who don’t want to follow the well trodden path? If so, how much would it cost to get there? $50,000? $500,000?

If we wanted to reproduce the functionality of popular R packages, fully written in Julia, how much would that cost? If we wanted Turing to best PyMC in nearly all respects, how much would it cost?

How much would it cost to draft tutorials that go from step-by-step set up, writing your first code, sharing it via a blog, and saving it as a document using quarto?

I’m not a wealthy person, but I would donate money if I could see tangible results. For the kinds of tutorials I mentioned in my previous post, would $50 get one made? $200? $2000?

Assume the money is out there. How much will it cost? It may not be too hard to get the money if Julia can actually deliver on the promise of high-performance, fully differentiable programming, with high-quality packages and deployment strategies so businesses can productionize their code.

Imagine Julia’s current deficiencies could be solved by throwing money at it. How much money would it take?

Wouldn’t Julia need a foundation, like Rust or Python have, to fund projects?

1 Like

See The Julia Project and Its Entities.

4 Likes

Just throwing a number out there, but, I’d guess something like $5M and 10 years.

Let’s face it: developers cost $150k or more, and you want 3-5 of them working on different bits full time, at least. That’s $500k a year, order of magnitude, and another 10 years or so means $5M.

Why 10 years? Julia really started to be usable for me in 2019, and it’s 4 years later and finally TTFX is not a major pain. I think in another 4 years we will probably have some really good stuff like deployment as standalone binaries, low-latency GC, Enzyme differentiability, and so forth.

Python has been going for what, 30 years? With multibillion-dollar companies developing stuff in it for 20.

Stuff just takes time. It wouldn’t be hard to accelerate all of that a lot with some major money, though. A multibillion-dollar company saying “here’s $20M, work on x, y, z” could make a lot of projects move forward. But that’s not what companies do.

(Meta spent $36B on “the metaverse”, Elon spent $44B on running Twitter into the ground, so it’s not like we couldn’t in theory have nice things, just that we can’t in practice. Note that $20M to Julia and 1000 other projects wouldn’t equal the amount Meta spent)

2 Likes

That’s not concrete. What packages, and which functions? I am still unsure whether you had an issue with the documentation of some satellite trajectory package or a data mining package. The biggest issue with improving documentation is knowing what documentation to improve. Without further details, I don’t know how to address the cases you mention here without improving the documentation of every package and every function. Generally, the cases people run into tend to be a much smaller subset than everything, and finding out something like “60% of the people who say we need to work on documentation were all looking at some JuliaStats package” would be invaluable information, and that’s the detail that’s left out!

The biggest issue with these kinds of things is funding and logistics. I do tend to run something every year. Last year it was at DTU (Ph.D. Course on Scientific Machine Learning); this year we have a workshop in Germany coming up in a few weeks. It’s generally someone else who got the grant funds and wants to run a SciML event who then invites some of us to come help teach it. That solves the logistics problem, but it also means we’re bound by the luck of finding these opportunities each year :sweat_smile:. There is a nascent discussion of possibilities at UCSB for next year, for example, with nothing concrete.

I think we could run something at MIT, but if it’s a week long I don’t know whether we would need to pay for facilities; we can look into it. Doing something for MIT graduate students is easy, but there’s also the existing 18.337 course for them, so that would be the least beneficial use of time. Maybe the benefit of doing it at MIT would instead be getting students from nearby universities for a week (though they do take 18.337).

If anyone else wants to host something but needs help with setting up the curriculum or doing the teaching, please get in touch.

5 Likes