What steps should the Julia community take to bring Julia to the next level of popularity?

@cpfiffer said:

  • Challenge Python users by pointing out inefficiencies in their tools and demonstrating Julia’s superiority in scientific libraries and performance.

The vast majority of Python users don’t care. The performance difference on modern hardware (a MacBook Air is insanely powerful) is not big enough to be a deal breaker. From my interaction with Python users in industry, many don’t even know that most of the sci-compute/ML Python libraries are written in C/C++. It is all Python to them.

For some reason, or maybe because of the two-language problem, Python libraries have well-designed, intuitive APIs. NumPy, SciPy, PyTorch, scikit-learn, and pandas have superb APIs. This aspect is perhaps not given that much importance in Julia, because everything is written in Julia and you can always write it yourself (a common answer on this board), e.g. forward fill for handling missing values in a DataFrame.
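To make the forward-fill example concrete, here is a minimal sketch of what "write it yourself" amounts to for a single column. The function name `forward_fill` and the use of `None` as the missing-value marker are assumptions for illustration; in pandas the ready-made equivalent is `DataFrame.ffill()`.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def forward_fill(values: Sequence[Optional[T]]) -> List[Optional[T]]:
    """Replace each None with the most recent non-None value seen so far.

    Hypothetical helper for illustration; leading missing values stay None
    because there is nothing earlier to carry forward.
    """
    filled: List[Optional[T]] = []
    last: Optional[T] = None
    for value in values:
        if value is not None:
            last = value  # remember the latest observed value
        filled.append(last)
    return filled


column = [None, 1.0, None, None, 4.0, None]
result = forward_fill(column)
print(result)
```

It is only a few lines, which is probably why "just write it yourself" is such a common answer, but a newcomer still has to know to write it rather than finding it under an obvious name.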

Python is like Coke. If you are trying to be a Pepsi, your product has to essentially look and feel like Coke. You can’t produce a blue slush, say “hey, this is way more healthy and tastes better too”, and hope to get closer to Coke. But it is OK to be an organic kombucha and focus on the hippies.

13 Likes

That’s one of the things I like about Julia and the Julia ecosystem. I noticed while working on Python projects that my mentality towards problems is entirely different between the two. In Python I will Google a solution until I (probably) find a ready-made package or code snippet that does exactly what I want; if I don’t, I’ll just work around the issue with one of the solutions I found. In Julia my mentality is to write an issue, then try to solve the problem myself and make a PR. This is, for better and for worse, the mindset propagated throughout the Julia community.

However, while this does attract people like me (and lots of Julia users), it pushes away the average programmer.

8 Likes

To return to the Coke analogy, there’s a big difference between going to the supermarket to buy the canned product and the healthier alternative of getting lemons, ginger and water to make your own drink.

2 Likes

I agree that the main thing holding Julia down is not having a deep learning framework on par with PyTorch.

And I believe there are only a few steps remaining:

  1. Make an abstract array type that knows its shape at compile time
  2. Lower control flow to control primitives that can be overwritten with dispatch
    (not sure this is trivial with Julia’s scoping rules)
  3. AFAIK, with 1) and 2) type stability means a static graph, so now it can be compiled to XLA with some tracer type subclassing 1)
  4. To make life easier, write a macro that tags a function to throw an error when a type instability occurs during compilation

This would be like PyTorch, but better, since it would always be end-to-end compilable to XLA, which often gains 2x performance. Julia would gain lots of traction.

  5. Now the devs could work slowly towards making a high-level compiler on top of Julia to replace XLA (automatic memory management for the array type in 1, …)

Now you can show the generated low-level Julia code, which means you have a language that solves the two-language problem with easy-to-write code.
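As a rough illustration of points 1 to 3, here is a toy tracer in plain Python. It is a sketch under stated assumptions and not how Julia, JAX, or XLA actually implement this: the names `Tracer`, `Graph`, `add`, and `cond` are all hypothetical, and the graph is passed around explicitly because Python has no multiple dispatch to hook into. The idea it shows is that once shapes are known up front and branching goes through a primitive instead of a native `if`, tracing a function yields a fixed list of operations.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Tracer:
    """Stands in for an array: it carries a shape and a name, but no data."""
    shape: Tuple[int, ...]
    name: str


class Graph:
    """Records operations in the order they are traced (hypothetical helper)."""

    def __init__(self) -> None:
        self.ops: List[Tuple[str, str, Tuple[str, ...]]] = []
        self._counter = 0

    def fresh(self, shape: Tuple[int, ...]) -> Tracer:
        """Create a new symbolic value with a known shape."""
        self._counter += 1
        return Tracer(shape, f"v{self._counter}")

    def emit(self, op: str, inputs: Tuple[Tracer, ...], shape: Tuple[int, ...]) -> Tracer:
        """Append one operation to the graph and return its symbolic output."""
        out = self.fresh(shape)
        self.ops.append((out.name, op, tuple(t.name for t in inputs)))
        return out


def add(g: Graph, a: Tracer, b: Tracer) -> Tracer:
    # Shapes are checked while tracing, before any data exists.
    if a.shape != b.shape:
        raise TypeError(f"shape mismatch: {a.shape} vs {b.shape}")
    return g.emit("add", (a, b), a.shape)


Branch = Callable[[Graph, Tracer], Tracer]


def cond(g: Graph, pred: Tracer, on_true: Branch, on_false: Branch, x: Tracer) -> Tracer:
    """Control-flow primitive: trace both branches, then record a select.

    If the branches disagree on shape, the result would not have a single
    static shape, so tracing fails instead of producing a graph.
    """
    t = on_true(g, x)
    f = on_false(g, x)
    if t.shape != f.shape:
        raise TypeError("branches return different shapes")
    return g.emit("select", (pred, t, f), t.shape)


g = Graph()
x = g.fresh((2, 3))  # input array of known shape
p = g.fresh(())      # scalar predicate
y = cond(g, p, lambda g, v: add(g, v, v), lambda g, v: v, x)
for op in g.ops:
    print(op)
```

The shape check inside `cond` plays the role that type stability plays in the argument above: if both branches agree, a static graph exists, and if they don't, you get an error at trace time, which is roughly what the macro in point 4 would report.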

5 Likes

I guess it’s subjective, but I’m surprised you like those APIs better than Julian ones.
I remember numpy code with all those np.c_/np.s_/etc., very rigid and not well-supported “structured arrays”, pretty ad hoc “masked arrays”, and an overall very limited selection of array element types. It wasn’t fun (:

9 Likes

To me this statement seems to be a bit too narrow.

2 Likes

PyTorch has its own XLA too… IDK the details, but there are lots of competing compilers out there for ML.

1 Like

Are they just “better packages”? Would you say their success is just that someone has a good idea for an API and could stick the implementation? I assumed it was corporate funding.

I quite like Julia, which is my go-to prototyping language. But I do find those popular Python library APIs well designed.

Besides improving machine learning packages, what other feature/package could help to attract corporate funding?

I’m not certain what exactly counts as API, but numpy array syntax is really deeply awkward, especially compared to Julia or Matlab.

Do you mean it’s well designed given the constraints of being bolted on Python in hindsight, with no syntax support?

4 Likes

IMHO this is what Julia is for :wink:

2 Likes

Obviously the namespacing is annoying, but Numpy has some really nice syntax design that Julia misses. For example,

Maybe Julia’s array syntax has more going for it overall, but Julia certainly doesn’t dominate this contest. Other array models could be even better than both. For instance, I’m not convinced by multiarray broadcasting, among other things.

11 Likes

I think there should be some enforced “default” workflow for newbies, because there are a lot of possible obstacles and “fragilities” that require additional knowledge and tuning:

  1. Code updates: there are many different code styles with different redefinition behavior.
    • Write code in files and just include them into Main module => redefinition problem.
    • Wrap files into a module => module reloading works fine, but only if names for redefined structures and methods are not exported into Main.
    • Add Revise into startup file and write code in a package => need to generate a package with PkgTemplates, then code reloading works automatically… but no structure redefinition once again.
  2. TTFX: package precompilation conflicts with package updates, so despite precompilation, the user still runs into slow startup when working with several environments over a long time. It seems that every add command can potentially break precompilation for all environments. There is a --preserve flag preventing updates to the registry, but it’s tedious to type it every time. So when the user reopens some old environment, he can never tell whether it will load fast or slow.

  3. Stacked global environment: the user should be aware of stacked environments and add some utilities to the main env, like LocalRegistries, Plots, etc. If he doesn’t know about that, he will add those packages to every environment or package he creates.

  4. Package vs project (libraries vs scripts): if working with a package, the user should be aware of stacked environments too, and use a separate environment for scripts, just like the separate test environment.

After the user overcomes all these problems, and if they don’t discourage him from using Julia, he ends up with a customized, fragile workflow of his own, with a bunch of potentially suboptimal fixes.

9 Likes

I think both started when there was less competition. What examples of corporate funding do you have in mind (e.g. corporation X gave Y dollars on date Z)? I think there is very little money going to most OSS languages.

1 Like

I guess I was thinking more of how Google has several full-time staff members working on the Linux kernel. So less funding and more X members who get paid to contribute.

Aren’t PyTorch (Facebook) and TensorFlow (Google) the prototypical examples? I don’t know what the funding situation for those projects is, but I was under the impression that those projects have significantly benefited from the participation of Facebook and Google.

And JAX originated at Google… (link)

1 Like

Yes, there are many cases of companies that use OSS projects and contribute code back to those projects. But I think that’s quite different from funding (which also exists, hence the need to differentiate) since there are also many examples of corporations making PR’s that are rejected by the OSS projects they use and making asks from those projects that are rejected. The 2012 FB/Git e-mail thread is particularly useful to read as an example: Git performance results on a large repository - Joshua Redstone

But the preconditions here are two:

  1. The project needs to be something the company uses to begin with.
  2. There need to be actual software engineers (SWEs) at the company who can make changes to the code; being used by non-SWEs is relatively ineffective, since they may not be ready to make changes to the underlying OSS codebase.

3 Likes

These are very interesting examples, but it’s worth digging into some crucial details that differ from Julia:

  1. I would not say they benefit from companies participating, but rather that they were originally created wholly by the companies in question. TensorFlow is even now predominantly written by Google developers if you spend time skimming through the details of their top contributors: Contributors to tensorflow/tensorflow · GitHub
  2. They are not programming languages, but rather AI frameworks for deep learning.
  3. They were created well after Julia, Python or R.

I think a better comparison point is likely Rust, which was heavily driven by Mozilla for some time before much of the Rust team was laid off by Mozilla.

My point here may seem pedantic, but I think avoiding category errors is essential: Julia can either be a programming language or an AI framework. If Julia is just an AI framework, I think it is fair to say it is not in the running for most users, and it is not clear it will enter the running any time soon. But if it’s a programming language, it’s worth comparing with its competitors like Python or R, which are mostly not driven by corporate funding.

11 Likes

This is true in so many ways. There are numerous comments on the web from people who have at least given Vim/Neovim a try after watching videos of ThePrimeagen using it, or because of individuals like Tsoding who constantly build exciting things in C. Their captivating demonstrations make the tool look cool, leading people to at least try it out.

5 Likes