What can we do to make Julia grow fast?

Seriously, julia-lang labelled questions on StackOverflow never go unanswered for more than an hour or so, unless they’re poorly specified or very advanced. This is already working.

I have a question for this thread: why the hurry? Julia is growing wonderfully, but things take time, and I don’t see why we should be so impatient. Just my 5 cents.

11 Likes

Awful, please don’t.

Thanks @ExpandingMan for bringing in some interesting pointers. My suggestion would be to get the Julia wrapper mentioned somewhere in the TensorFlow package or its documentation. That way people evaluating TensorFlow would also learn that the Julia interface exists. It’s more about evangelizing Julia interfaces by coordinating with other package teams and keeping the interfaces as current as possible as those packages evolve.

3 Likes

That’s a great idea. Julia is one of Kaggle’s 3 supported languages and I think that has likely exposed a number of people to Julia who otherwise wouldn’t have seen it. Having some recognition in TensorFlow’s documentation would be another great opportunity for exposure.

1 Like

But it’s not easy to find equivalent questions, because Julia and Python are different languages and there is no one-to-one correspondence.

I fully agree with @ChrisRackauckas here. In short, improving the package ecosystem and creating tutorials for package users are needed for wide adoption. Most users will come when they see that Julia helps them get their work done quickly and easily. And thanks to the language features, it is more likely than for many other languages that package users will soon start contributing as well.
Creating user-focused tutorials and materials is something we can all help with.

1 Like

While I agree with this, I think that wide adoption is neither necessary nor sufficient for fast growth (for the language or the library ecosystem). The latter needs contributing users, where contributions are defined broadly (helping out here or on SO, writing a blog post, opening an issue, making a PR, writing and maintaining a package, etc). Adoption is very relevant for a business model which relies on a large number of (paying) users; for open source, contributions matter more. A healthy open source community has a large contributor/user ratio, and “wide adoption” per se may not be that relevant.

IMO what the Julia community needs is gently nudging and nurturing users to become contributors. Found a bug? Open an issue. Want it fixed quick? Make a PR. Have been using Julia for a while? Answer questions on forums. Find the language useful? Brag about its benefits on your blog.

9 Likes

As said above

And thanks to the language features, it is more likely than for many other languages that package users will soon start contributing as well.

I think that wide adoption and contributions are always related, but especially so for Julia because there are fewer barriers to contributing.

2 Likes

I must say that I am quite impressed by the package authors and Julia maintainers in this regard: I am just making my first steps towards useful PRs and I am amazed how well this is received. Even PRs to fix some minor typos got friendly reviews and suggestions from the maintainers. Those reviews took far more time to write than fixing the issues themselves would have.

I love this experience and I can see that this gets new users quickly up to speed on the whole contribution workflow. I still think there could be some better/more visible docs for this workflow. …I should prepare a PR! :slight_smile:

12 Likes

That’s a fairly interesting topic.

To give you a little bit of perspective, here’s my starting point.

I’m a research engineer in French academia. For roughly 10 years I’ve been teaching quantitative methods applied to the social sciences.
The bulk of the work used to be done with SPSS. Back in the early 2000s I was trained on SPSS.
Then came Stata in 2009/2010, and all course materials were ported to Stata.
Then, since roughly 2013, it became increasingly easy to (1) use SPSS/Stata datasets in R and (2) produce reproducible research with R.
Yet R is not frequently used at master’s level, because most students are not skilled enough in CS to learn it in 24- or 48-hour courses.
Only PhD students and postdocs got to learn R.

For a move to Julia to happen in my field, and I guess in a lot of other fields where stats are needed but people are not interested in CS or technical issues, the language needs to be very simple and lean to learn for beginners without any CS background, and it should give efficient shortcuts to implement mainstream analyses.

So here are my specifics

  1. Data Management. Every research project involves a LOT of data management: wrangling labels, handling various codings of the same underlying data, merging tables. The simpler and faster it is, the better. In this field alone, I would suggest that Stata still dominates R (even including the tidyverse) and Julia lags well behind, partly because of all the open issues around data frames, missing values, DataFramesMeta, Query.jl and DataStreams. All that is way too complex to bring to our students given the time slot we get to teach. And I think a lot of teachers would feel the same.

  2. Reproducibility. It’s now standard practice to produce clean reports and reproducible analyses for publications and reviewers. It used to be SAS’s strong point; now R leads the way, and Stata and SPSS lag behind, even if StataCorp is trying to catch up.

  3. Flexibility. Graduate students need to learn quite a lot of different things: descriptive stats, modelling, plotting, but also geometric data analysis, network analysis, text mining, mapping.
    On that point, R is taking the lead because you can learn one framework to explore all those fields, whereas you used to have to learn (1) a GIS, (2) a statistical package and (3) specific software for specific fields, such as Gephi and Pajek in network analysis.

  4. Descriptive stats. Descriptive stats, including label management, are an absolute must in every social science project. Yet, as they are not very “interesting”, they are not given a lot of love in most statistical frameworks. Only recently did R get a good boost thanks to the SJ series of packages.

To sum it up, Julia holds huge advantages

  • the licence (good for academia & teaching AND good for businesses)
  • it’s fast
  • it’s not weird and awkward in the sense that R is
  • the sky is the limit

But those advantages are held back by these main problems

  • it’s not yet mature (ok that’s nearly there so this won’t remain a problem)
  • data management is cryptic given the dataframes thing
  • labelled classes ?
  • descriptive stats framework (function v1 v2, options)
  • fast shortcuts to common procedures (the work of Daniel Lüdecke, Dr. phil., for example, is the gold standard in productivity)

If those points are tackled, I think the underlying qualities of the language will shine, so fingers crossed.
(I should say, I’m not complaining at all. I use and teach 4 different stats frameworks, and huge progress has been made over the last 10 years, so in any case the future looks bright!)

3 Likes

Can you elaborate a bit? I think DataFrames should support variable labels (see this old issue). CategoricalArrays could also allow adding long descriptions to each level; that wouldn’t be too hard. Anything else missing?

1 Like

That sjt.lmer functionality looks lovely - should be very easy to create for the julia stats packages as well.

I don’t know about labelled classes in Julia; I’ve never seen them used in any example, but they were implemented rather late in R. As a consequence most packages do not take advantage of them, which is quite annoying. https://cran.r-project.org/web/packages/labelled/labelled.pdf

I would tend to believe that they should be the default implementation, with “no label” being just one case of this general framework.
In Stata or SPSS, factors do not exist as such. There are only integers with attached labels.
Every data management and analysis operation works on those integers.

Labels provide huge productivity benefits for outputs and plots, having them by default seems wise to me (but that may be totally irrelevant for other fields of scientific computing).
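To make the “integers with attached labels” idea concrete, here is a minimal sketch in plain Julia, using only Base. The names `LabelledVector`, `labels_of` and `freq` are hypothetical and made up for illustration; they are not the API of DataFrames, CategoricalArrays or any other existing package. The assumption is simply that analyses run on the integer codes and the labels are looked up only when producing output.

```julia
# Hypothetical sketch: integer codes with attached value labels (Stata/SPSS style).
# None of these names come from an existing package.
struct LabelledVector
    codes::Vector{Int}          # the values every analysis operates on
    labels::Dict{Int,String}    # code => human-readable label
end

# Label for each element; codes without a label fall back to the code itself.
labels_of(v::LabelledVector) = [get(v.labels, c, string(c)) for c in v.codes]

# Frequency table computed on the integer codes, reported by label.
function freq(v::LabelledVector)
    counts = Dict{String,Int}()
    for c in v.codes
        key = get(v.labels, c, string(c))
        counts[key] = get(counts, key, 0) + 1
    end
    return counts
end

sex = LabelledVector([1, 2, 2, 1, 2, 9], Dict(1 => "Male", 2 => "Female"))
println(labels_of(sex))
println(freq(sex))
```

With this layout, the unlabelled case is just an empty `labels` dictionary, which is roughly what “labels by default” would mean.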

Best regards and thanks for all the great work already done!

3 Likes

Frankly, if one is just applying a small number of canned algorithms via a GUI, it is difficult to make a compelling case for Julia (or even R). Developing a tool which requires little understanding, but is supposed to just “do what I mean” conveniently, with a GUI, and without significant user intervention is thankless work, which is why commercial companies have dominated this market.

1 Like

Agreed. No point going after the Stata and R users for a few years. Attacking the confluence of MATLAB users and people like me who want a more productive modern C++ (without all the OO garbage) is hard enough. That isn’t to say that data frames aren’t essential, just that if the user is just doing linear statistics, then Stata is tough to beat (however maddening the language may be).

Stata costs money, doesn’t allow working with multiple datasets at the same time, and needs to load all data in RAM. So no, I don’t think it’s hard to beat for any slightly advanced use case.

3 Likes

Maybe so, but those users will, eventually, become knowledgeable in their field.
They will want to go beyond what’s offered in the package, they will want new functions, they will want to be able to change the specification of their models.
At some point, they will switch to R / Python to meet those needs.
At some point, they will try to develop some package of their own.

And that’s where Julia shines. Those users are stuck at the “two languages wall” because they never learned, and most of the time do not want to learn, Fortran or C++, even if they know quite a lot of stats and want to work conveniently with larger and larger datasets. This is the sweet spot for Julia.
And arguably, it’s better for future scientists to learn Julia than Python in graduate school, as the amount of data to process is ever increasing and not going back.

I went through the process myself.
I make heavy use of a solid R package for ecological data analysis called ade4.
I tried to look at the code: there’s some R in it, and the rest of it is Fortran.
Obviously this was the breaking point for me.

But as I said previously, I perfectly understand that there are other needs that may be more important, at least in the short run. Still, I think reaching out to future data analysts and users through undergrad classrooms is essential for a successful language / software, even if it’s not “state of the art” data science.

2 Likes

No offense meant, but the Julia ecosystem has not even converged regarding a data frame representation. Stata has a lot of algorithms coded, tested, and integrated. I am not saying that it is impossible to duplicate, but it is a lot of work. Not something you can slice through with a few clever abstractions, but more like plotting.

1 Like

Amen. Give the data developers some breathing room to get an efficient and type-stable competitor to pandas proven, and get the development environment to be responsive and stable. Then Julia is in a position to create no-overhead embedded DSLs… (which only C++ and Haskell can otherwise claim). I have no doubt that the EDSL version of Stata would strictly dominate it in performance and simplicity. Eventually.

Until then, huge efforts to create packages that fight Stata and R at their own game are time wasted. As for the “Stata/MATLAB is expensive” argument: in my fields, that is a losing strategy. Anyone doing this for a living can justify paying a couple hundred to a thousand dollars for essential software, and many universities give MATLAB and/or Stata to faculty and grad students for free. There are a million reasons I want to move away from MATLAB and Stata, but price is not one of them.

4 Likes

I mostly agree with you.

The only problem is that the bar is set really high.
Namely, most Julia functions are written in the style of highly knowledgeable programmers, with tricks from object-oriented styling, metaprogramming, etc.

In my opinion, this is starting to look a lot like Python, which tends to be too clever (it aimed for elegance but lost simplicity in many cases; that’s OK, Python was meant for programmers first).

I’d expect things to be simpler from a language targeting Scientific Computing.
I’d prefer more use of simple In → Out models and less metaprogramming, macros and other stuff that scares me.

I know I’m probably not the most qualified programmer (in this forum I’d say I’m near the bottom), but I usually do quite well working on Signal & Image Processing, Optimization and Machine Learning.
But looking at advanced Julia code (which is the default, at least to my eyes) makes things hard for me.

I think an extremely important point for making Julia successful is to narrow the gap between the skill level of a typical user and the skill level needed to influence the language’s development, contribute to it, and use it the way its developers and core users do.

For instance, whenever I pick up the MATLAB code of a Toolbox, or some files on File Exchange (the better ones; there is garbage there too), I can easily understand the code. It is simple and straightforward, and I don’t think it is overly smart.

How do they say it on the London Tube? Mind the Gap.

3 Likes