Designated Target Audience of Julia 1.0?


#1

We are getting close to 1.0. I am not trying to start a flame war. I want Julia to succeed. Alas, I am wondering what the target use and audience for julia 1.0 in 3 months will be. In what applications can Julia offer a reasonably good and viable alternative to R, python, C++, perl, Matlab, etc?

Graphics and Statistics: julia’s 1.0 target is probably not graphics and statistics. These seem to be promising, but are still in flux. It’s not about the deep and unusual graphics and statistics; it’s that many basic tasks are still too hard. Users do not even know which graphics packages are supposed to be used or working at any moment in time. R is still far beyond Julia 1.0, in simplicity, in basics, and in depth. So, I find it hard to recommend learning Julia 1.0 over learning R, much less switching. Maybe 3.0.

(Big) Data: I originally thought julia would be rolling out as a viable “big data analysis” programming environment. Alas, I am not sure that julia considers data programming to be important and central. Right now, it looks to me, as a typical user, that julia 1.0 is not going to be a viable solution. (more below.)

Mathematical Programming: I see one good use immediately: julia seems to be a viable mathematical programming language, maybe for optimizing hand-coded models that take days to run and are not particularly data intensive. Great array handling. Is mathematical programming the (only) targeted niche of Julia 1.0? PS: If so, the parallel programming changes targeted for 2.0 will be super-important.

Structured large user software projects: Stronger typing and generic programming are great advantages over R. Not as good as C++, though. Judgment: Viable in principle. Problem: For large projects, developers tend to be very conservative in the languages and tools that they pick. Julia 3.0, perhaps.

Others: Unstructured text processing (à la perl)? Special domains (e.g., biochem or learning libraries)?

what and who will 1.0 be for? where can Julia pick off (new) users in competition with other languages?



PS: Data Programming. Let me expand here why I don’t think julia 1.0 will be viable for data programming. I see two main problems, one conceptual, one concrete:

  1. conceptual: is data programming even considered to be important and integral to julia, or merely one of many needs? I am asking partly because I am not even sure whether (the pretty darn good) dataframes is just another community package or considered to be a vital central feature of julia. if dataframe were to become neglected, my program(mer)s will be dead.

  2. concrete: can julia handle well the most important standards for the exchange of large data sets among users and programs, which I believe are large .csv.gz and .gzip files? yes, I hate these formats, too, but almost all financial data sets in my corner seem to come in this format. For example, I experimented with a 2GB data file (600MB gzip-compressed). R can read it into a data frame with fread in about 10 seconds. the fastest I managed for julia was about 7 minutes, with readtable( GZip.gzopen("test.csv.gz") ), and I have not yet found a way to write compressed data files.

This could be fixed with a modest amount of love before the 1.0 release.


#2

You forgot scientific computing, which is where I think Julia has the strongest story. There are tons of nice things in numerical linear algebra and diffeqs to allow for PDE simulations and SDEs in ways that Matlab doesn’t come close to providing.

Mathematical programming is in a great spot with JuMP, Optim, and other related tools.

And then the development primitives are great (GPU libraries, BandedMatrices.jl and crew, etc.)

I think Julia has a great target audience with the traditional mathematicians and physicists who would have picked up Matlab for these. I agree we don’t have the tooling for the full HPC crew yet (but it’s coming), and the data science / ML stack is great for domain researchers but it doesn’t have the convenience for a non-programmer to use yet. I think a lot of methods people (i.e. the people who are interested in the methods for their respective fields) will take a big interest in Julia, and in time that is how it will grow.


#3

I’m about to properly announce the new release of VegaLite.jl, I just want to finish a bit more of the documentation. My hope is that this can fill the ggplot2 role in Julia land. I think it is actually quite close in most areas and actually ahead for a number of things (e.g. interactivity). I also think that we finally found a syntax that a) makes simple plots simple, and b) still exposes all the depth of Vega-Lite, which is a really mature and exhaustive grammar of graphics plotting library. I also think/hope we nailed all the boring infrastructure questions (works in all environments without weird caveats, file IO etc.).


#4

thanks, chris. ok, so I think I got it about right. math, physics, and methods people. (it’s what I meant by mathematical programming.)

if some of the julia folks are listening: with one guy’s attention to the low-level basic packages, a good part of the data stack could still make it in time for 1.0, too, IMHO. this would open julia up to other potential adopters, too. but it would change the system here, from letting it settle slowly itself by experimentation and evolution to a best state, to pushing it actively and quickly.


#5

I am not sure physics people will be that excited about Julia; Fortran and C have been used for such a long time. I guess quant finance people are more likely to enjoy Julia. I know a lot of funds that discourage employees from using open source software, but employees still use Python and R a lot.


#6

Also, I think Julia should target engineering students and professionals in China, India, and Russia. I didn’t know MATLAB is not free until I went to graduate school in China, and almost everyone in my school was using a hacked version of MATLAB. If I had known about GNU Octave back then, I would not have used MATLAB.

If the goal is to expand the user base, these emerging markets should not be ignored.
If the goal is to find potential customers for Julia Computing, then I am positive most of them will not pay for it.


#7

I guess a similar question could have been asked about R and, eventually, R ended up being used quite a lot. And I still hope that Julia will take over R (and I think it will, eventually).

I wouldn’t dismiss Julia for Data / Data Science / ML just yet. JuliaDB is quite exciting, Flux and Knet.jl are great, the autodiff packages are bound to be very useful for ML etc. What is maybe missing is a generic “broad-purpose” data-science package like Python’s SkLearn or R’s MLR that leverages Julia’s advantages (e.g. for parallel computing). Of course there’s the julia translation ScikitLearn.jl, which is great, but some of the underlying functionalities could be optimised (e.g. decision trees). A few such good projects (a bit like the DiffEq package) are usually enough to attract a large community, and it takes a few people to really pull such things (and JuliaCon to get them together I guess…)


#8

In my field (electronic structure), computational codes are written in Fortran, but are so unwieldy that people design python wrappers and frameworks on top of them. Meanwhile, mathematicians and computer scientists trying to understand and optimize these codes write prototypes in C++ or matlab. The diehards will continue using Fortran, but there is a chance of people coming together around Julia, especially if the HPC ecosystem gets better.


#9

FLAME ON! (Just kidding – I think this is a great topic for discussion and very much appreciate it).

While I agree the package ecosystem remains a little fragmented as lots of people try lots of things (and one thing I would love before 1.0 is better documentation of the ecosystem so new people can easily figure out what tools to use), I would actually contend that the real value of Julia for researchers is in the architecture of the language and the solution to the two language problem.

I’m a data programmer in your framework (I’m a political scientist who does applied statistics, geospatial work, and network analysis, often with big data), and as I’ve argued here, I think teaching julia is one of the best ways to provide students with future-proof skills, and it’s a great way to empower them to develop new tools without having to learn C++ (as is often the case in R).


#10

The issue is that neural networks are one part of machine learning while these things are the vast majority. I really like Flux and KNet, but our manifold dimensional reduction techniques and things like that are what make a full ML pipeline tough right now.


#11

I am not writing off julia 3.0, 4.0, or 5.0. my question was about 1.0. R (well, S and S+) needed 10 years to get traction, at a time when it had weak open-source competition. julia will have a tougher act to follow. incidentally, R and python won the lottery. julia cannot count on it. julia computing may not survive 10 years. julia needs an initial target audience to get a foothold.

I was careful not to ask for a compelling use case, just a solid competitive use case for 1.0. 3.0 better have a compelling case in some domains.

yifan may be right with suggesting china, india, and russia.

The data science folks that I know live and breathe data sets. this means first-order support for dataframes, with fast import/export; not just for the higher-up parts like flux.

T may be correct on another good use case: disk-based data sets. R cannot really deal well with them, and JuliaDB may compete well with SAS.


#12

I take it solving the two-language problem doesn’t qualify in your view?

Could not agree more with this. The health of DataFrames is critical.


#13

I already know of a couple of quant finance firms that looked at Julia and liked what they saw, but are going to hold off until SQL database support is simpler/more robust. So I guess my takeaway from that is that there is definitely interest in quant finance in getting C-speed without using C.


#14

I am not sure who “julia” is. If you are referring to the community: people develop projects they are interested in. These are necessarily very diverse. There is no centralized direction for package development.

Many people are already using Julia productively. I do statistics, with a lot of exploratory plots, and I am quite happy with PGFPlotsX.jl. I can fully understand that it may not be the best tool for everyone, but I am not sure that I should do anything about this — they can write their own tools, finance or wait for someone else to do it.

I work on Bayesian statistics, which may qualify as cruel deep and unusual for some people. I wrote some libraries I found useful, but currently they are undergoing a major rewrite because usage has shown some architectural problems that need to be addressed, and I want to take advantage of various v0.7 features. But as software goes, I estimate at least 2-3 major rewrites until it becomes very polished. This may take years, and I don’t really see a way to rush this, as problems surface with extended use.

I am not sure I care so much about Julia overtaking R, achieving world domination, and similar goals. Probably because I am selfish, and I want a programming language that I enjoy working with, and a community of like-minded people whose goals potentially overlap with mine from time to time, so we can cooperate on the language and packages.

I would recommend Julia to anyone even at this point without reservations, with the understanding that working with Julia in its current state will involve a lot of frustration, bug hunting, detours for solving minor and occasionally unrelated problems. But all programming works like this, and the Julia community is exceptionally supportive to people who want to learn and don’t mind putting in the work.

The key is to approach Julia with the right mindset: not as a product that comes in a shrink-wrapped box with a bookshelf of manuals full of “This page is intentionally left blank”, but as a tool which empowers programmers.

I am doing this all the time with CodecZlib.jl without any problems; but if you need help on this please start a separate topic.
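
For anyone following along, a minimal sketch of what writing and reading a gzip-compressed text file with CodecZlib.jl can look like is below. The file name and the three CSV rows are made up for illustration, and this only covers the compression layer: parsing the rows into a data frame is a separate step that is not shown here.

```julia
using CodecZlib  # third-party package that provides the gzip stream types

# Hypothetical file and data, used only for this illustration.
path = joinpath(mktempdir(), "test.csv.gz")
rows = ["id,price", "1,10.5", "2,11.25"]

# Write: open the file wrapped in a compressing stream, then print plain text to it.
open(GzipCompressorStream, path, "w") do io
    for row in rows
        println(io, row)
    end
end

# Read: open the file wrapped in a decompressing stream and collect its lines.
readback = open(GzipDecompressorStream, path) do io
    collect(eachline(io))
end
```

The same stream objects can be handed to anything that accepts an `IO`, which is why the compression step stays independent of whichever CSV reader is used.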


#15

Since the topic of the day is big defects remaining in Julia while approaching 1.0, let me put forward
two of them I perceive as a rather fresh user of Julia:

  • insufficient documentation. The help on functions available at the REPL is ok (in particular it contains examples), but only documents functions.
    Help on all other features is only available in the manual, in places which do not contain examples.
    For example, in the manual there is no example on how to make a package, which makes it rather hard
    for the beginner to start. So I would count the lack of examples in many places in the documentation as a big defect.

  • Basically no ‘standard library’. There are many packages doing nice things which are indispensable when
    programming in julia, but it is hard for the beginner to discover them and to know which to use. I think
    it is past time that the julia developers should ‘bless’ some packages, put them in the standard library and
    document them in the manual.
    For example, Revise.jl is pretty much part of everybody’s stack, and a package like Memoize.jl is doing a basic task which is a primitive in most languages where it exists.
    Julia as discovered by a beginner does not have these packages (one has to dig a bit further to discover them) so appears much poorer than it is.


#16

There is a Standard Library, see the relevant heading in the manual. Various other curated collections of libraries are maintained in GitHub organizations, e.g. JuliaStats. I agree that these could be more discoverable.


#17

I have been using julia since 0.3 in the bioinformatics field, which is partly data science, statistics and/or scientific computing. So, usage does not depend on a 1.0 release for me, but it is appreciated that it is coming.

There are two main reasons why I am using julia (and still the other languages like R, python, …):

  • performance
  • the language feels right (I am not a computer scientist, so I can’t express it more formally). As a physicist, mainly interested in theory and math, I did a lot of programming in many languages, including some uncommon ones like erlang or SAS, and julia just feels right in many ways.

Speaking about R, which seems to be the number one language in bioinformatics, its main strength is the large number of packages. But management of versions of R, packages and underlying OSs is a big mess, which regularly breaks working setups that are then very hard to get working again. The R packages are typically implemented in C because of performance issues. This makes it harder to get them working again if there is no maintainer anymore, which is the normal case, as packages are often results of a PhD contract.

That is why I try to avoid as many third party packages as possible.

Now, with the performance of julia, there is no need for C anymore, which solves the dependency for packages written in C (C++,…). Still, it is better to avoid packages if possible, but if you need it, it is much easier to get back functionality after upgrades.

The conclusion is: there are many situations where people in the bioinformatics field still use R and Python just because at the time needed, the infrastructure is working, and time is short (when doing your PhD). But for a longer time scale, I would never choose R. Given the “beauty” of julia, julia is my first choice, as long as there is nothing which argues against it (like a missing GUI framework).


#18

I don’t think Julia 1.0 can propose much to end users: end users want mature libraries, mature libraries are almost impossible to make in a constantly changing (pre-1.0) language. And this is expected.

I believe the target audience for early Julia 1.0 is developers. Since you mentioned big data, let’s take a look at the history of one of its large branches, the Hadoop infrastructure.

In 1999 Doug Cutting released the initial version of the search engine Lucene. The project was written in Java, which was just 4 years old at that time, and gave birth to several other projects such as Apache Nutch and Apache Hadoop. Writing system-level software in Java was a crazy idea at the time, but it turned out to be easier than doing so in C++, so these projects got many new contributors and grew up quickly.

Fast forward to 2018 and we have Spark, Storm, Flume, Flink, Kafka, HBase, ZooKeeper and many other related projects, all written in Java or a Java-compatible language. Why? Perhaps, because there isn’t much choice really: you either downgrade to C++, which is a tough option for most developers, or take an inherently slow language such as Python (there’s even a Python port of Spark - DPark, but I’ve never seen it in production).

So maybe Java / Scala / Clojure / Kotlin are good enough for this stuff? I don’t think so: the JVM is by design extremely memory-hungry and hides many low-level capabilities (e.g. see git vs. jgit discussion). Should I start a high-performance distributed system in 2018, I wouldn’t even consider Java. Hardly .NET, maybe Rust, but Julia, which is both high-level like Python and fast like C, is such a sweet spot here!

Or think about something like TensorFlow in Google: imagine that you are in need of an ML framework that runs on all types of your hardware (from mobile to a cluster of TPU-enabled servers) and you have resources to implement it, but want a good starting point - wouldn’t Julia be a reasonable choice?

For me personally a strong selling point of Julia is how easy it is to read (and update) an implementation of whatever function / library I’m working with. Want to know how NLL loss is implemented in Knet.jl? Oh, here it is! Want to know the same about PyTorch? Ok, here’s the corresponding class, which simply calls the functional nll_loss, which, unsurprisingly, invokes the C backend implementation at… ah, I have no idea. Anyways, at this point things become too project-specific to bother.

To summarize, I’d strongly recommend Julia for any project with a large amount of new code and few dependencies on pre-existing libraries. Libraries will naturally appear as a byproduct of this process, and that’s where more end users will come.


#19

@ChrisRackauckas I completely agree with what you say regarding scientific computing.
Some random points from me:

I can see Julia taking off on the robotics side. I was working at ASML and I was told that engineers there regularly coded complex movement models in Python then recoded in C for the speed…

I dearly would like to see Julia applied in High Energy Physics - my original field. I gather these days it is all C++. Why are we asking graduate students to learn C++, with both a high entrance bar and the possibility of mistakes with null pointers etc. etc.?
Julia seems such an excellent fit here - you could have predefined data types let’s say quark = up down charmed strange top bottom. Then manipulate these in a language which carries on to the plotting stage.
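
To make the "predefined data types" idea a little more concrete, here is a rough sketch using Julia's built-in `@enum`. The type name `Quark`, the `charge` helper, and the proton/neutron tuples are my own illustration (using the standard electric charges of +2/3 and -1/3), not code from any existing HEP package.

```julia
# A closed set of quark flavours as an enum type (names chosen for this sketch).
@enum Quark up down charm strange top bottom

# Electric charge in units of e: up-type quarks carry +2/3, down-type carry -1/3.
charge(q::Quark) = q in (up, charm, top) ? 2//3 : -1//3

# Composite particles as tuples of quarks; the total charge is a plain sum.
proton  = (up, up, down)
neutron = (up, down, down)

proton_charge  = sum(charge, proton)
neutron_charge = sum(charge, neutron)
```

Because the charges are exact rationals (`2//3`), the sums come out exactly without floating-point rounding.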

Also I used to deal with a commercial CFD code which was coded in a mixture of C++ and Java. So you had engineers writing Java code to get their models run. Engineers are bright people and learn fast. But why not have a ‘pure Julia’ CFD code?

Sorry to have a downer on C++ but my own prejudice is that a lot of ‘shops’ have an investment in C++ and this is handed down to new graduate students or new joiners.
I see this as where Julia will make its breakthroughs - hopefully reducing the height of the bar to entry and also making for much safer code.


#20

Regarding the HPC crew, I was discussing Julia on the Beowulf list this morning, with reference to running it on the Xeon Phi. Thread support for Julia is still in a state of development.
But of course I realised that the Celeste project uses a boatload of Xeon Phis. So there is excellent science being done with Julia.