Designated Target Audience of Julia 1.0?

There is a Standard Library; see the relevant heading in the manual. Various other curated collections of libraries are maintained in GitHub organizations, e.g. JuliaStats. I agree that these could be more discoverable.

I have been using Julia since 0.3 in the bioinformatics field, which is partly data science, statistics, and/or scientific computing. So my usage does not depend on a 1.0 release, but I appreciate that it is coming.

There are two main reasons why I use Julia (while still using other languages like R, Python, …):

  • performance
  • the language feels right (I am not a computer scientist, so I can’t express it more formally). As a physicist, mainly interested in theory and math, I have programmed in many languages, including some uncommon ones such as Erlang or SAS, and Julia just feels right in many ways.

Speaking of R, which seems to be the number one language in bioinformatics, its main strength is the large number of packages. But managing versions of R, its packages, and the underlying OSs is a big mess that regularly breaks working setups, and getting them working again is a serious effort. R packages are typically implemented in C for performance reasons. This makes them harder to repair when there is no maintainer anymore, which is the normal case, as packages are often the result of a PhD contract.

That is why I try to avoid as many third-party packages as possible.

Now, with the performance of Julia, there is no need for C anymore, which removes the dependency on packages written in C (C++, …). It is still better to avoid packages if possible, but if you need one, it is much easier to restore its functionality after upgrades.

The conclusion is: there are many situations where people in the bioinformatics field still use R and Python just because, at the time they are needed, the infrastructure is working and time is short (when doing your PhD). But on a longer time scale, I would never choose R. Given the “beauty” of Julia, it is my first choice, as long as there is nothing that argues against it (like a missing GUI framework).

9 Likes

I don’t think Julia 1.0 can offer much to end users: end users want mature libraries, and mature libraries are almost impossible to build in a constantly changing (pre-1.0) language. And this is expected.

I believe the target audience for early Julia 1.0 is developers. Since you mentioned big data, let’s take a look at the history of one of its large branches: the Hadoop infrastructure.

In 1999 Doug Cutting released the initial version of the search engine Lucene. The project was written in Java, which was just 4 years old at that time, and gave birth to several other projects such as Apache Nutch and Apache Hadoop. Writing system-level software in Java was a crazy idea at the time, but it turned out to be easier than doing so in C++, so these projects got many new contributors and grew quickly.

Fast forward to 2018 and we have Spark, Storm, Flume, Flink, Kafka, HBase, ZooKeeper and many other related projects, all written in Java or a Java-compatible language. Why? Perhaps because there isn’t much choice really: you either downgrade to C++, which is a tough option for most developers, or take an inherently slow language such as Python (there’s even a Python port of Spark, DPark, but I’ve never seen it in production).

So maybe Java / Scala / Clojure / Kotlin are good enough for this stuff? I don’t think so: the JVM is by design extremely memory-hungry and hides many low-level capabilities (e.g. see the git vs. jgit discussion). If I were starting a high-performance distributed system in 2018, I wouldn’t even consider Java. Hardly .NET, maybe Rust, but Julia, which is both high-level like Python and fast like C, hits such a sweet spot here!

Or think about something like TensorFlow at Google: imagine that you need an ML framework that runs on all types of your hardware (from mobile to a cluster of TPU-enabled servers) and you have the resources to implement it, but want a good starting point. Wouldn’t Julia be a reasonable choice?

For me personally, a strong selling point of Julia is how easy it is to read (and update) the implementation of whatever function / library I’m working with. Want to know how NLL loss is implemented in Knet.jl? Oh, here it is! Want to know the same about PyTorch? OK, here’s the corresponding class, which simply calls the functional nll_loss, which, unsurprisingly, invokes a C backend implementation at… ah, I have no idea. Anyway, at this point things become too project-specific to bother.

To summarize, I’d strongly recommend Julia for any project with a large amount of new code and few dependencies on pre-existing libraries. Libraries will naturally appear as a byproduct of this process, and that’s when more end users will come.

19 Likes

@ChrisRackauckas I completely agree with what you say regarding scientific computing.
Some random points from me:

I can see Julia taking off on the robotics side. I was working at ASML and I was told that engineers there regularly coded complex movement models in Python then recoded in C for the speed…

I would dearly like to see Julia applied in High Energy Physics, my original field. I gather these days it is all C++. Why are we asking graduate students to learn C++, with both a high entrance bar and the possibility of mistakes with null pointers etc.?
Julia seems such an excellent fit here: you could have predefined data types, let’s say quark = up down charmed strange top bottom, and then manipulate these in a language which carries on to the plotting stage.

Also, I used to deal with a commercial CFD code which was written in a mixture of C++ and Java. So you had engineers writing Java code to get their models to run. Engineers are bright people and learn fast. But why not have a ‘pure Julia’ CFD code?

Sorry to have a downer on C++, but my own prejudice is that a lot of ‘shops’ have an investment in C++ and this is handed down to new graduate students or new joiners.
I see this as where Julia will make its breakthroughs, hopefully lowering the bar to entry and also making for much safer code.

1 Like

Regarding the HPC crew, I was discussing Julia on the Beowulf list this morning, with reference to running it on the Xeon Phi. Thread support for Julia is still in a state of development.
But of course I realised that the Celeste project uses a boatload of Xeon Phis. So there is excellent science being done with Julia.

I’m not very well-versed in R, but I find it quite horrible for statistics. For example, in Julia, if you want to compute the pdf of a Normal distribution with parameters (μ,σ) at value x, you do:

pdf(Normal(μ,σ),x)

In R you do:

dnorm(x,μ,σ)

If you want a Normal truncated between zero and one, you do:

pdf(Truncated(Normal(μ,σ),0,1),x)

In R you do:

google for a package
...
dtrunc(x, spec="norm", a=0, b=1, mean=μ, sd=σ)

If you want a mixture of two Gaussians:

MixtureModel([Normal(μ1,σ1), Normal(μ2,σ2)],[1/2,1/2])

In R you do:

google for a package
...

If you want a BetaBinomial:

pdf(BetaBinomial(n,α,β),x)

In R you do:

google for a package
...

In Julia you have nice atomic concepts that are composable, while in R you just have a bunch of functions with unreadable names and packages with no common semantics.
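To make the composability point concrete, here is a minimal sketch in plain Base Julia of how a wrapper type can work for any distribution that implements a small interface. This is a toy under stated assumptions, not how Distributions.jl is implemented: the names `ToyDist`, `Expo`, `Trunc`, `density` and `cumulative` are all made up for illustration, and an exponential distribution is used only because its pdf and cdf have closed forms.

```julia
# Toy sketch of composable distribution types; NOT the Distributions.jl implementation.
# All names here (ToyDist, Expo, Trunc, density, cumulative) are hypothetical.

abstract type ToyDist end

# Exponential distribution with scale θ: density and CDF have closed forms.
struct Expo <: ToyDist
    θ::Float64
end
density(d::Expo, x::Real) = x < 0 ? 0.0 : exp(-x / d.θ) / d.θ
cumulative(d::Expo, x::Real) = x < 0 ? 0.0 : 1 - exp(-x / d.θ)

# Truncation wraps ANY ToyDist that defines density and cumulative.
struct Trunc{D<:ToyDist} <: ToyDist
    base::D
    lo::Float64
    hi::Float64
end
function density(d::Trunc, x::Real)
    (x < d.lo || x > d.hi) && return 0.0
    mass = cumulative(d.base, d.hi) - cumulative(d.base, d.lo)
    return density(d.base, x) / mass
end
function cumulative(d::Trunc, x::Real)
    x <= d.lo && return 0.0
    x >= d.hi && return 1.0
    mass = cumulative(d.base, d.hi) - cumulative(d.base, d.lo)
    return (cumulative(d.base, x) - cumulative(d.base, d.lo)) / mass
end

d = Trunc(Expo(1.0), 0.0, 1.0)
println(density(d, 0.5))           # renormalized density inside [0, 1]
println(density(d, 2.0))           # zero outside the truncation interval

# Because Trunc is itself a ToyDist, truncations nest without extra code.
println(cumulative(Trunc(d, 0.0, 0.5), 0.5))
```

The design choice worth noticing is that `Trunc` never mentions `Expo`: any new distribution type that defines the two interface functions can be truncated, and truncated again, without touching the wrapper.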

I would be curious to see how this translates in R:

[f(D) for f in [mean,std,entropy], D in [Normal(0,1), BetaBinomial(10,0.1,0.1), Truncated(Normal(0,1),0,1)]]

Ironically the biggest issue with Distributions.jl is that it uses Rmath, but hopefully that will get fixed in time.

19 Likes

I care somewhat (not about overtaking R, but about the “similar goals” :sunglasses:) , because I want there to be a vibrant enough job market, so that as a consultant, I can pick and choose interesting work, for as long as I care to continue working.

3 Likes

On a more basic level, it’s hard for me to write functions in R because it doesn’t have type annotations on function arguments. When I write code, I want to write f(x::Vector, y::Real) or f(x::Vector, y::T) where {T<:Real}, etc. In the same vein, not being able to use generators to make arrays in R is a source of frustration.
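A minimal sketch of both points, using only Base Julia; the function names `scale`, `scale_typed` and `squares` are made up for illustration. Note that the annotation syntax is `y::Real`, or `y::T ... where {T<:Real}` when the type parameter itself is needed, and that these annotations drive method dispatch at run time rather than a static type check.

```julia
# Hypothetical example: these function names are illustrative, not from any package.

# Restrict arguments by type: any Vector for x, any real number for y.
scale(x::Vector, y::Real) = x .* y

# Form with an explicit type parameter: element type and scalar type must match.
scale_typed(x::Vector{T}, y::T) where {T<:Real} = x .* y

# A generator is lazy; `collect` (or a comprehension) materializes it into an array.
squares(n::Integer) = collect(i^2 for i in 1:n)

println(scale([1, 2, 3], 2.5))       # elementwise product
println(squares(4))                  # squares of 1 through 4

# Calling with an unsupported type raises a MethodError instead of silently coercing.
try
    scale([1, 2, 3], "two")
catch err
    println(typeof(err))
end
```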

As the average master’s student starts writing functions as opposed to just scripts, the faults of R will become more visible. But imo Julia is already great for scripting, and I would recommend it for an intro to data cleaning and regression.

I will be working on the JuliaEconometrics organization, which should bring Julia close to the state of the art for econometrics and let it play well with machine learning too.

4 Likes

R requires so much boilerplate for setup and structure. In theory one has S3/S4 classes, but they are considered “advanced” concepts and thus not used widely. Consider, for example, the variance function var:

var <- function (x, y = NULL, na.rm = FALSE, use) 
{
    if (missing(use)) 
        use <- if (na.rm) 
            "na.or.complete"
        else "everything"
    na.method <- pmatch(use, c("all.obs", "complete.obs", "pairwise.complete.obs", 
        "everything", "na.or.complete"))
    if (is.na(na.method)) 
        stop("invalid 'use' argument")
    if (is.data.frame(x)) 
        x <- as.matrix(x)
    else stopifnot(is.atomic(x))
    if (is.data.frame(y)) 
        y <- as.matrix(y)
    else stopifnot(is.atomic(y))
    .Call(C_cov, x, y, na.method, FALSE)
}

The very last line does the calculation, coded in C (I think). This is completely opaque from just looking at the R source. True connoisseurs are invited to examine lm (for linear regression).

To be fair, modern R can be much saner (but still, of course, not like Julia). Also, the language design goes back decades, and has a lot of legacy elements which make it more or less impossible to change or optimize without breaking a lot (and I really mean a lot) of code.

In contrast, modern languages like Julia and Rust went in the direction of making abstraction zero (or low) cost, and thus encouraging pervasively modular design. This really pays off in the long run. Usually people emphasize the speed of Julia because that is easier to quantify objectively, but the most important advantage is allowing nicely organized code without trade-offs.

5 Likes

I don’t want to diss R too much though, because the more success R has, and the more success the “intro to data science” movement by RStudio et al. has, the larger the userbase Julia can draw from. They are doing some incredible things and we get to piggy-back off of that.

1 Like

I’m guessing an academic paywall is no drama here:

https://doi.org/10.1016/j.jss.2017.06.095

This gives some perspective on the sheer scale of the R data science ecosystem. It’s not something Julia can directly compete with; it’s a social phenomenon an order of magnitude larger.

Julia can compete on raw technical merits for solving processor intensive problems in abstracted, modular ways. That was what the promise of Julia always was to me and it does live up to it.

I’m glad people find Julia useful for stats as well, but for me personally stats always means stats+GIS, where Julia is years away from competing with R, and I’m not sure why it needs to.

I am a physicist and I am totally excited about Julia :wink:

10 Likes

Hasn’t @ChrisRackauckas made a package for R users (and another for Python users) to be able to easily use his DiffEq magic?
There are also the RCall and PyCall packages to go the other way.

Julia doesn’t need to supplant R or Python to “win”, just replace them as the first “go-to” tool in a programmer or scientist’s toolbox, letting them still use whatever things are good from those ecosystems, just as people don’t have a problem using libraries written in Fortran, but they wouldn’t think of writing Fortran code themselves.

2 Likes

I think julia has great long-term potential for 3.0 to 5.0. (I am here for the static strong typing, programmability, and dual-language problem, too.)

alas, 1.0 needs good uses not just for us few dozen on discourse, but for a wider audience. yes, julia is open source, but if julia computing [jc] were to go away now, I think julia would die. it needs to get some traction, and sooner rather than later.

for the short term, if I were an advisor to jc, I would advise focusing on a few specific use targets and libraries, whatever they may be. this is not to overtake python or R (hopeless), but to be a viable alternative in some places.

here at UCLA, julia 1.0 is not viable for our finance MFE students. yes, it has a standard library—without data sets, graphics, and good data import/export. ergo, I cannot push julia onto them as a primary instruction language. they already learn two languages. pushing a third language into a 1-year program won’t happen. it doesn’t solve the 2-language problem, it would create a 3-language problem. trust me—I wish I could.

firms and universities are “sniffing” alternatives all the time. betting their future on an alternative is a much harder hurdle to overcome.

so, jc, please pick a few more good target uses for 1.0 and curate some love into it, whatever your targets may be.

/iaw

1 Like

Well, you are too pessimistic. While I agree with you that I wouldn’t suggest Julia as a first language for students yet, Julia is already great for my purposes (dynamic simulation and control of complex dynamic systems = wind drones). The package manager is great, much better than those for Python that I used before. The performance is at least one order of magnitude better than Python+Numba. The code is easy to read and to maintain. So Julia will have a great future!

Uwe

7 Likes

Great point. It may be efficient to get Julia used in class first, and when the students get employed they will use Julia in industry. That is how R succeeded. Another thing is that R is easy to set up in class, for example, package installation and plotting are fast.

I emailed Julia Computing last year about setting up an API to Bloomberg and Wharton Research Data Services, but nothing happened.

I use Julia for everything except when I need rasters+stats for ecology: rgdal, dismo, and other stats/GIS tools that just don’t exist in Julia in the same easy-to-use form, and seem to be years away (maybe I didn’t say that clearly).

The idea of wrapping Julia packages in R/Python is great, and I’m thinking of eventually doing that for some spatial modelling packages I’m working on.

1 Like

I just don’t understand this whole thread.

Julia already has best-in-class libraries for numerical work. It has a viable niche for a significant chunk of scientific work, and can build from there. I’m not sure what else you want.

8 Likes

Yes, but this is very different from the tools that R’s core userbase uses.