Julia vs R vs Python

performance

#81

My understanding is that you assumed the Julia community wants more users. That was discussed earlier, and it’s the wrong assumption. The Julia community is seeking more developers: people who are willing to write the missing tools and features. Conveniently, in Julia the barrier to going from user to developer is low.


#82

This is known as the two-language problem (in the Julia community, and also outside of it). Solving it was one of the main reasons for creating Julia.

Use the language you find most convenient.

You should broaden your horizons a little before making statements like that. Working with GBs of data (when available) is now routine in some subfields, and so is working with TBs, e.g. in particle physics.

Also, working with large data is not the only task that requires speed. Some tasks require a lot of calculation on a seemingly tiny amount of data (e.g. MCMC).
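To illustrate the MCMC point: in a random-walk Metropolis sampler the work scales with the number of iterations, not just with the size of the data. The sketch below is a hypothetical toy in Python (the thread itself contains no code). It assumes a Gaussian likelihood with known σ = 1 and a flat prior on the mean; `log_posterior` and `metropolis` are illustrative names, not any library's API. Five data points are enough to keep it busy for 100,000 iterations.

```python
import math
import random


def log_posterior(mu, data, sigma=1.0):
    # Assumed model: flat prior on mu, Gaussian likelihood with known sigma.
    # Returns the log-posterior up to an additive constant.
    return -sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2)


def metropolis(data, n_iter=100_000, step=0.5, seed=0):
    # Random-walk Metropolis: total cost is roughly n_iter * len(data),
    # so a tiny dataset can still mean a long run when n_iter is large.
    rng = random.Random(seed)
    mu = 0.0
    lp = log_posterior(mu, data)
    samples = []
    for _ in range(n_iter):
        proposal = mu + rng.gauss(0.0, step)
        lp_prop = log_posterior(proposal, data)
        # Accept with probability min(1, exp(lp_prop - lp)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            mu, lp = proposal, lp_prop
        samples.append(mu)
    return samples
```

Calling `metropolis([1.2, 0.8, 1.1, 0.9, 1.0])` should give samples whose mean, after discarding a burn-in, is close to the data mean of 1.0. Real models replace the one-line likelihood with something far more expensive, which is where per-iteration speed starts to dominate.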


#83

That’s an odd statement, considering you’re talking to a bunch of academics who are using terabytes of data. One common source these days is single-cell RNA-seq. It’s all the rage in systems biology right now to generate full counts of RNAs for every individual cell, across thousands of cells. I know a lab that regularly generates 60-terabyte datasets of this form, and this is what computational biologists are trying to handle. At the forefront of computational science you have to roll your own algorithms, and efficiency on a problem of this size is not just “faster”: it’s the difference between computable and not computable.

But I think the difference is really one of frame of reference. There are methods people and application people. Methods people mostly spend their time creating new methods and trying to find an application that requires a novel method. Application people tend to care more about the scientific issues and pushing forward with novel scientific discoveries, regardless of the methods used. The former group, pretty much by definition, is always rolling its own algorithms; the latter generally uses packages. Julia, for obvious reasons, has been much more attractive to the former group than to the latter. The overall effect is that Julia has become a place where lots of new data science and scientific computing methods are getting their first and only implementation/package. Julia’s goal right now is pretty much to be a strong enough development language that this continues to accelerate, and from there application people will use Julia because that’s where the packages are. But given the current state, pretty much everyone here is a methods person (though that is changing quite rapidly), so that’s the disconnect between “R has good packages for VW” and “Aren’t you writing your own package because nothing else can do XYZ?”.


#84

I do! :slight_smile: That’s the burden of climate datasets.


#85

Note that you quoted something under my name which I did not say (quite the opposite!).

I am at the moment working on a computation which uses only a few MB of data, but runs for a week (MCMC). I guess I am sold on speed :wink:


#86

Me too. Though, not for a week, and not at the moment (and not MCMC). But several wave propagation simulations with small inputs and outputs, that each run for a loooong time. Speeding that up from its current Matlab implementation would completely change my workflow.

(Also, I’m not in academia anymore…)


#87

This is an outstanding summary of my own experience and motivation for getting started with Julia, including the 3D PDE applications. The time sink of creating code that can be used for problems of real interest is a real issue.


#88

The lines are also very blurred! In economics, more and more reviewers want you to add the randomization-based method of the day to compensate for a shortcoming of OLS. Sometimes these are implemented in a package already, and sometimes they are not.

Often, especially with R, the package that implements such a method may be so opaque that it’s easier to just write your own implementation. My impression is that people underestimate the number of ad hoc implementations that the average scientist creates.


#89

Oops! I made the quote too quickly.

Edited the post accordingly. :slight_smile:


#90

This is an interesting point, and I agree with it. I like to think Julia has many selling points, and we should trumpet all of them. However, in practice it is very hard to get anybody to adopt a new language. Performance is one of the few things, perhaps the only one, that gets people’s attention. The other big thing of course is library support, but any new language will always have fewer libraries than existing languages, so that can’t be an initial reason to adopt a new language.

Anyway, try convincing somebody that language X has nicer syntax or is easier to use than language Y. They won’t believe you, and even if they do it’s not really compelling enough to go through the difficulty of switching. Or try the default pitch of most research languages, which is that they will catch more errors at compile time. Well, it’s quite evident that a large percentage of programmers simply don’t care about that. But if you can take something that runs overnight and make it run in a minute, you have a real painkiller. If somebody doesn’t have any code that takes a while to run, getting them to switch languages might be impossible.

Performance is actually special. It’s not just another feature. All languages are Turing-complete so you can write anything in any of them. Performance is one of the only meaningful ways you can hit a wall with a language and not be able to do something.


#91

True, but the expressiveness of the language is an important selling point too. This is critical to the productivity of a programmer. I have often written something in Matlab in contorted and painful ways just because the language wouldn’t support the natural and easy path. I haven’t run into that yet in Julia!


#92

And I would heartily encourage you to do so: someone once made the comment to me that a great way to learn a programming language is to go learn a different one. That way you pick up on the differences, and start to think about why those differences exist, and what effect they have on how you write code.

R in particular is perhaps more different than most, with things like non-standard evaluation and odd scoping rules. Computer scientists typically view these as negatives (and they do have drawbacks), but they have enabled features like the formula syntax and automatic labelling of plots with variable names, which are very handy for users.


#93

Academics in radio astronomy, particle physics and, increasingly, the life sciences (and many other fields too) would disagree with any suggestion that they are not dealing with terabytes of data; find out about the computing and data rates of SKA, LHC, LSST, etc. Some of their predecessor projects were pumping out terabytes decades ago.

In many fields, in academia or outside, computing power is often a limit to discovery, innovation and productivity. Exponential growth (albeit slowing) in computing power has helped, but harnessing that power often turns out to be hard because of the difficulty in translating ideas into executable code.

I don’t want to kick Python too hard because, for some people and to a certain extent, it seems to have helped to overcome the coding barrier. However, that frequently seems to come at a cost of gross inefficiency. I’ve seen people using it on medium-scale compute clusters to solve problems that they could have handled on a high-end desktop or single server if they’d used modern Fortran or, now perhaps, Julia.

This is usually a pragmatic decision: cost and availability of fast compute hardware vs programmer time, skill and availability; it’s a trade-off seen in well-resourced organisations, like financial services, but academics also sometimes come to the same conclusion, particularly as Python has become the preferred tool in many fields. It’s not a particularly sensible decision if one is concerned about the true (including environmental) cost of the resources used.

Regardless of performance and efficiency, I think that Julia can do a better job than Python of breaking the coding barrier (Julia’s syntax for numerical work alone, arrays in particular, seems to be more intuitive and memorable, IMHO). Speed and the elimination of the two-language problem only strengthen that.


#94

Apart from just the time aspect, I think one should also consider the waste of energy (analysed for instance in https://dl.acm.org/citation.cfm?id=3136031) and impact on climate change, which is sometimes neglected in these discussions. If faster (and productive) alternatives are available, using a slow language just out of unwillingness to change may become a minus in one’s CO2 balance :wink:

EDIT: fixed link


#95

URL is broken.


#96

Should hopefully work now


#97

Here is a direct link to a PDF version of the paper.


#98

Thanks for the reply.

I totally get it. But if speed is the main concern, check this benchmark too…

https://h2oai.github.io/db-benchmark/

And there could be many more such benchmarks. I don’t want to go down that side of the equation.

All I am saying is that different languages have different advantages.

Go can build a single, self-contained binary for any target platform that runs without the need for an nginx server.

JavaScript lets you run code on the client’s computer, which takes a whole lot of load off the servers.

C can do systems-level programming.

Assembly language is necessary to talk directly to hardware.

And R can create Shiny apps, flexdashboards and R Markdown reports in seconds, removing the dependency on software like Tableau or SAS. It’s the only glue language that can pull in almost everything from other languages like C, Java, Node.js, Python and Julia too… And although it is not used for building desktop software or large web applications, it is still among the top 20 most popular languages.

Please respect all languages. I know Julia is a great language and has the potential to be the best. But until we get there, we should stop these R vs Python vs Julia debates and just focus on what’s important.

I would be very happy if together we were able to make it, because I too am invested in the language now. But it’s a long road, and when my boss asks for a dashboard by the end of the day, I still have to go to R. Timelines are important too. It’s a dependency I still have.

Some people in the Julia community are still using Python and R, for obvious reasons.

Let’s move forward together until we can confidently say we have a full ecosystem to compete against all those packages and all those arguments, not just one or two major points.

I wish I could convince you that I promote Julia to all the newcomers I meet, telling them that if they haven’t picked a side yet, they should choose Julia. But deep down I know that for some ad hoc, on-the-spot work I still have to go to R.

Please let me know if you have anything similar to Shiny, R Markdown, flexdashboards, visualisation and so on. That is what I am looking for in a language; in search of it, I have learned Go, Julia, and JavaScript (D3).

Until I find a use case, it’s hard to adopt it completely.

So, happy analyzing. Stay positive and stay healthy.

:grinning::smiley::smiley:


#99

I had 35 years of C/C++ experience when I came across Julia. The reason that I became such a fan of Julia is not just the performance of the code (and I was known as a performance guru), but rather the programmer productivity, even for the sorts of “low-level” code that I normally work on. Being able to write code that is just as fast as my hand-crafted C/C++ (often with some assembly language speedups), in about 1/3 the time, that is also far more flexible, is simply amazing for me. It means I can spend more time on the algorithms and data structures, writing nice generic code, instead of writing optimized versions of code for different types and combinations of types.
It means I can get a project done, and have some time left to actually spend time with my kids!


#100

It’s worth remembering one important advantage for several of the languages you mention: namely, a head start of decades of development, both in the core language and in the ecosystem of additional packages, including things such as data presentation and performance-tuned table I/O. During those decades, their large development communities have included commercial organisations, some with fairly deep pockets.

It seems (rightly, in my view) that Julia’s creators have concentrated on refining the expressive power and performance of the core language for numerical computation across a wide range of scientific and technical fields, and that’s where it has a strong and arguably unique position (apparent in the packages developed already). That capability has largely been neglected by other new languages that have appeared over the last twenty years or so. There’s been no shortage of web toolkits, frameworks etc. that focus on presentation, but little that’s really new and game-changing for computational scientists and engineers.

Julia changes that, and also provides a foundation for wider development (remember, for instance, that it’s easy to call C, and hence access the operating system as well as libraries of C, Fortran and other code), but it’s still very young: just a few months out from its v1.0 release, rather than 15-20 years. It already has impressive capabilities and a solid basis for developing the kind of capabilities you seem to require. Rome wasn’t built in a day.