A data scientist's thoughts on R & Python

This blog post by Gordon Shotwell describes his comparative thoughts on R and Python. Maybe this article helps to understand how the “ordinary” data scientist thinks and what he needs in his daily work? And maybe it helps in designing Julia functions and packages, especially for the statistical environment? :thinking:

In particular, the third part, “The glory of CRAN”, supports the arguments about package quality that I have already made here.

5 Likes

R is a functional programming language, which means that the natural way to accomplish something in the language is to use functions.

:thinking:

6 Likes

When I see people describe R as functional or OOP it usually seems like they’re just trying to win an argument. There are 4 iterations of class systems in core R and others provided by packages. I don’t think you can really say it’s just functional or OOP.

2 Likes

Very interesting reading. Is there a way to implement this R code in Julia?

fancyError <- function(df) {
  class <- class(df)
  var_name <- as.character(substitute(df))
  if (!inherits(df, "data.frame")) {
    warning(glue::glue("'{var_name}' is of class '{class}' when it needs to be a dataframe"))
  }
}
fancyError(my_var)

fancy_error(::T) where {T<:AbstractDataFrame} = nothing
fancy_error(t::T) where {T} = error("$t is of type $T when it needs to be a dataframe")

1 Like

Thank you. It’s simpler to implement this function in Julia than in R. It seems that all the examples used to demonstrate the benefits of using R over Python in Gordon Shotwell’s blog can be implemented more simply in Julia, IMHO.

1 Like

This doesn’t quite work. The error message interpolates the value of t, not the name of the variable that was passed in. I don’t see how you’d be able to do this in Julia without making fancy_error a macro.
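For what it's worth, here is a minimal sketch of what such a macro could look like. It is only an illustration, not anything from the blog post or from DataFrames.jl: `AbstractTable`, `ToyTable`, `check_table`, and `@fancy_error` are hypothetical names, and `AbstractTable` merely stands in for `AbstractDataFrame` so the snippet runs without any packages.

```julia
# Hypothetical stand-in for DataFrames.AbstractDataFrame (assumption, keeps the sketch package-free).
abstract type AbstractTable end
struct ToyTable <: AbstractTable end

# Ordinary functions do the type check via dispatch and receive the name as a string.
check_table(x::AbstractTable, name::AbstractString) = nothing
function check_table(x, name::AbstractString)
    error("'$name' is of type $(typeof(x)) when it needs to be a table")
end

# The macro sees the unevaluated expression, so it can record its text
# before passing the (escaped) expression on to be evaluated in the caller's scope.
macro fancy_error(ex)
    name = string(ex)
    return :(check_table($(esc(ex)), $name))
end

my_var = [1, 2, 3]

# Capture the error message instead of letting the script stop.
msg = try
    @fancy_error my_var
    ""
catch err
    err.msg
end
println(msg)  # the message starts with 'my_var', i.e. the name rather than the value
```

The macro only does the name capture and delegates everything else to a plain function, which keeps the dispatch-based check from the earlier reply intact.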

1 Like

If by “name” you mean the global variable name, you’re right. But even the provided example only works in a top-level function in R. Eventually you’d need to use stack tracing, or step into the function with a debugger, to track down the exact mapping back to your global-level variable.
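The same limitation shows up with a macro-based version in Julia. The sketch below is only an illustration under assumed names (`AbstractTable`, `check_table`, `@fancy_error`, and `load_report` are all hypothetical, with `AbstractTable` standing in for `AbstractDataFrame`): once the check sits inside another function, the only name it can see is that function's argument.

```julia
# Hypothetical stand-in for DataFrames.AbstractDataFrame (assumption, keeps the sketch package-free).
abstract type AbstractTable end

check_table(x::AbstractTable, name) = nothing
check_table(x, name) = error("'$name' is of type $(typeof(x)) when it needs to be a table")

# Records the text of the expression it is given, then checks its value.
macro fancy_error(ex)
    return :(check_table($(esc(ex)), $(string(ex))))
end

# A wrapper function: inside it, the macro only ever sees the argument name `data`.
function load_report(data)
    @fancy_error data
    return "ok"
end

my_var = "not a table"

msg = try
    load_report(my_var)
catch err
    err.msg
end
println(msg)  # 'data' is of type String when it needs to be a table
```

So the user is told about `data`, not about the familiar global `my_var`, which is the situation described above for nested calls.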

Yeah. From the blog post, that’s what I thought the intention was, so users can know what familiar global object is causing the problem.

If you don’t need that, in R, you could just paste(df, "is of type...") without using any non standard evaluation.

I disagree with almost everything in this blog post. Specifically,

  1. R’s native data structures are seriously lacking: we are talking about lists, and vectors with very limited element types (boolean, int, double, complex, string, “raw”), to which you can tag on metadata. Almost all of R’s “native” data structures are conventions about this metadata. This is indeed “stable”, but seriously constraining when it comes to writing organized, performant code.

  2. Non-standard evaluation (basically, functions getting a bit of a context) was a very appealing idea when introduced originally, but it turns out not to compose well, and it makes efficient compilation impossible.

  3. In theory, R is eminently suitable for functional programming (a lot of parts were inspired by the Lisp family). But in practice, higher order functions and closures in native R code almost always imply a huge sacrifice in performance, so they are not used. People usually end up coding Fortran/C++ instead and calling it from R.

All of these points are of course well known. R users just work around them — this may be a reasonable choice when R has other advantages for some application.

Turning to

I am not sure this is desirable, or why it should happen. I am surprised that someone who considers himself a “professional programmer” calls the command line “bullshittery”, but it summarizes the attitude nicely.

Open source communities thrive when they have contributors who are not merely users. If people are reluctant to get their hands dirty, I am not sure others will be inclined to tailor the software they write to those people's needs.

I think that Julia coders should write packages that they find useful and are proud of.

12 Likes

I have been using R for many years and I also know its weak points; the performance problem in particular is well known. On this I agree with all the posts, and it is my personal reason for using Julia. But on this point …

… I would like to add my impression that a functioning and successful community also includes members who, through good example and the use of Julia in daily practice (perhaps as a data scientist?), show that Julia is a very good tool for all challenges.

I think that Julia coders should write packages that they find useful and are proud of.

Yes of course, but that says nothing about the quality. :wink:

1 Like

And I would like to add that the R package system (CRAN and derived systems like Bioconductor) is in no way superior to, or more stable than, Julia's. In more than 15 years of R and Bioconductor I had countless unresolvable issues with incompatible package versions.

But on the other hand: I started to write an answer here at the very beginning but canceled it. The main reason is that those discussions, R vs Python, Julia vs. R, Java vs C++ vs C#, …, are typically not very enriching or rewarding. They end with everybody having some valid points and nothing learned. In the end it was all about taste.

7 Likes

The main reason is that those discussions, R vs Python, Julia vs. R, Java vs C++ vs C#, …, are typically not very enriching or rewarding. They end with everybody having some valid points and nothing learned.

I agree with this if the discussion is held in this community. Nevertheless, I think that we should not close our eyes to such conversations, because these thoughts might broaden the acceptance and use of Julia. I think the use of Python and R is currently quite “overwhelming” (for data analysis). I am trying my best to change that… :wink:

3 Likes

I don’t think it is that much about taste — one’s decision is arguably subjective, but there are objective features of languages one can talk about meaningfully.

The problem is that these kinds of discussions are mostly meaningful only if all participants are at least reasonably well-versed in both of the languages being compared, which is indeed rare. But when it happens it can be quite informative to read.

Certainly. OTOH, there are always people who feel they are entitled to high-quality, polished, and importantly free software tools, and are affronted if they are asked to contribute anything, or, heaven forbid, use a command line or isolate an MWE. We are lucky because this behavior is not very common in the Julia community. I hope this will remain so.

8 Likes

Oh, excuse me! I’m afraid I expressed myself clumsily and unclearly. :flushed:

I didn’t want to give the impression that anyone in the community is just using the output and making demands. My intention was to give my impression that a community and ultimately Julia can benefit from having community members, who may not be some of the top package developers, demonstrate that Julia is successfully and beneficially used in their daily work.

Are you worried they don’t exist or something? The number of “top” package developers / core contributors is probably no more than 100 (weak estimate). Based on the last Julia Computing newsletter, Julia was downloaded ~5.5 million times last year. There are a lot of Julia users who are “just users”. I would conjecture that a nontrivial fraction of those users live on the edge of “using” and “developing”. To a large extent they just use the major packages, but if they feel up to it they might collect some useful code into a package for their own use. That is not going to be super visible to the casual outside observer, and I think that’s ok.

3 Likes

Every time I see these conversations on this forum I wonder if anybody here used R v 0.X (was that even available?) or Python. I first saw R at around 2.4, sometime in the mid-2000’s, I believe, and I remember I had a lot of problems loading data. There were some tutorials, but not a ton, and the same goes for books. Obviously for someone getting into R today the situation is quite different and the information is everywhere.

So, we just need to keep growing as a community. We all know this. These discussions are interesting, but the reality is that a lot of those things are not going to magically change. Unless some company adopts Julia as its main language and starts pumping serious money into the ecosystem, most work will be done by people using Julia for personal projects, a lot of the time on the side. So, progress will be piecemeal.

And if someone can remember R or Python in the time of versions 0.X or 1.X, I’ll appreciate any history, maybe in another thread.

10 Likes

Yes. I can add the example of the giant Oracle: I had never heard of it being used before version 8, and then version 8 went everywhere and made serious money for it.