How does DataFrames.jl compare to R's data frames? And interoperability between R and Julia

Yes, I can think of reasons for recommending R over Julia. Julia + RCall in the end still means learning two languages instead of one, with JIT overhead on top, so Julia would need to offer something that R does not have to make it worth it. There are many reasons for picking Julia over R (I assume you are aware of those), and if any of those are important to you then you should pick Julia, knowing that you can call R if you need some package or algorithm it has.
If comparing just the programming languages, then I can’t think of a reason why I would prefer R over Julia.

But you should not underestimate the importance of packages and tooling. R has a very large ecosystem of packages, including implementations of some state-of-the-art or just esoteric things, because it is very popular and has been one of the main tools for statisticians for a long time. Many of these packages are very good (also performance-wise) and work nicely together. On top of that there is some very nice tooling available (for example from RStudio) that makes many common things surprisingly easy to achieve. For many people this is enough: they just want an environment for working with data, with good libraries for some specific things. If Julia does not have them and they are not able to (or don’t have time to) develop this functionality themselves, then they end up calling R most of the time anyway. Recommending that they adopt Julia would not benefit them at all in this situation.
That nice tooling, by the way, is partly thanks to the R language’s extreme flexibility, which allows for all kinds of syntax and “magic”. That in turn makes very nice domain-specific languages (DSLs) possible, although this can easily be abused as well, of course.

In sum, recommendations depend on the needs and I can think of needs which are (currently) better met by R than Julia. I do believe, however, that we will arrive at a point where this is no longer true (not too far in the future).


They have almost the same surface syntax (with tiny differences), and have very similar implementations (a collection of vectors). One could argue that the pre-Hadleyverse R semantics of data frames are different from those of Julia’s DataFrames.jl, since the latter is modeled after the “tidy” transformations, but that is almost irrelevant since nearly everyone uses the tidy style these days even in R (you rarely see the built-in transformations like rbind outside old stats textbooks from the 1990s).

The current incarnation of DataFrames.jl is probably the least surprising part of the Julia ecosystem for an R user. Which is understandable, as it was designed that way. R/Hadleyverse is a great model to copy.
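
To make the comparison concrete, here is a minimal sketch of a few common operations in DataFrames.jl, with the rough R counterpart noted in the comments. The column names and data are made up for illustration, and it assumes the DataFrames package is installed.

```julia
using DataFrames, Statistics

# A DataFrame is built column by column, much like data.frame(...) in R.
# (Hypothetical example data.)
df = DataFrame(group = ["a", "b", "a", "b"], value = [1.0, 2.0, 3.0, 4.0])

# Column access and row filtering, roughly df$value and df[df$value > 1.5, ] in R.
vals = df.value
big = df[df.value .> 1.5, :]

# Split-apply-combine, roughly group_by() followed by summarise() in dplyr.
means = combine(groupby(df, :group), :value => mean => :mean_value)

# Stacking rows, roughly rbind() in R.
stacked = vcat(df, df)
```

Each column here is an ordinary Julia vector, which is the "collection of vectors" design mentioned above.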


Julia in the REPL supports it too. Alternatively, div(a,b) works and also saves you from having to call floor. Another alternative that you can intuit from the error message is floor(Int, a / b), as mentioned above.
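
For reference, the alternatives named above can be put side by side. This is just a small sketch with arbitrary values for a and b, using only Base functions.

```julia
a, b = 7, 2

x = a / b               # `/` on two integers returns a Float64 (3.5 here)
q1 = div(a, b)          # integer division, truncating toward zero
q2 = a ÷ b              # the same operation as div(a, b), written as an operator
q3 = floor(Int, a / b)  # float division, then floor and convert to Int

# For negative operands the choice matters:
# div truncates toward zero, fld rounds toward negative infinity.
t = div(-7, 2)
f = fld(-7, 2)
```

For non-negative operands all three integer forms agree; with negative values, floor(Int, a / b) matches fld rather than div.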

EDIT: but I do see your point that users can’t immediately jump from one language to another. There are learning curves.

Is this a real problem? Do you really need this [for a DataFrame]? Rhetorical question, unless you have a good counterexample. It seems if you do, this could be a bug waiting to happen with integer / in R too?

I noticed a problem that only applies in a string context (only in my 0.5, but perhaps not in the latest version?), but how often in reality would you need to split a DataFrame (or part of a field in one, as in the example below) in half (or thirds, etc.)?

julia> "Páll's"[end÷2]
ERROR: UnicodeError: invalid character index
 in slow_utf8_next(::Array{UInt8,1}, ::UInt8, ::Int64) at ./strings/string.jl:67
 in next at ./strings/string.jl:92 [inlined]
 in getindex(::String, ::Int64) at ./strings/basic.jl:70

[The string section of the manual uses this operator and other examples that are not safe unless you’re careful. E.g. for empty strings/DataFrames.]
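
A sketch of how the indexing problem can be avoided, using only Base functions. String indices in Julia are byte offsets, so end÷2 can land in the middle of a multi-byte character such as 'á'. As far as I know, on Julia 1.x the failing expression raises a StringIndexError rather than the UnicodeError shown for 0.5.

```julia
s = "Páll's"

nbytes = ncodeunits(s)    # 7 bytes, because 'á' occupies two of them
nchars = length(s)        # 6 characters

mid = lastindex(s) ÷ 2    # byte index 3, which falls inside 'á'
ok = isvalid(s, mid)      # false, so s[mid] would throw

# thisind moves a byte index back to the start of the character containing it.
safe = thisind(s, mid)
c = s[safe]

# Splitting by character count instead of byte index avoids the issue entirely.
half = length(s) ÷ 2
left = first(s, half)
right = last(s, length(s) - half)

# The character-based form also works for an empty string.
empty_left = first("", 0)
```

The thisind version still throws on an empty string (index 0 is out of bounds), so the first/last form is the more defensive of the two.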

Where this isn’t a possible problem, or interactively in the REPL where it’s OK to get an error, you can type ÷ as \div followed by TAB, or anywhere else with Alt+0247 (or by other methods: copy and paste, div(), etc.). [Code is read more than it is written, and when reading, typing the operator is not a problem either.]


If you actually need to split, I see split() for strings in Julia, and assume there may be, or should/could be, something similar (that would be more readable and safer than doing it on your own with the operator?):