Doubts about language


#1

Dear,

I am a C/C++ programmer and I maintain packages in the R language. I already program in C and R and teach these languages at a university in Brazil.

I know very little about the Julia language. However, it seems to me that it has a great future. In R, I often need to import C/C++ and CUDA code to speed up my tasks. I also see that in Julia, communication with C code is smooth, without the need for additional APIs.

I have some questions that follow:

[1] Are there many changes to the syntax of the base language, such that something studied now will not work in future versions?

[2] I have a PhD in statistics and I am thinking of studying the language in order to transcribe R packages into Julia. What do you think about that?

[3] My great fear about studying Julia now is running into serious package maintenance problems because of what I asked in item [1]. What can you tell me about that?

Best regards,
Pedro R D Marinho


#2

For what it’s worth:

  1. There are still changes, and sometimes they do require work. However, version 1.0 is very close, at which point many expect more quiet in the development, I think. Mostly the base language is quite mature - it’s the package ecosystem that’s still fairly young.
  2. A better idea is probably to build the functionality in Julia from the bottom up. R is licensed under the GPL and Julia under the MIT license, which means that transcribing an R package into MIT-licensed code breaks its license, which is serious. Also, frankly, the R ecosystem is very fragmented, whereas Julia has a culture of building a more integrated package ecosystem where types and functions work together across the ecosystem (probably nurtured by the GitHub nature of packages and GitHub organizations). Emulating the behaviour of individual R packages may thus lead to less-than-ideal results. Which packages are you thinking about?
  3. It is a bit of work to keep a package maintained, especially if people start using it a lot and opening issues. However, keeping track of language development is not the biggest contributor to that work, in my experience, so I wouldn’t worry greatly about that.

#3

Mkborregaard,

Thank you for your replies.

It seems to me that R packages licensed under GPL (>= 2) cannot be relicensed under other terms such as MIT. However, I think transcribing a package may be possible because the transcribed code is different, but I’m not sure.

There are several packages widely used in R by the statistical community, such as the betareg package that works with beta regression, nortest for nonparametric tests, among others.

Best regards,
Pedro R D Marinho.


#4

As far as I remember, the GPL covers any derived work, including rewriting an algorithm in a different language. Also, R and Julia are quite different in how they treat (non-)type-stable code; e.g. see a blog post about DataFrames.jl, which mimics R’s data.frame but can’t achieve the same speed. What Julians actually had to do was rethink the whole package and come up with DataTables.jl.
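
To make the type-stability point concrete, here is a minimal sketch (written for Julia 1.x; `total`, `untyped` and `typed` are hypothetical names for illustration and not DataFrames.jl internals). Both calls give the same answer, but only in the second does the compiler know the element type up front:

```julia
# Sum the elements of any iterable container.
function total(xs)
    s = 0.0
    for x in xs
        s += x   # with Vector{Any}, the type of x is only known at run time
    end
    return s
end

untyped = Any[1.0, 2.0, 3.0]      # element type Any: the compiler cannot see what is stored
typed   = Float64[1.0, 2.0, 3.0]  # element type Float64: known at compile time

total(untyped)  # same result as below, but each + is dispatched dynamically
total(typed)    # specialized for Float64 elements
```

At the REPL, comparing `@code_warntype total(untyped)` with `@code_warntype total(typed)` should show the difference in what the compiler can infer.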


#5

My 2 cents:

  1. “Transcribing” a library or even semi-complex code from another language, especially R, may not be a good approach. Almost surely, a different style of interface and organization will be natural in Julia. Reimplementing a package is different, might provide a good learning experience, and solves the licensing issues (R packages usually have a good bibliography, you read the papers and program the methods).

  2. Your code will break almost surely in a couple of months (1-2 minor releases), and will require maintenance. The upside is that the changes will be minor, and the Compat package helps with transitions. However, usually the new features are worth it: the dominant attitude in the community is not “OMG another release that breaks my code” but “give me the new release now because I need the nifty new features.”

  3. Be aware that many features of the Julia ecosystem are immature compared to R. Expect to find issues, and occasionally working on fixing them yourself will be helpful.


#6

Since the language is in beta, it is true that new versions require some changes. However, the language will be at 0.6 any day now, and 0.7 will contain all of the features of 1.0. At that point the language development will be far less disruptive. With a few exceptions, from the user perspective most of the changes so far have been easily fixed syntax changes. I have personally found updating packages from 0.4 to 0.5 and from 0.5 to 0.6 to be quite trivial; however, getting packages to work on both 0.(n) and 0.(n-1) is somewhat harder.

It is true that Julia is a young language and therefore the ecosystem is less well maintained than that of R or Python. However, something that doesn’t get said enough is that, because of the way the language works, Julia has some pretty remarkable advantages when it comes to maintenance and extensibility:

  • You can write everything in Julia without worrying about performance (compared to R or Python). This is a really, really big deal. It is far easier to figure out how code works in Julia and make changes to it if need be.
  • Julia has lots of really powerful introspection tools that are missing from other languages and I find that it is far easier to reason about multiple dispatch code than C++ style OO code.
  • Julia packages are powerfully extensible in a way that R and Python packages simply aren’t. This is mainly a result of metaprogramming and the ability to extend or overload functions using multiple dispatch. There are quite a few impressive examples, but see especially ReverseDiff.jl and Flux.jl.
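
As a rough illustration of that kind of extensibility, here is a toy sketch for Julia 1.x. The `Dual` type and the function `f` are made up for this example, and this is forward-mode differentiation, not how ReverseDiff.jl or Flux.jl are implemented. Adding methods to `+` and `*` for a new type lets existing generic code compute derivatives without being modified:

```julia
# A number that carries its derivative along with its value.
struct Dual
    value::Float64
    deriv::Float64
end

# Extend Base's arithmetic with methods for the new type.
Base.:+(a::Dual, b::Dual) = Dual(a.value + b.value, a.deriv + b.deriv)
Base.:*(a::Dual, b::Dual) = Dual(a.value * b.value, a.deriv * b.value + a.value * b.deriv)

# Generic code written with no knowledge of Dual.
f(x) = x * x + x

d = f(Dual(3.0, 1.0))   # seed the derivative with 1.0
# d.value is f(3) = 12.0 and d.deriv is f'(3) = 7.0
```

The same `f` still works on ordinary numbers; dispatch picks the method that matches the argument types.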

#7

Thanks Tamas Papp.

We think very much alike. Certainly, transcribing packages from R to Julia requires a lot of effort and is not a simple job, since many features will have to be rethought.

The R language dates back to the 1990s and is well established, but it carries many design problems.

The great advantage of the language is that it was built by statisticians. The big disadvantage is that it was built by statisticians.

What I realize is that the statistical community is increasingly dominating computing and taking an interest in the subject. I believe the transcription of packages from R to Julia will be very important in the near future.

Best regards,
Pedro Rafael.


#8

15 posts were split to a new topic: Licensing speculation


#9

To clarify this point: copyright law covers only source code – copyright does not and can not apply to the underlying mathematical idea. Math cannot be copyrighted. Literally transcribing source code should be considered a derived work and only done under the original license terms. Implementing an algorithm “from scratch” based on a mathematical description is not a derivative work.


#11

I’ll add that this depends on what you mean by transcribe. If you mean something automatic, then it’s in general impossible or useless. The reason Julia is fast is not that it has a good compiler; in fact we have a pretty terrible one. The performance is tightly bound to many language design decisions that make the code easier for the compiler to analyse. It is of course possible to use an R-like syntax, but it won’t have semantics anything close to R’s, so it’ll be of very limited usefulness. OTOH, if you implement a lot of R features, it’ll be very unlikely to perform better than or even comparably to R, since our runtime is not optimized for them.


#12

Surely it cannot be illegal to transcribe code? You mean that you cannot release transcribed code under just any licence?

Edit: or exploit it commercially and so on…


#14

The nice thing about that statement is that a lot can be done after 1.0 to greatly improve the performance of the parsing, lowering, and JITting code, and to make the compiler do a much better job at optimization.
What would be pretty much impossible to fix later on would be the underlying language design, which is a pretty wonderful one.


#15

There are many R language users who use the betareg package, for example.

I wish they could enjoy the Julia language and that it had functions for working with the beta regression model. I’m not saying I want to put useless features of R into Julia; that would be dumb of me.

What I mean is making these mathematical methods available in Julia in a convenient way, so that statisticians can enjoy using Julia and become interested in the language.


#16

I’m starting to come around to the idea of waiting for the language to mature a little. Perhaps once version 1.0 is released I can venture in with a little more peace of mind.


#17

I see a lot of statisticians who program a lot in C and R but do not use Julia because, apart from programming for pleasure, they need statistical methods to produce their papers and are often not willing to implement everything from scratch.

Here in Brazil a programming language called Lua was created. It is a great language that many Brazilians do not use. It was created at PUC-Rio, initially for Petrobras projects. What I realize is that many people choose a language for what it already offers, and over time some of them become interested in contributing implementations of their own.

I may be wrong, but perhaps incorporating mathematical methods from other areas will open the door to the Julia language for many practitioners.


#18

I worry about this because there are initiatives to reimplement base R by companies that already use the R language for their data analysis, such as Microsoft, Google, Amazon, Oracle, Facebook, etc.

Link: https://mran.microsoft.com/open/


#19

To be clearer, making it possible to implement a useful R transpiler is extremely unlikely to be part of Julia 1.0 or even 2.0, if ever. The reason is that our compiler is optimized for Julia code. The semantic requirements of R, and the R code that depends on those features, are “bad code” by this standard, since the semantics themselves make the code much harder to optimize statically. It is of course possible to optimize them to some degree with fancy JIT techniques, but since those patterns will not appear in Julia code and are very hard to implement in general, it will not be something we work on with any priority. For more information about the difference between the approach we are taking and that of other scripting languages with a JIT added as an afterthought, see https://www.youtube.com/watch?v=cjzcYM9YhwA .

As for when it’s a good time to use Julia: if you do not want to invest any time in some level of internals (which is a perfectly valid choice), waiting for 1.0 is a reasonable choice. I do recommend starting now with some throwaway scripts to get a feeling for it, though. If you want to use Julia now, it’s almost certain that you’ll see some breakage before 1.0 is released, though the magnitude of it strongly depends on your use case (I’ve seen 1yr+ old code running just fine with minor depwarns, and I’ve also seen a package broken multiple times in a single release cycle).


#20

3 posts were split to a new topic: Notes on the Julia compiler (JIT vs static)


#22

While such altruistic motives are laudable, in the long (or not so long) run they usually lead to abandoned packages. Write something that you would use.

If, for example, betareg is such a method, then specifically

  1. read the vignette,
  2. understand the method (probably requires reading some of the cited papers),
  3. understand the numerical challenges (if any),
  4. refer to the source code only to clarify some points.

Iterate 1-4 until you have tons of nicely running unit tests. Then profile and document your code.

Note that 1-3 will not require looking at any R sources. The focus is on understanding, not transcribing.
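
As a small sketch of what the first unit tests in such a workflow might look like (Julia 1.x with the `Test` standard library; `logit` and `invlogit` are hypothetical helper names, chosen here because the logit is a common link function in beta regression), written from the mathematical definition and not from any R source:

```julia
using Test

# Logit link and its inverse, written from the mathematical definition.
logit(p) = log(p / (1 - p))
invlogit(x) = 1 / (1 + exp(-x))

@testset "logit link" begin
    @test invlogit(0.0) == 0.5
    @test logit(0.5) == 0.0
    @test logit(invlogit(1.3)) ≈ 1.3            # round trip
    @test invlogit(-2.0) ≈ 1 - invlogit(2.0)    # symmetry around zero
end
```

Tests like these pin down known values and identities first, so later profiling and refactoring can be checked against them.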


#23

The license does not matter. Transcription here refers to mathematical functionality. Moreover, during this transcription process, corrections or improvements may be made, including adjustments to make proper use of the Julia language. There will be no imitation of the code. What will remain is a similarity in the mathematical functionality covered by the package.