Has anyone seen this blog?

I am trying to push my department to gradually use R to replace SAS and use Julia to replace Matlab. I have accumulated some R code for my teaching in Investments (mainly the tidyquant package) and Risk Management (mainly the QRM package) courses, and I am also preparing Julia code for my future teaching in Derivatives Pricing. Due to funding cuts in many public schools including mine, I think it is a good chance to make this move now. Our school has a guy serving on a top expert committee at SAS, which makes it difficult to change from SAS to R, but some stats professors and I are still pushing our department in this direction.

Also, our department’s Matlab guys are retiring, so I think it is a good chance to start using Julia. The issue with Matlab is that most of our students get stuck on namespace and data manipulation problems, and we need to renew the Matlab license every year (and this happens in the middle of a semester!). A few years ago, I thought Fortran or Fortress could be an alternative, but now I think Julia seems to be a better choice.

Then this happened. I asked one of my colleagues to try downloading and installing Julia, and she searched “Julia Language” on google, and she found this blog:

http://zverovich.net/2016/05/13/giving-up-on-julia.html

Most of my colleagues are not programmers, and they believe this blog because it is ranked as the 8th search result on Google. I am not sure how Google makes this ranking, but I have to say using “Hello World!” to test speed is the most unbelievable thing I have ever seen.

I think Julia has some potential in finance, at least in finance education. If we teach C++, then we do not have time to get to deeper topics because it takes too much time for students to learn how to write correct C++ code (for example, memory leaks happen when doing Monte Carlo simulation and we need to teach virtual destructors, etc.). On the other hand, Matlab is not open source and lacks tools for data manipulation.

1 Like

Pretty much none of it held in the first place, and by now it’s completely out of date. He measures Julia’s startup time instead of the actual coding time, which is obviously silly for so many reasons. Then he says the language is obviously wrong because of 1-based indexing. Anyone purely against 1-based indexing is making a baseless claim about intuitiveness, because it’s easy to find cases like this:

and usecols=[3,6] for the 4th and 7th columns

in 0-based languages. If those off-by-one errors are “intuitive”, that’s just Stockholm syndrome. Either choice is going to have an issue sometimes, but when working with real data and mathematical algorithms it’s far from obvious that subtracting/adding 1 from/to everything is clear.
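To make the quoted off-by-one concrete, here is a minimal Python sketch. `select_columns` is a hypothetical helper written for this illustration, not a library function; it only mimics a 0-based `usecols` argument on a plain list of rows.

```python
# Hypothetical helper mimicking a 0-based `usecols` argument; not a real library API.
def select_columns(rows, usecols):
    """Return only the columns whose 0-based positions are listed in usecols."""
    return [[row[i] for i in usecols] for row in rows]

# One row whose values name their own 1-based (human) column position.
rows = [["col1", "col2", "col3", "col4", "col5", "col6", "col7"]]

# Asking for positions 3 and 6 yields what a person would call columns 4 and 7.
picked = select_columns(rows, usecols=[3, 6])
print(picked)
```

The mismatch between the numbers you type and the columns you describe in words is the whole point of the quote above.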

Then, the testing API he complains about was completely replaced by Base.Test a few years ago. He quotes one person as evidence that Julia Base is hard to work on, and ignores that more people have contributed to Julia Base than have ever contributed to SciPy, and many Julia contributors are not “lifetime OSS people”.

In short, there aren’t many sillier things someone can write. There are things to be said about Julia (and any language), but this isn’t one of them. But, just like a Gawker article spreading rumors about a celebrity, if you write something silly enough it’ll get people to talk about it and it’ll get clicks. This gets passed around so everyone can comment about how wrong it is, and everyone comments on the page hoping that it’s possible to teach the author. But some people can’t be fixed and this only gives it more air time. So please don’t post the link because it’s the fact that it gets linked that makes it go up in page rank. In the “Age of Trump” we have to be a lot more careful in what we share on the internet, because otherwise the Alex Jones of every topic will dominate the search engines.

25 Likes

Good luck. Regardless of the technical merits (I like both R and Julia), every time someone has an issue with either, they will blame … you. :wink:

Very few people here who otherwise may easily write a few thousand LOC/week have “programmer” as their job title. They just program a lot. I seriously hope that anyone who is teaching a course that uses programming also belongs in this category. In which case they should be able to judge it for themselves — your best bet is to narrow the decision down to these people.

R is non-controversial: it is very mature and established. I would be more wary about Julia though at this point: with the wrong mindset, one can easily blame all kinds of failures on Julia (“I could not do my homework because …”), even though they could have been overcome with a bit of effort. Perhaps you could wait for v1.0, and until then use it in advanced classes (eg ones that need to solve PDEs or similar). Replacing C++ could be an immediate win though.

That said, I remember having taught a course to MSc students using v0.5. Despite the constant grumbling about plot times and having to restart the kernel all the time (in Jupyter), I think many of them liked it. Recently I talked to some of the students who went on to PhD programs, and they told me how useful it was to be exposed to Julia.

Also, you should convince your IT to set up a Jupyter server. It is not a big deal, and a great teaching tool for both R and Julia. You can also use it for problem sets and exams. It also eliminates a lot of possible problems with installation etc., so students always have a fallback.

9 Likes

I cannot verify this, but according to one post on the Julia reddit, the author of the blog post was far from unbiased: Reddit - Dive into anything

It should also probably be noted that at the time this post was written, the author was working for AMPL, a company that sells a commercial product that used to have a near-monopoly in the academic operations research (aka mathematical programming aka constrained optimization) community. JuMP and Julia are very much disrupting that.

8 Likes

Is this regression because 0.7 is in development stage or is it real “progress”?

Julia 0.6

$ time julia -e 'print(1+1, "\n")'  # 10x slower than python3 
2

real    0m0.668s
user    0m0.652s
sys     0m0.276s

Julia 0.7

$ time ./julia --depwarn=no -e 'print(1+1, "\n")'  # 20x slower than python3 
2

real    0m1.132s
user    0m1.072s
sys     0m0.208s

Btw python3 is 2x slower than python…

1 Like

You mean increased startup time? Try subtracting

$ time julia -e 'exit()'
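The same baseline subtraction can be sketched in Python. This is only an analogy under stated assumptions: it times the Python interpreter itself (via `sys.executable`) rather than Julia, and `wall_time` is a helper name made up for the illustration.

```python
import subprocess
import sys
import time

def wall_time(code):
    """Run `code` in a fresh interpreter and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", code], check=True, capture_output=True)
    return time.perf_counter() - start

baseline = wall_time("pass")        # interpreter startup and shutdown only
total = wall_time("print(1 + 1)")   # startup plus the one-line "program"
work = total - baseline             # noisy estimate; may even come out negative

print(f"startup ~{baseline:.3f}s, total ~{total:.3f}s, work ~{work:.3f}s")
```

For a one-liner, nearly all of the measured time is the baseline, so a single run says very little about the speed of the code itself.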

Just curious: do you deal with large SAS datasets, e.g. at least 15 GB in size?

Yes - and I responded to it point by point.
I think there was a lot of good constructive criticism in that blog post, but remember, it was written back in 2016, in May I believe.
I wish the post had a big banner at the top where the date it was written, and the version of Julia it was written about (v0.4), were shown.

Normally I agree with Chris, but in this, I do believe he was correct about a lot of it, although some of it was important only for his use cases (which he stated, was for short scripts, which has not been Julia’s strong suit so far).
I know a lot of people who write scripts in Python, but because I don’t want to keep too many languages in my head all the time, I write lots of utility scripts in Julia, and I do also notice the slow start-up time (but my productivity writing the scripts and maintaining them outweighs that issue).

One-based indexing? When I first started with Julia, I was strongly against the idea that Julia should only have 1-based indexing and column-major arrays, since there are many cases where 0-based (or arbitrary-based) indexing can simplify problems, and row-major arrays can help with interoperation with other languages as well as being better for certain operations. That has changed due to Tim Holy, Matt Bauman, and others’ work in making array handling much more generic, with things like OffsetArray and PermutedDimsArray.

He also had issues with the C-style *printf macros, which I also totally agree with; however, I was able to point out to him another solution :grinning:, my own StringUtils.jl package, which has since been superseded by my StringLiterals package.

Instead of complaining about what he saw, maybe it would be a good idea to reach out to him (in a very respectful and non-confrontational manner!), and see if he’d revisit his blog post, especially if he changes his mind about a number of the problems he saw two years ago (it might not be time yet to do so though; maybe better to wait until v1.0 is in beta, and more of the issues he had have been addressed).
That way, when v1.0 comes out and there are more announcements about it, people searching for “Julia language” on Google will hopefully see a story about somebody whose issues were solved in a relatively short period.

8 Likes

I believe that this is true, but Victor attended several JuMP/JuliaOpt related presentations at ISMP 2015 and was really friendly towards that community, I would say. So, I’m sure that his blog post just represents his personal disappointment, not some kind of professional sabotage. He no longer works for AMPL and is a prolific C++ programmer. Why not just learn from his criticism?

4 Likes

He also was very willing in the comment section to look at things like alternatives to @printf, which he hadn’t been aware of. It seemed to me that he did have an open mind about it.

:+1: Yes, instead of circling the wagons, better to look at what the critic’s pain points were, see if they were simply unaware of better solutions (as was the case with @printf), or if it is an area where more work needs to be done to improve.

1 Like

Most universities have high performance computers that can handle very large data with R (for example, in my school each node is 32GB, so data within 100GB is not big deal). Also, most finance departments have access to Wharton financial database which also offers unlimited cloud service, so you can retrieve, clean, and analyze TAQ data within their cloud using R or Python. In the worst case, if you want to handle data locally, sparklyr can easily handle 20G data.

1 Like

That seems pretty naive. Blogs are notoriously opinionated (which I’m not faulting), so endorsing one based on its Google ranking seems pretty silly.

Yes, you can usually massage what someone else wrote enough that it’s not crazy, but if you take what they wrote at face value it’s just wrong.

Yes, Julia has a slow startup time. That’s not what he says. He says his test is indicative of a general performance issue. His further evidence is linking to issues that completely misunderstand what the benchmarks are trying to measure. At this point the benchmarks say “fibonacci measures the cost of recursion”, and back then they had that in the captions and there were notes about this all over the web. You can massage what he said to “he identified a real pain point about startup time and JIT lag!”, and yes those are pain points, but what he actually wrote is still wrong.

Another thing wrong here is that the test measures a fixed cost and reports it as a relative number (“~187x”), which is simply not a sensible statement. If you take his same test and loop it a few thousand times (in a function) you’ll see that Julia is ~1-2x from C, with pretty much a fixed distance due to startup time. Clearly there’s a methodology issue if the same test in the same measurement gives a wildly different result when repeated a bunch. If you make this a statement about startup time, sure that’s totally true: Julia has a longer startup time than most languages. But does this mean that Julia is 187x from C for running most code? No…
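A rough way to see why a fixed cost cannot be quoted as a ratio is to model total run time as startup plus repetitions times per-call cost. The Python sketch below uses made-up numbers chosen only for illustration; they are not measurements of Julia, C, or any other language.

```python
def slowdown(n_calls, startup_a, per_call_a, startup_b, per_call_b):
    """Ratio of total run times when the same work is repeated n_calls times."""
    total_a = startup_a + n_calls * per_call_a
    total_b = startup_b + n_calls * per_call_b
    return total_a / total_b

# Hypothetical numbers: A has a large fixed startup cost but is only 1.5x slower per call.
A = dict(startup_a=0.5, per_call_a=1.5e-6)
B = dict(startup_b=0.001, per_call_b=1.0e-6)

# The reported "slowdown" depends almost entirely on how many times the work is repeated.
for n in (1, 10**3, 10**6, 10**9):
    print(n, round(slowdown(n, **A, **B), 2))
```

With one call the ratio is dominated by the startup gap; as the repetition count grows it settles near the per-call ratio, which is the number that matters for real workloads.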

A lot of people have reached out to talk to him, but he refuses to correct what he says even into the massaged versions that would make it correct. He hasn’t updated it with the changes in the language either.

2 Likes

I don’t disagree with you, but at the same time, many people do not look beyond the first few results, so they do have an influence. That said, above the aforementioned blog, I see this infoworld article, and below this article in Forbes, so people prepared to read a bit should be OK.

Finally, hopefully everyone who works with data/numbers will soon have a colleague who uses Julia within a radius of N meters (with N \searrow 0 corresponding to world domination), so they can just ask. IMO this is much more important than blogs or opinion pieces. No matter how nice a language is, one can always find someone who is not satisfied with it, and these get a lot of hits almost by construction in pagerank because the controversy brings in clicks.

I was unaware of that. That’s sad to hear!

1 Like

Look at the comments on the page. The issues with the benchmarks, the existence of non 1-based indexing, etc. have already been mentioned.

I wrote a lot of the early comments on the page :grinning:, however, at the time, there weren’t the things like OffsetArray and PermutedDimsArray to answer that issue of his.
I didn’t see that he later didn’t want to update it :disappointed:

He hasn’t responded to comments which say things like:

Julia 0.5 allows arrays with indexing starting at values different from 1.
The array types are expected to be defined in packages, but now Julia provides an API for writing generic algorithms for arbitrary indexing schemes.

That hits exactly what you said, and I think that if he hasn’t responded in a year then he’s not going to.

1 Like

Thanks for your reply, I think Jupyter should be a great tool for teaching programming.

I heard there are about 1 million Matlab users. Most of the Matlab users I know have no idea about version control, never go to Stack Overflow or other forums to ask questions (Matlab has 24/7 customer service), and pay very little attention to coding style. I believe there will be a lot of Julia users like this in the future. All they want is to get things done.

If someone wrote a refutation of his claims from a blog that was well-publicized and widely read, linking to his blog (perhaps yours, Chris?), that would counteract to a significant degree the misinformation that is being spread from that source.

1 Like