Adding SAS to benchmark comparison

jacobcvt12 · February 21, 2018, 4:06pm

Many of my coworkers use SAS (gross!) and aren’t familiar with the other languages listed in the benchmark comparison. It would be nice to add SAS to the comparison to persuade corporate users of the benefits of Julia. Any thoughts on whether this is possible?

bramtayl · February 21, 2018, 4:09pm

Might be useful. I’d much rather see a new set of tabular data benchmarks comparing R, SAS, Stata, Julia etc. on a set of common dataframe operations (grouped summarize, spread, etc.).

jacobcvt12 · February 21, 2018, 4:21pm

Agree that tabular data comparison would be nice as well. I’m not sure where that benchmark would go though. Do you think that the DataFrame operations and Julia are at a point where Julia would compare favorably to these other languages?

StefanKarpinski · February 21, 2018, 4:30pm

A necessary condition for benchmarking comparisons is having all of the different software installed and running on the same system. Given that SAS is proprietary and quite expensive, it seems unlikely that we’re ever going to be able to include it in the benchmark results, so I’m not sure there’s much point to having benchmarks for SAS since we won’t be able to run them. Is there much reason to believe that SAS will be much different than other slow, interpreted, high-level dynamic languages (Python, Matlab, R, etc.)?

bramtayl · February 21, 2018, 4:40pm

Yes, SAS is built to be fast for common data operations, and, though less flexible, is light-years faster than R.

bramtayl · February 21, 2018, 4:42pm

jacobcvt12 wrote: “Do you think that the DataFrame operations and Julia are at a point where Julia would compare favorably to these other languages?”

I think that’s up in the air. R + dplyr and R + data.table are very fast, and so is SAS (when doing the kind of things they are good at, tabular data operations).

StefanKarpinski · February 21, 2018, 5:30pm

My understanding is that SAS’s performance is not due to an exceptional language implementation but rather to high-quality, out-of-core runtime libraries. Since our benchmarks are explicitly designed to test the language implementation itself, it doesn’t seem likely that SAS would be exceptional here, but if someone wants to implement SAS benchmarks and run them together with C and Julia somewhere, it could turn out to be of interest.

bramtayl · February 21, 2018, 5:43pm

I guess what I’d like to see is a table package performance benchmark: R, R + dplyr, R + data.table, SAS plus its runtime libraries, Julia + Query.jl, Julia + DataFramesMeta.jl, etc.
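
As a rough illustration of what one entry in such a benchmark could look like, here is a minimal Python sketch that times a single grouped-summarize task on synthetic data. The names `make_table`, `grouped_mean`, and `best_time`, the table sizes, and the choice to report the minimum over a few runs are all assumptions made for illustration; none of this is an existing benchmark suite.

```python
import random
import timeit
from collections import defaultdict
from statistics import mean


def make_table(n_rows, n_groups, seed=0):
    """Build a synthetic two-column table, stored as one list per column."""
    rng = random.Random(seed)
    return {
        "group": [rng.randrange(n_groups) for _ in range(n_rows)],
        "value": [rng.random() for _ in range(n_rows)],
    }


def grouped_mean(table):
    """Grouped summarize: mean of 'value' for each distinct 'group'."""
    buckets = defaultdict(list)
    for g, v in zip(table["group"], table["value"]):
        buckets[g].append(v)
    return {g: mean(vs) for g, vs in buckets.items()}


def best_time(func, table, repeats=3):
    """Run func(table) several times and keep the fastest run, in seconds."""
    return min(timeit.repeat(lambda: func(table), number=1, repeat=repeats))


table = make_table(n_rows=10_000, n_groups=10)
timings = {"grouped_mean": best_time(grouped_mean, table)}
```

A real comparison would run the same task definition and the same input data through each tool (dplyr, data.table, SAS, Query.jl, and so on) on the same machine and collect the timings side by side.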

StefanKarpinski · February 21, 2018, 5:45pm

That is a much more interesting comparison.

tbeason · February 21, 2018, 6:11pm

I don’t think SAS is really an appropriate comparison language for the benchmark on the homepage. Now, there is a growing consensus that other benchmarks are needed, one of them being data science related. There is some good work going on in SASLib.jl to get .sas7bdat files imported quickly, and there is also some good work going on related to sorting algorithms, for example. In both cases, comparisons to Python and/or R are being made when possible, but there still remains a need for a well thought out and executed suite of benchmarks that puts the different ecosystems to the test along several dimensions.

xiaodai · February 21, 2018, 8:23pm

I just recently demonstrated how I can perform a group-by 60x faster in R, using fst and my disk.frame package, than in SAS.

SAS is actually slow because everything is disk-based. Yes, you can load data into memory, but it’s clunky and sometimes doesn’t yield performance gains. Also, its primary data format, SAS7BDAT, is row-oriented, so every operation requires some row-by-row logic and cannot benefit from columnar operations.
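
To make the row-oriented versus columnar distinction concrete, here is a toy Python sketch. The data, the column names, and the two function names are made up for illustration; this is not how SAS7BDAT or fst are actually laid out on disk, only the general idea.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    ("a", 10, "x"),
    ("b", 20, "y"),
    ("a", 30, "z"),
]

# Column-oriented layout: each column is stored as its own sequence.
columns = {
    "region": ["a", "b", "a"],
    "sales": [10, 20, 30],
    "notes": ["x", "y", "z"],
}


def total_sales_rowwise(rows):
    """Summing one field still walks every record and unpacks every field."""
    total = 0
    for _region, sales, _notes in rows:
        total += sales
    return total


def total_sales_columnar(columns):
    """Only the 'sales' column is touched; the other columns are never read."""
    return sum(columns["sales"])


row_total = total_sales_rowwise(rows)
col_total = total_sales_columnar(columns)
```

Both functions return the same total; the difference is how much data each one has to read to get there, which matters once the table has many columns and lives on disk.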

How did I get a 60x performance gain? I use the fst format to load only the columns I need, rather than every column as in SAS, and I process in parallel using all cores. Julia’s JuliaDB.jl, Python’s Dask, and R’s disk.frame can take on SAS for large data processing. Then we just need out-of-core algorithms for ML. OnlineStats.jl will provide many of them, and JuML.jl has a promising algorithm implemented.
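
The general pattern described here (read only the needed columns, aggregate each chunk independently, then combine the partial results) can be sketched in Python with the standard library alone. This is not the disk.frame or fst API: the names `partial_group_sum`, `merge_partials`, and `group_sum`, the column names, and the in-memory chunks standing in for on-disk files are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical columnar chunks. Each chunk stores every column as its own
# list, so a reader can pull just the columns it needs and skip the rest.
chunks = [
    {"region": ["a", "b", "a"], "sales": [10, 20, 30], "notes": ["x", "y", "z"]},
    {"region": ["b", "c"], "sales": [5, 7], "notes": ["p", "q"]},
]


def partial_group_sum(chunk, key_col, value_col):
    """Sum value_col per key within one chunk, touching only two columns."""
    totals = Counter()
    for key, value in zip(chunk[key_col], chunk[value_col]):
        totals[key] += value
    return totals


def merge_partials(partials):
    """Combine the per-chunk sums into a single result."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts together
    return dict(merged)


def group_sum(chunks, key_col, value_col, workers=2):
    # Threads keep the sketch short. CPU-bound work in CPython would need
    # processes or a native library to actually use every core.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(
            lambda c: partial_group_sum(c, key_col, value_col), chunks
        )
        return merge_partials(partials)


result = group_sum(chunks, "region", "sales")
```

Because each chunk is summarized on its own, only one chunk per worker needs to be in memory at a time, which is what makes this style of group-by workable for data larger than RAM.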

SAS is going the way of COBOL. Many people are not complaining, because who wouldn’t want $300k p.a. for being good at programming in it? Here in Australia, top SAS freelancers command AUD$1500-AUD$2000 a day.
