What features will I miss in Julia?


#17

Yes, MixedModels is faster, often much faster, than the lme4 package for R. Many factors are at work here, not just the fact that Julia code will run faster than pure R code. Most of the time in an lme4 fit is spent in compiled code whereas MixedModels does not use any purpose-built C/C++ code (the linear algebra does end up calling OpenBLAS or MKL).

The real advantage of Julia is that I can experiment with the algorithm without sacrificing performance or needing to rewrite C/C++ code and interface code. So MixedModels is faster partly because of Julia, partly because of tools like the optimizers available in Julia and partly because the algorithm is cleaner.


#18

Yes, this is where I think opinions diverge. I would say this: as a language “for developers and expert users”, Julia is definitely the best, bar none. The reason is that once you get the hang of Julia, you can just use Julia without needing outside resources. Julia is developed really cleanly, employs little or no magic, and the vast majority of Julia Base and its packages is written in Julia. I find that the vast majority of the time when writing Julia, I can “guess” (or actually, just know) what the compiler is going to optimize and how it’s going to do it. I just check Base code and package sources to see how everything works instead of checking docs (and send PRs).

This style of using a language is something I had never experienced before (years of other languages, about 1 year of Julia). In MATLAB/Python/R I always had to lean on lots of documentation and search StackOverflow for answers. In Julia that’s usually unnecessary (the only time it really comes up is for actual Julia bugs, and then I usually get a GitHub hit for the issue). Using C was too far in the other direction: isolated, and re-inventing not just the wheel but also wood and stones, which wasted too much time.

So if there’s a language to get really good at, Julia is definitely the right choice. That said, it is still easier “to be a noob” with Python and R since there are more pre-packaged solutions and StackOverflow answers ready for you. But even in Python/R/MATLAB, if you dig past the basics, say to S4 objects or to investigating what the compiler is auto-optimizing in the background, you quickly enter an area that is beyond what’s documented and answered (some of it may not even be well understood…).

This should be negligible for most problems which are not games (games only because of graphics drivers).


#19

You can use Jupyter with Julia. It’s called IJulia, and has worked for quite some time now.


#20

You might be interested in Pandas.jl as a replacement for data.table.


#21

I don’t need to emulate data.table, just to have similar features.

What about some fast way to read/write files like data.table’s fread and feather or even fst, able to read and write compressed files quickly?

Or the ability to work easily with databases such as MonetDB?

What about the interaction with Spark?


#22

You could check on https://pkg.julialang.org and find the Feather package. Notice that it is owned by the JuliaStats organization, which is a good place to start looking for Julia packages related to statistical computing.

Also, check the recent announcement of JuliaDB. This is a step forward relative to other systems for manipulating and processing large amounts of tabular data. The purpose of Julia isn’t to imitate the capabilities in Python or R or Matlab and stop there. It is to design novel and effective software to solve real-world problems.

If you want to continue a discussion of the laundry list of capabilities that must be provided for you before you are willing to consider using a system, you may want to consider commercial software.


#23

What software do you mean?
I’ve moved from Stata and SPSS to R, and I think it is more capable and faster (with the proper libraries) than those.
I’ve never tried SAS.
I’m starting to learn Spark, but it’s much more difficult.
Or do you mean software such as Tableau or Qlikview?


#24

I was (indirectly, I admit) commenting on the tone of your messages. If you are going to spend a lot of money to buy commercial software it is reasonable to ask the salesperson, “Does it support this and this and this?”. Open-source software, on the other hand, is provided for you by its developers without charge. If you decide that you don’t want to make the slightest effort to learn about the software (e.g. not bothering to check for a package to read and write feather-format files, even when it is cleverly disguised as a package named Feather) that’s your choice. But don’t expect people to spend a lot of time trying to convince you to use the software. To tell you the truth, we don’t care if you use it. We write it for us to use it.


#25

Yes, I want to make the effort to learn the new language. But first I’m trying to get the whole picture and evaluate if it’s worth the effort or if I’d rather need to move to something else.


#26

To be fair, the only thing about Python that I miss is the debugging (I suspect this would be the same for R). Since Python and R are interpreted languages, making good debuggers for them is trivial. There is a Julia debugger but it’s in purgatory right now due to compiler changes in v0.6. However, Julia has excellent and recently improved stack traces and quite a powerful REPL, so the situation is about as good as it can get without a proper debugger.

One of the many huge advantages of Julia which @ChrisRackauckas touched on but isn’t often advertised is that it’s usually trivially easy to figure out how software works. This is largely a consequence of the well-advertised fact that performant code can easily be written in pure Julia without resorting to calling Cython or some other hare-brained scheme. By comparison, figuring out how Python code works is an absolute nightmare. I used to use pandas very extensively and I never gained the slightest clue of how it works.

One of the problems with the Julia ecosystem right now is that the dataframes implementations are split. I have found this to be a non-issue because it’s trivially easy for me to figure out how those dataframe implementations work and solve any problems I might encounter due to their immaturity.

Also, I use Feather all the time, and I love it.


#27

As someone who has spent a lot of time optimizing code in various assembly languages, from 8-bit to 64-bit, I really appreciate @code_native (and the others in the family, such as @code_llvm). It makes it much easier to see just what is going on, and when I make a source code change I can immediately see what effect it has on the generated code.
I’m not aware of anything like that for C, C++, Java, Swift, etc. (but there might be some such thing; does anybody know?)
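
For readers who have not tried these macros, here is a minimal sketch of that workflow. It assumes Julia 1.x, where the macros live in the InteractiveUtils standard library (loaded automatically in the REPL), and `add_one` is a made-up function used only for illustration:

```julia
using InteractiveUtils  # provides @code_llvm and @code_native outside the REPL

# Hypothetical toy function, here only to have something to inspect.
add_one(x) = x + 1

# Print the LLVM IR generated for the Int method instance.
@code_llvm add_one(1)

# Print the native machine code generated for the Float64 method instance.
@code_native add_one(1.0)
```

Editing `add_one` and re-running the two macro calls shows how the generated code changes; the exact output depends on the Julia version and the CPU.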


#28

I’m not aware of anything close to this in MATLAB/Python/R. There, the internals and any optimizations that may be applied are treated as dark magic. That sometimes makes it difficult to really know what’s going on. For example, there are claims that in some cases MATLAB may fuse A.*B.*C, and sometimes you can time it and it seems like that happens, but… you can’t see what the code actually does (nor are MATLAB’s internals and compiler something you can just go look at, for obvious reasons).
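
For contrast, in Julia the fusion of dotted operations is a documented, syntactic guarantee (since v0.6) rather than a hidden optimization. A small sketch with made-up arrays, assuming nothing beyond Base:

```julia
A = [1.0, 2.0, 3.0]
B = [4.0, 5.0, 6.0]
C = [7.0, 8.0, 9.0]

# Fused: a single loop over the elements, with no temporary array for A .* B.
fused = A .* B .* C

# In-place variant: writes the result into preallocated storage.
out = similar(A)
out .= A .* B .* C

# Hand-written loop that computes the same values as the fused expression.
manual = similar(A)
for i in eachindex(A, B, C)
    manual[i] = A[i] * B[i] * C[i]
end
```

If in doubt, `Meta.@lower A .* B .* C` shows what the dotted expression is lowered to, and `@code_llvm` can be applied to a function that wraps it.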


#29

I know that now I’m totally abusing the subject of this thread, but while we are on this low-level stuff (I just couldn’t resist):

julia> π_float = Float64(π);

julia> π_ptr = convert(Ptr{UInt8}, pointer_from_objref(π_float));

julia> π_bytes = unsafe_wrap(Array, π_ptr, 8)
8-element Array{UInt8,1}:
 0x18
 0x2d
 0x44
 0x54
 0xfb
 0x21
 0x09
 0x40

How awesome is that? Try doing that in R or Python or pretty much anything else other than C or Julia. This isn’t just some specific implementation that wraps some C code that you’ll be able to use only for this specific purpose, this is something you can do with anything. You can do anything you could in C in pure Julia without any special syntax or constructs.

And just in case you’d think this could never be useful, take a look at Feather.jl. That was written in pure Julia. It didn’t need to wrap a C library. It didn’t need to wrap a big, complicated Python library that wraps some big, complicated C library. If it did either of those things I’d never have made a PR, I’d have said “Well, it doesn’t do what I need it to do, learning those libraries it calls would take a really long time, I guess I have to use something else.”. Instead, it’s only 500 lines of code, and very easy to understand, so I was able to modify it in a way that has been very useful to me in real life for my real job.

By the way, my example is still unsafe after the last step, I actually don’t remember off the top of my head how to convert it to a proper managed reference, lol. Perhaps someone can extend my example.


#30

As I worked with Python’s struct module (https://docs.python.org/3/library/struct.html) more than 10 years ago to read/write structured binary files, I’m only medium impressed…


#31

Speaking of which, saw this today:


#32

R is really comprehensive for the basic batteries-included stuff. If you’re throwing some data into a data frame, plotting and data munging, and R is working for that, you should stick with it. Some progress has been made on DataFrames.jl over the last few years, but mostly on working out the fine details of the design, so it’s not there yet.

Of course, if you hit problems with R it’s very likely that Julia can solve them for you. Just want to set expectations.

You might also miss some of R’s more exciting language features, like implicit laziness and the superassignment operator, which assigns variables into the calling function (reading about R always reminds me of Intercal). These are mostly really horrible for writing maintainable code, but a lot of the Hadleyverse uses them for really nice APIs over dataframes; Julia struggles here a bit and the macro-based versions are ugly.


#33

In my experience, these APIs are really nice if you want to perform one of the specific tasks they were designed for, but turn out to be very brittle if you want to go beyond that. When that happens, you either learn about R arcana and dig into these libraries, search the net in case someone faced the same problem and hope their solution still works, or give up and write a loop.


#34

Julia is also a tiny bit closer to a compiled language, in which you write your code, wait for compilation, then run it, than to the instant feedback loop you have in R. Compilation times can become an issue when writing or using a lot of code, which forces you to write modules. That in itself can be a good thing, but it’s sometimes a bit annoying.

The workspace/namespace management is also more finicky, especially at the moment because of the workspace bug. This also gets worse because if you restart Julia you have to reload and recompile everything.

It’s still great for hands-on data analysis but it feels a bit sluggish at times.


#35

I would write this as:

julia> reinterpret(UInt8, Float64[π])
8-element Array{UInt8,1}:
 0x18
 0x2d
 0x44
 0x54
 0xfb
 0x21
 0x09
 0x40

Or even just this:

julia> bswap(reinterpret(UInt64, Float64(π)))
0x182d4454fb210940

#36

FYI, it is not allowed to unsafe_wrap an object reference.
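
A pointer-based version that stays within the rules might look like the following. This is only a sketch, assuming Julia 1.x and a little-endian machine; `buf` is an illustrative name. The value is kept in an array, which has a stable address, and the result is copied so that it ends up as an ordinary managed array:

```julia
# Store the value in a heap-allocated array so it has a stable address.
buf = [Float64(π)]

π_bytes = GC.@preserve buf begin
    # Reinterpret the array's data pointer as a pointer to raw bytes.
    ptr = convert(Ptr{UInt8}, pointer(buf))
    # unsafe_wrap only views the existing memory; copy makes an
    # independent array that is managed by the garbage collector.
    copy(unsafe_wrap(Array, ptr, sizeof(buf)))
end
```

`GC.@preserve` keeps `buf` alive while the raw pointer is in use. For everyday code the `reinterpret` form shown earlier in the thread is simpler and needs no pointers at all.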