What features will I miss in Julia?

I don’t need to emulate data.table, just to have similar features.

What about some fast way to read/write files like data.table’s fread and feather or even fst, able to read and write compressed files quickly?

Or the ability to work easily with databases such as MonetDB?

What about the interaction with Spark?

You could check on https://pkg.julialang.org and find the Feather package. Notice that it is owned by the JuliaStats organization, which is a good place to start looking for Julia packages related to statistical computing.

Also, check the recent announcement of JuliaDB. This is a step forward relative to other systems for manipulating and processing large amounts of tabular data. The purpose of Julia isn’t to imitate the capabilities in Python or R or Matlab and stop there. It is to design novel and effective software to solve real-world problems.

If you want to continue a discussion of the laundry list of capabilities that must be provided for you before you are willing to consider using a system, you may want to consider commercial software.

What software do you mean?
I’ve moved from Stata and SPSS to R, and I think it is more capable and faster (with the proper libraries) than those.
I’ve never tried SAS.
I’m starting to learn Spark, but it’s much more difficult.
Or do you mean software such as Tableau or Qlikview?

I was (indirectly, I admit) commenting on the tone of your messages. If you are going to spend a lot of money to buy commercial software it is reasonable to ask the salesperson, “Does it support this and this and this?”. Open-source software, on the other hand, is provided for you by its developers without charge. If you decide that you don’t want to make the slightest effort to learn about the software (e.g. not bothering to check for a package to read and write feather-format files, even when it is cleverly disguised as a package named Feather) that’s your choice. But don’t expect people to spend a lot of time trying to convince you to use the software. To tell you the truth, we don’t care if you use it. We write it for us to use it.

Yes, I want to make the effort to learn the new language. But first I’m trying to get the whole picture and evaluate whether it’s worth the effort or whether I should move to something else.

To be fair, the only thing about Python that I miss is the debugging (I suspect this would be the same for R). Since Python and R are interpreted languages, making good debuggers for them is trivial. There is a Julia debugger but it’s in purgatory right now due to compiler changes in v0.6. However, Julia has excellent and recently improved stack traces and quite a powerful REPL, so the situation is about as good as it can get without a proper debugger.

One of the many huge advantages of Julia which @ChrisRackauckas touched on but isn’t often advertised is that it’s usually trivially easy to figure out how software works. This is largely a consequence of the well-advertised fact that performant code can easily be written in pure Julia without resorting to calling Cython or some other hare-brained scheme. By comparison, figuring out how Python code works is an absolute nightmare. I used to use pandas very extensively and I never gained the slightest clue of how it works.

One of the problems with the Julia ecosystem right now is that the dataframes implementations are split. I have found this to be a non-issue because it’s trivially easy for me to figure out how those dataframe implementations work and solve any problems I might encounter due to their immaturity.

Also, I use Feather all the time, and I love it.

As someone who has spent a lot of time optimizing code in various assembly languages, from 8-bit to 64-bit, I really appreciate @code_native (and the others in the family, such as @code_llvm). It makes it much easier to see just what is going on, and when I change some source code I can immediately see what effect it has on the generated code.
I’m not aware of anything like that for C, C++, Java, Swift, etc. (but there might be some such thing, does anybody know?)
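
For anyone who hasn’t tried these macros, here is a minimal sketch, assuming Julia 1.x (where a script needs `using InteractiveUtils`; the REPL loads it automatically). The function `add_one` is a made-up toy, not something from this thread.

```julia
using InteractiveUtils  # provides @code_llvm, @code_native and code_llvm

# Hypothetical toy function, used only for illustration.
add_one(x) = x + 1

# Print the LLVM IR and the native assembly generated for an Int argument.
@code_llvm add_one(1)
@code_native add_one(1)

# The function form accepts an IO, so the output can be captured as a string,
# for example to compare the generated code before and after a source change.
llvm_ir = sprint(code_llvm, add_one, (Int,))
```

The exact output depends on the Julia version and the CPU, so none is shown here.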

I’m not aware of anything close to this in MATLAB/Python/R. There, the internals and any optimizations that may be applied are treated as dark magic. That makes it difficult to really know what’s going on sometimes. For example, there are claims that in some cases MATLAB may fuse A.*B.*C, and sometimes you can time it and it seems like that happens, but… you can’t see what the code actually does (nor are MATLAB’s internals+compiler something you can just go look at, for obvious reasons)
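
For contrast, here is a rough sketch of how the same question can be checked in Julia, assuming Julia 0.7 or later (where `Meta.@lower` is available). The arrays are made-up data.

```julia
A, B, C = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]   # made-up example data

# Show what lowering turns the dotted expression into. The `broadcasted`
# calls feed a single `materialize`, which indicates that both
# multiplications are fused into one loop without a temporary array.
lowered = Meta.@lower A .* B .* C
println(lowered)

result = A .* B .* C
```

The printed form varies between Julia versions, but the point is that it can be inspected at all.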

I know that now I’m totally abusing the subject of this thread, but while we are on this low-level stuff (I just couldn’t resist):

julia> π_float = Float64(π);

julia> π_ptr = convert(Ptr{UInt8}, pointer_from_objref(π_float));

julia> π_bytes = unsafe_wrap(Array, π_ptr, 8)
8-element Array{UInt8,1}:
 0x18
 0x2d
 0x44
 0x54
 0xfb
 0x21
 0x09
 0x40

How awesome is that? Try doing that in R or Python or pretty much anything else other than C or Julia. This isn’t just some specific implementation that wraps some C code that you’ll be able to use only for this specific purpose, this is something you can do with anything. You can do anything you could in C in pure Julia without any special syntax or constructs.

And just in case you’d think this could never be useful, take a look at Feather.jl. That was written in pure Julia. It didn’t need to wrap a C library. It didn’t need to wrap a big, complicated Python library that wraps some big, complicated C library. If it did either of those things I’d never have made a PR; I’d have said “Well, it doesn’t do what I need it to do, learning those libraries it calls would take a really long time, I guess I have to use something else.” Instead, it’s only 500 lines of code, and very easy to understand, so I was able to modify it in a way that has been very useful to me in real life for my real job.

By the way, my example is still unsafe after the last step. I actually don’t remember off the top of my head how to convert it to a proper managed reference, lol. Perhaps someone can extend my example.

As I worked with Python’s struct module (struct: Interpret bytes as packed binary data, in the Python 3.12.0 documentation) more than 10 years ago to read/write structured binary files, I’m only medium impressed…

Speaking of which, saw this today:

https://twitter.com/CookieSci/status/863552730765430784

R is really comprehensive for the basic batteries-included stuff. If you’re throwing some data into a data frame, plotting and data munging, and R is working for that, you should stick with it. Some progress has been made on DataFrames.jl over the last few years, but mostly on working out the fine details of the design, so it’s not there yet.

Of course, if you hit problems with R it’s very likely that Julia can solve them for you. Just want to set expectations.

You might also miss some of R’s more exciting language features, like implicit laziness and the superassignment operator, which assigns variables into an enclosing scope rather than the local one (reading about R always reminds me of Intercal). These are mostly really horrible for writing maintainable code, but a lot of the Hadleyverse uses them for really nice APIs over dataframes; Julia struggles here a bit and the macro-based versions are ugly.

In my experience, these APIs are really nice if you want to perform one of the specific tasks they were designed for, but turn out to be very brittle if you want to go beyond that. When that happens, you either learn about R arcana and dig into these libraries, search the net in case someone faced the same problem and hope their solution still works, or give up and write a loop.

Julia is also a bit closer to a compiled language, where you write your code, wait for compilation, and then run it, than to the instant feedback loop you have in R. Compilation times can become an issue when writing or using a lot of code, which forces you to write modules. That in itself can be a good thing, but it’s sometimes a bit annoying.
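
A small sketch of what that wait looks like in practice. The function `sum_of_squares` is a made-up toy, and the timings depend entirely on the machine and Julia version, so no numbers are shown.

```julia
# Hypothetical toy function, used only for illustration.
sum_of_squares(xs) = sum(x -> x^2, xs)

xs = rand(1_000)

first_call  = @elapsed sum_of_squares(xs)   # includes JIT compilation for Vector{Float64}
second_call = @elapsed sum_of_squares(xs)   # reuses the already compiled method

println("first call:  ", first_call, " s")
println("second call: ", second_call, " s")
```

The first call is typically much slower than the second, and the compilation cost is paid again after every restart of Julia.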

The workspace/namespace management is also more finicky, especially at the moment because of the workspace bug. This also gets worse because if you restart Julia you have to reload/recompile.

It’s still great for hands-on data analysis but it feels a bit sluggish at times.

I would write this as:

julia> reinterpret(UInt8, Float64[π])
8-element Array{UInt8,1}:
 0x18
 0x2d
 0x44
 0x54
 0xfb
 0x21
 0x09
 0x40

Or even just this:

julia> bswap(reinterpret(UInt64, Float64(π)))
0x182d4454fb210940

FYI, it is not allowed to unsafe_wrap an object reference.
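
One way to get the same bytes through a raw pointer without wrapping an object reference might look like the sketch below. It assumes Julia 1.x and a little-endian machine, and it is only an illustration, not an official recipe. The value is boxed in a `Ref`, kept alive with `GC.@preserve`, and the bytes are copied into an ordinary array before the block ends.

```julia
x = Ref(Float64(π))   # mutable, GC-tracked box holding the value

bytes = GC.@preserve x begin
    # Raw pointer to the 8 bytes of the boxed Float64; valid only inside this block.
    p = Ptr{UInt8}(Base.unsafe_convert(Ptr{Float64}, x))
    # Copy the bytes into a normal, managed Vector{UInt8}.
    [unsafe_load(p, i) for i in 1:8]
end
```

Since `bytes` is a plain copy, nothing unsafe escapes the `GC.@preserve` block.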

Juno, do you have it?

No, I’m curious, what features exactly would he miss? The only things I can think of are the inbuilt markdown help viewer and in particular the history tab.
But Juno has the editor, the console, the plot pane, the workspace pane, the file tree, the git integration, the indispensable module namespace management, the debugger (though this is truly rudimentary compared to RStudio’s, that is hardly a Juno matter), the linter, some nice buttons, and is far more flexible and extensible.

I tried Juno but I think I prefer Jupyter or even better Beaker.

So, to summarize our discussion, the OP may miss RStudio (for undisclosed reasons), or he may not.
To be fair, I agree with MikeInnes (post #32 in this thread, What features will I miss in Julia?) that Julia is a lot less mature than R, and I myself currently recommend it only to enthusiasts (though I am sure that will change in the future).