Extremely slow execution time

I am totally OK with the current state of compilation times, and I think the plotting issue is not a big deal. I do feel a little bit upset that data manipulation tools in Julia are not as elegant and concise as dplyr and pandas.

I tried it a couple months ago, and it worked as expected.
Don’t overwrite your old system image if you’re worried.

You asked “When we can hope it will be true?”
I can’t give you a time when everyone will feel comfortable removing the warning, or when there will be broad (and better) support for more deeply precompiling packages, but that’s a clear example of a pure Julia solution already on its way, available for testing.

I am somewhat the opposite. I do think that plotting is a big deal, and there’s no dancing around the fact that it’s in a bad state right now, but I feel pretty confident that the situation will improve and there are various ways of mitigating the pain right now, as we’ve been discussing. On the other hand, I think the data manipulation tools are in a really excellent state right now. I’m very happy with DataFrames.jl, and I like DataFramesMeta.jl and Query.jl, although frankly I rarely have to use them thanks to join, by and filter and, best of all, the fact that in Julia there’s no reason not to just iterate over your datasets and do whatever the hell you want with normal code. I’ve been quickly loading datasets with the new version of Feather.jl (which we will tag at some point after 1.0). I haven’t started seriously using IndexedTables or JuliaDB yet, but so far I like what I’m seeing. I’m infinitely happier than I ever was with the unwieldy, over-complicated, unhackable pandas.

4 Likes

That’s very interesting to hear! It’s the first criticism I recall of pandas (and I’m glad if you feel the stuff being worked on now for Julia is better!)

1 Like

Frankly I think you will hear a lot of criticism of pandas, but it probably depends somewhat on the background of the people using it. I was brought up writing FORTRAN and C++ code for physics applications, both theoretical and applied. For me the idea of being discouraged from writing loops to implement algorithms that have non-constant time complexity is the stuff of nightmares. Don’t tell me not to loop over the rows of a table; I should be able to do that if I damn well feel like it. A huge class of data manipulation tasks just don’t require anything as complicated as a relational database operation. I shouldn’t have to dig through oodles of pandas documentation to do some dead-simple thing that can be accomplished with stdlib functions.

I know that many of my colleagues with more of a stats and data science background feel quite differently. They tend to take the attitude that anything you can’t do efficiently in pandas is not worth doing.

But then sometimes, you have to do that weird, obnoxious thing. You know, that thing that you’ve been screwed into doing because somebody handed you a dataset in an unfathomably bizarre format. Then you either have to find a way to do it all in pandas, or just write normal code. But the normal code in Python takes an hour. But in Julia it takes 1 minute. And then I win :smile:

(Sorry, I’m pushing this thread waay off-topic now.)

2 Likes

I definitely recognize the points you raise. My current efforts are some small tests to begin to understand Julia and its current implementation. The offending code is designed to run a few small test cases for what I intend to eventually be a Package. A real world usage though would be for a user to provide a simple input, with the application then generating a series of plots to understand the predicted physical behavior expected for the quantities. However, the 15 sec lag time would drive users crazy :sunglasses:

As noted I’ll explore some of the ideas posted here today when I get a few moments to do so. Thanks again!

I should also note my recent coding has been a mix of C++, Mathematica, Ruby and Python.

I understand that Gaston, with its focus on simplicity and speed, and its reliance on gnuplot, is not for everybody. Having said that, I welcome all issue reports, suggestions and (especially :slight_smile:) PRs. I know that it works very well for my use case: quick exploratory printing on screen or on Jupyter, and PGFPlots (which nobody can touch for publication quality plots) once I want PDFs.

1 Like

I think those were more the type of people who I’d talked to (at ODSC conferences, for example), not the sort like Julians tend to be, who want expressiveness, ease of use, and the ability to expand/extend/hack something to their heart’s desire :grinning:

You are happy with Feather.jl though, even though that’s partially from the architect of pandas?

I definitely did not mean to imply that I have any animosity toward pandas or its creators. I think pandas is quite well made for what it is, but it does have one fatal flaw: it’s written for Python. Therefore, there is no way around pandas being unwieldy, over-complicated and unhackable; those things just come from it being written in C (well, Cython) and used from Python. The pandas creators seem to have done a good job at making the best of a bad situation, but with Julia available, there’s no need for me to do the same.

As far as Feather goes, it really is a very basic format, there’s not too much to fault there. I think my one major criticism is that they seemed to have sort of “made up the standard as they went along”, there are a few cases where they violate the Arrow standard and things get a bit weird. Still, because it’s such a simple format, this doesn’t seem to have resulted in any disasters. I completely rewrote Feather.jl so that it makes extensive use of memory mapping and lazy loading, so you can use it together with DataFrames, DataFramesMeta.jl and Query.jl as sort of a quick mini-database. We’ll tag the new Feather sometime after 1.0, but I’ve been using it extensively in my work and I’ve been pretty happy (I’m biased, of course, since I rewrote it). I’ll announce it here on discourse when I tag it.

There’s a bit of a hidden culture war in data science I think. It’s very painful for me to admit that “anything that Python isn’t good enough for is not worth doing” could ever be an acceptable attitude, but I’m forced to concede that there’s a huge amount of work for which this is perfectly sensible. Most of the stuff people are doing these days just boils down to feature selection. Still, there’s no reason that Julia can’t displace Python for most of those tasks, and for myself, despite the job title “data scientist” I tend to work more on topics that are traditionally considered operations research, so even in my current job the advantages of using Julia over Python are many and prominent. Please don’t take this paragraph as exacerbating said culture war, I’m actively trying to avoid the “us vs them” tone.

1 Like

:grinning: Precisely how I’ve felt for the last 3 years!

I hear you. I dislike culture wars, and try to see the best in anything (or anybody, any culture, any cuisine), and prefer to assimilate the best into my programming or my life.
When I encounter a new programming language, like when I found out about Julia, I’m always thinking of 1) how many of my favorite things does it already have, 2) how easy would it be to add the things that are not present, and 3) how will it endure over the long haul.
Julia already had many things I liked from Scheme, CLU, C, Python and Lua, I saw that it would be much easier to add the things I felt were missing than in other languages that I’d used (even compared to Scheme), and finally, it looks like it’s definitely here for the long haul.

1 Like

Look! If we want quick plots, we have to use libraries written in C (or whatever else produces machine code), whereas pandas could use Python (in principle :wink: although it is Cython (*), which with type annotations could probably be converted to pure Python code in the future).

So what we have in Julia is big promises.

Well! We believe in a bright future for Julia, but we have to see the current situation realistically too!

(*) - I didn’t check whether pandas is written in Cython - I believe you.

FWIW, all my plotting problems have been solved by PGFPlotsX. This is because at the end of the day, all my plots end up in LaTeX documents anyway.

I realize that this is not a general solution for everyone, eg someone who wants interactive 3D plots for data visualization/exploration would need to use something different, but it has minimized the pain associated with plotting for me.

2 Likes

Also last time I checked Gaston.jl was the only plotting library working with current master!

Yes, very true. That’s why, although I’m probably one of Julia’s biggest fans, I’ve also spent a lot of time trying to shine some light on things that I feel could be improved (which sometimes gets people quite defensive).
The only thing I can do about that is to attempt to improve the parts that bother me myself (which, because it is programming in Julia, is not something tiresome, but rather something quite enjoyable and fulfilling)

3 Likes

I am sorry, but you understand the two-language problem all wrong. The two-language problem is when a developer needs to use two (or more) languages to solve a single task, which is a common problem in scientific computing and other fields. The example with LaTeX is irrelevant here, because Julia and LaTeX solve essentially different tasks. Julia does computations, LaTeX does text markup. Solving the two-language problem does not mean solving all possible problems in the world.

When we can hope it will be true?

I guess, after you define a set of all possible problems in the world.

3 Likes

From where I sit the current situation is fantastic, with the exception of some obnoxiously long compile times.

I don’t see Julia as some pipe dream, I’ve been using it every day for years now, and at this point I rarely have to resort to using resources from other languages. (Ironically I have to use C, C++ and Java resources from Julia far more than I have to use Python.)

It definitely has solved the two language problem for me, I haven’t had to write any C code in 3 years now (after Tony Kelman, Steven G. Johnson and a few others pushed me from trying to solve some problems by writing a C library and a Julia wrapper, to doing it in pure Julia :grinning: )
I originally resisted, until after extensive benchmarking, I was able to show to myself that I could get code as fast as C (and this was for low-level programming), with less work, by writing it all in Julia.

I tell people, when they say things like “One Language To Rule Them All”, that they misunderstand the original quote, and that Julia isn’t the language that will replace all the others (I don’t think that would be possible for any language), however, for many things, it could be the language that will bind them all together (hopefully not in an evil way like the One Ring!) via PyCall, RCall, Cxx, JavaCall, and the built-in ccall, and therefore rule them :wink:

2 Likes

If you really need to get going while enjoying all the benefits of Julia along with the maturity of other more established libraries (plotting), just do a system call and pass a JSON file to another program. With a couple of lines, you can just use matplotlib in Python by piping your JSON to it. I am not a fan of the “XCall” family of solutions personally, so I tend to design my solutions regardless of whether there are existing bridges or not. For me, it’s safer and more robust (always works), I can do it with any language, and it’s often efficient enough (usually the overhead of the sys call does not matter, as the computations happen on either side of it). Good luck.
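For what it’s worth, here is a minimal sketch of what the Python end of that pipe could look like. The payload layout (`title`, `x`, `y`) and the names `parse_series` and `plot_series` are just assumptions I made up for illustration, not any standard; matplotlib is only imported when a plot is actually requested.

```python
import json
import sys


def parse_series(text):
    """Parse a JSON payload into (title, x, y), checking that x and y line up.

    The keys "title", "x" and "y" are a hypothetical layout for this sketch.
    """
    payload = json.loads(text)
    x = [float(v) for v in payload["x"]]
    y = [float(v) for v in payload["y"]]
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return payload.get("title", ""), x, y


def plot_series(title, x, y, out_path):
    """Render the series to an image file with matplotlib (imported lazily)."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(x, y)
    ax.set_title(title)
    fig.savefig(out_path)


# Runs only when an output path is given on the command line, e.g.
#   some_producer | python plot_from_json.py out.png
if __name__ == "__main__" and len(sys.argv) > 1:
    plot_series(*parse_series(sys.stdin.read()), sys.argv[1])
```

The calling side (Julia or anything else) then only has to serialize its arrays to JSON and write them to this process’s standard input; the script name `plot_from_json.py` above is likewise just a placeholder.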

2 Likes

Additionally, you can always set up HTTP/TCP microservers with little effort too, as necessary, for a more streamlined approach. This really allows you to tackle your problems by making use of the good parts of whatever you’re working with (Julia, etc.) while allowing the ecosystem to develop and keeping your project from being affected by it.
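As a rough sketch of the microserver idea, the following uses only the Python standard library. The `summarize` function, the `values` key and the `make_server` helper are hypothetical stand-ins for whatever work you actually want the other side to do; the point is only that any client that can POST JSON over HTTP can use it.

```python
import json
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize(values):
    """The 'service' itself; any computation could live here."""
    return {"n": len(values), "mean": sum(values) / len(values)}


class SummaryHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            body, status = summarize(payload["values"]), 200
        except (ValueError, KeyError, TypeError, ZeroDivisionError) as err:
            # Malformed JSON, missing key, wrong types or empty input.
            body, status = {"error": str(err)}, 400
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(port=0):
    """Bind to localhost only; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), SummaryHandler)


# Runs only when a port is given, e.g.: python summary_server.py 8765
if __name__ == "__main__" and len(sys.argv) > 1:
    make_server(int(sys.argv[1])).serve_forever()
```

Binding to 127.0.0.1 keeps the service private to the machine, which seems like the sensible default for this kind of glue between two local programs.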

2 Likes

I would have been very happy with 20 seconds. In a Pluto notebook “using Plots” took 973 seconds for me. I really want to learn Julia, but I don’t know how much more of my time I can afford to waste.

1 Like