Julia losing popularity among Data Science users (KDnuggets Software Poll)

Yeah, I second that. Obsessing over these things is not going to help anyone.

Well, here’s my two cents on this. “Data science” is a recently made-up term. As far as I can tell, the vast majority of us working in the field aren’t actually specialized in anything related to our jobs (some certainly are, e.g. NLP people), even though we may be very highly specialized in terms of our educational backgrounds. In my experience this has led to wildly divergent opinions on even practical matters such as tooling.

At the risk of over-generalizing, I think there is a prevailing attitude in data science that anything Python isn’t “good enough” for is not worth doing, and that you should mostly write scripts and not worry too much about re-using code. If that’s your view, I can see why Python seems like the ultimate tool: it has a staggering amount of pre-existing code available for it, and it’s a very fine scripting tool. I have to admit that, because of the way I learned programming and computing, those attitudes are extremely frustrating to me. But I also have to admit that they are perfectly valid in the majority of data science roles, and that people with this attitude are getting a huge amount of valuable work done without caring one bit about what I think about how they’re doing it.

As data scientists we are also frequently asked to work on data that is presented to us in some truly nightmarish formats, and we are constantly having to deal with awful things like CSVs and SQL (awful mostly because it’s actually about 100 different things masquerading as 1 thing). Therefore, there is (very rightfully) also a huge emphasis on tools for doing things like querying databases, and I suspect that a majority of data scientists approaching Julia would be primarily interested in situations like the one @Liso described above, where it was very correctly pointed out that we’ve sort of abandoned the idea of having a universal database API for Julia. This is a real issue. Database support is hugely important (especially if you’re a data scientist, but also if you’re not) and it really kind of sucks in Julia right now. We might have some great support for some specific databases, and they’ve done a great job on ODBC.jl, but frankly there’s nothing as clean, simple and easy as SQLAlchemy in Python, which must have taken a monumental amount of work to get into its current state. If you are new to Julia and don’t know where to look for things like ODBC.jl, LibPQ.jl, JDBC.jl or MySQL.jl, things look much worse than they really are. There are also some people who would come in, perhaps learn about the packages I’ve mentioned, but only see that there is no SQLAlchemy equivalent and immediately dismiss the whole language.

My counter to all that is basically that it gets the priorities completely backwards, thanks to the existence of things like PyCall and JavaCall. Yes, I need database support, but pulling data from a database isn’t really that complicated. If I have to do it through PyCall, and it’s a little slow, I don’t really care. You know what sometimes is that complicated? MILPs with millions of variables, stochastic constrained optimization problems, POMDPs, solving stochastic differential equations. When I first started using Julia, I had recently gotten really aggravated with the misery of trying to do large MILPs in Python. Everybody around me was using PuLP. It was really, really slow and even uglier than it was slow. Then I wrote up a problem in JuMP. It was this tiny little thing that fit on one screen, and it looked almost exactly like it did when I wrote it out algebraically in LaTeX. It took 0 effort to convince the guy I work with, who has been working on these things for years longer than I have, that we need to move everything over to JuMP. Now I’m getting ready to try a more general version of those problems that will require a stochastic method like simulated annealing (possibly using Hamiltonian updates like in @Tamas_Papp’s package?), and this would have been impossible in Python, because the updates are going to require a huge amount of custom code. In fact, the engineering group at my company attempted something like this once in Python using canned tools and, after being happy with a toy problem, wound up completely abandoning it because they couldn’t get it to work on real problems. Why? It was too slow, and too hard to modify. If I can’t get it to work in Julia, I can be confident that my tools aren’t the problem.
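For anyone who hasn't written one, here is a rough idea of what "simulated annealing with custom updates" means in practice. This is only a minimal sketch in plain Python (chosen for illustration), not the actual problem described above: the names `simulated_annealing`, `toy_cost` and `toy_neighbor` are made up, the proposals are simple random-walk steps rather than Hamiltonian updates, and the cooling schedule is an assumed geometric one. The `neighbor` function is the problem-specific part that has to be hand-written, and it runs inside the hot loop.

```python
import math
import random


def simulated_annealing(cost, neighbor, x0, steps=10_000,
                        t_start=1.0, t_end=1e-3, seed=0):
    """Minimize cost(x) using Metropolis acceptance and geometric cooling."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    # Cooling factor chosen so the temperature decays from t_start to t_end.
    alpha = (t_end / t_start) ** (1.0 / max(steps - 1, 1))
    t = t_start
    for _ in range(steps):
        y = neighbor(x, rng)  # problem-specific proposal (the "custom update")
        fy = cost(y)
        # Always accept improvements; accept a worse move with
        # probability exp(-(fy - fx) / t).
        if fy <= fx or rng.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
        t *= alpha
    return best, fbest


# Hypothetical toy problem: a bumpy 1-D function over the integers.
def toy_cost(n):
    return (n - 17) ** 2 + 10 * math.cos(n)


def toy_neighbor(n, rng):
    return n + rng.choice((-1, 1))


best, fbest = simulated_annealing(toy_cost, toy_neighbor, x0=100)
```

Every iteration calls `cost` and `neighbor` once, so in a real problem nearly all of the runtime is spent in user-written code, which is exactly where interpreter overhead hurts.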

Lastly, as I’ve already talked about extensively elsewhere, some of the simple stuff in Julia is really so nice, and it can be so hard to convince people of that if it’s built into the core of their being that they should never write a million-iteration loop. I don’t want to do everything with some complicated API, let me just write simple code using Base. Can’t think of a way to do something using only database-style operations? Fine! Just write some code, use a Dict, use a Vector, create your own custom struct and put it in a million-element array, just do whatever you want, that’s how writing code works. I’ve recently had a data manipulation task that started out very simple and, lo and behold, it turned out I had to do a whole bunch of stuff with quadratic time complexity that would have taken an hour in Python, or I would have had to go hunting for the right package to do it (if that were even possible). In Julia it was all really simple stuff. I use mostly Base for the vast majority of what I do, just like in Python you could in principle use the stdlib to do most of what you do, but you don’t because it’s too slow. I lose numpy the second I want to put a Python object into it. Python’s an OO language, it’s supposed to be all about writing objects, so then what good is numpy? (Well, lots of good, just not for Python objects.)
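To make the "just write some code" style concrete, here is a small sketch of that shape of task: a custom record type, a dictionary for grouping, and a naive quadratic loop. It is written in stdlib-only Python purely for illustration, and everything in it (`Reading`, `close_pairs`, the sample data) is hypothetical rather than the task described above. In Julia the same structure would be a `struct`, a `Dict` and a `Vector`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Reading:  # hypothetical record type, standing in for a custom struct
    sensor: str
    t: float
    value: float


def close_pairs(readings, max_gap):
    """Return index pairs (i, j), i < j, for readings from the same sensor
    whose timestamps differ by at most max_gap.

    Deliberately the naive O(n^2) double loop within each sensor group.
    """
    by_sensor = defaultdict(list)  # sensor name -> indices into readings
    for i, r in enumerate(readings):
        by_sensor[r.sensor].append(i)

    pairs = []
    for idxs in by_sensor.values():
        for a in range(len(idxs)):
            for b in range(a + 1, len(idxs)):
                i, j = idxs[a], idxs[b]
                if abs(readings[i].t - readings[j].t) <= max_gap:
                    pairs.append((i, j))
    return pairs


readings = [
    Reading("a", 0.0, 1.0),
    Reading("b", 0.1, 2.0),
    Reading("a", 0.4, 1.5),
    Reading("a", 5.0, 0.2),
]
pairs = close_pairs(readings, max_gap=1.0)
```

Nothing here needs a library, but in pure Python the inner double loop is the part that stops scaling once the groups get large, which is the trade-off the paragraph above is complaining about.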

Nope, it’s never been an issue (perhaps surprisingly). So yay for that :smile:

The reason I’m so passionate about it that I wrote this huge rant on a Sunday afternoon, and the reason I am “scared” of Python, is that in a way Julia and I are in very similar situations in our careers in data science. We grew up doing physics and we want to do something interesting enough that it’s more than just feature selection and canned solutions, and if we can’t find that, there is no reason for us to be there. When that inevitably occurs, we will have to look elsewhere for jobs. I may not find one, but fortunately, Julia has several already.
