Julia losing popularity among Data Science users (KDnuggets Software Poll)

Tamas_Papp · June 6, 2018, 6:31am

Not having a debugger made me

- break up my code into small, testable functions that do one thing (ideally),
- write unit tests for them as I code.

This is very natural and painless with Revise.jl.

Many scientists just write code, try to run it, and start the debugger if it “breaks” (which may mean anything from runtime errors to nonsensical results when eyeballing plots). I certainly did it all the time. The problem with this workflow is that it does not catch bugs that don’t produce apparent errors.

I am not saying that a debugger is useless, but I came to the conclusion that when I feel like I need one to understand what my own code does, that is almost certainly a code smell.
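
For illustration, the workflow described in this post (small single-purpose functions, tested as they are written) could look roughly like the sketch below. It uses only the `Test` standard library; the helper names `mean_skipmissing` and `fraction_missing` are hypothetical and do not come from the thread.

```julia
using Test

# Hypothetical helper: average the non-missing values.
# Returns `missing` when nothing is left, so the caller has to handle that case.
function mean_skipmissing(xs)
    total, n = 0.0, 0
    for x in skipmissing(xs)
        total += x
        n += 1
    end
    return n == 0 ? missing : total / n
end

# Hypothetical helper: fraction of entries that are missing.
fraction_missing(xs) = count(ismissing, xs) / length(xs)

# Tests live next to the functions and are written at the same time.
@testset "small functions, tested as they are written" begin
    @test mean_skipmissing([1.0, missing, 3.0]) == 2.0
    @test ismissing(mean_skipmissing([missing, missing]))
    @test fraction_missing([1, missing, 3, missing]) == 0.5
end
```

With Revise.jl loaded, edits to functions like these are picked up in the running session, so the tests can be re-run after each change without restarting Julia.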

Liso · June 6, 2018, 3:06pm

Although I agree that a debugger is not so necessary for an experienced coder, it is very good for somebody who is learning a language and/or coding.

jebej · June 6, 2018, 5:05pm

Just use PyPlot directly. You can open new windows with figure().

DIzer · June 8, 2018, 9:38am

Hehe… and what? What do you want from the users of a language with an unstable API, infrastructure, environment, tooling… with its release date regularly postponed over the last couple of years? Did you really expect to see a flock of users doing their vital work with Julia (even in the near future)? If so, you are, hmm, very optimistic. Nobody could see such a thing even among Rust users… though their approach (to design, testing…) and funds are MUCH MORE SOLID. Somebody over there says something about the voting value fluctuating… in fact the value is fluctuation itself.

Joshua_Bowles · June 9, 2018, 11:50pm

Good points. In my view of history, it was the adoption of Python by academics that catalyzed student usage at universities. Many of those students went into the business marketplace and established usage there. Eventually the academic development for research, coupled with student exposure and migration into the marketplace, established “data science”.

It seems Julia adoption in academia could have the same effect. But I think the real catalyst will be necessity. The size and complexity of the data that small and medium-sized companies need to stay competitive will force movement off an inherently constrained Python runtime. I also think the specialization of scientific and mathematical techniques will only deepen in the marketplace.

Of course I feel this way about modern dev in general, with adoption of Go, Rust, and Elixir preferred over Ruby, Python, and Node.js. So I’m a bit biased towards the “new school” of programming, which I see Julia as part of.

Joshua_Bowles · June 10, 2018, 12:28am

prevailing attitude in data science that anything that Python isn’t “good enough” for is not worth doing, you should mostly write scripts and not worry too much about re-using code. If that’s your view, I can see why Python seems like the ultimate tool: it has a staggering amount of pre-existing code available …

I think you are on the money with everything you said.

My view is that much of what you said falls out from management not being used to working with complex data and analysis, and undervaluing investment in stable data pipelines and infrastructure. Analysis and reporting is a “cost center” in many small/medium businesses. They just want some quick input to inform or support decisions related to the biz. With the creation of “data science”, coupled with the popularity of the cottage industry flooding the “Python data science” market with people who have a couple of months of training and can be paid more cheaply now due to “supply”, there’s likely to be a lot of “lowered expectations”. Prime territory for businesses with proper tools and training to run circles around others.

That is, businesses that have stable, fast, flexible, maintainable, hackable data systems will win. That won’t happen without good tech leadership at the top.

Julia seems prime for this. Wide adoption and KDnuggets headlines aside, it only takes a few really successful mainstream business projects to happen before others will take notice.

ScottPJones · June 10, 2018, 2:39am

I think we’ve identified some of the pain points for data scientists wanting to use Julia here in this thread. Now we need to go off and try to fix some of those, so that new data scientists won’t try it out once, get burned, and start writing blogs about their bad experiences with Julia.

austin-putz · June 15, 2018, 12:23pm

I agree. I see you are getting some pushback on this comment. Someone got really offended by my comment about Atom before, so I’ll refrain. But I checked with many others in my field and they find it as terrible as I do. RStudio would be an A in my mind. Always works great. Looks great. No issues. Can work with vim easily. Working in Atom is about a C-; it’s so bad that I prefer to just work on the command line. Sorry to all of you who think it’s amazing. I’ve had lots of problems and just don’t like it.

austin-putz · June 15, 2018, 12:51pm

I agree. I’ve had many in the biological sciences try Julia and have so many issues they give up after a week.

I’ve written on here before and everyone got mad and said how easy Julia is. Julia is not easy for people in the biological sciences. I found R fairly hard when I learned it compared to some easier statistical languages. When I started learning Python I thought it was quite easy and very similar to R. Looking at a few things you would think Julia was that easy… but not so fast. I’ve looked into others’ code and I don’t understand a thing at times.

The problem I see is that the intended audience was not base scientific R programmers. It was people coming from C++, Java, Fortran, etc. Python users are typically a lot more advanced. Julia will be able to get a larger piece of that pie. But the vast majority of R programmers I know (just me, in a biological field, animal science) will never move to Julia because of its difficulty. I find that things that are trivial in R are extremely difficult to find solutions to in Julia. I simply use Julia for large-scale file processing (genotype files of a few GB, but they can get much larger). There I only need to do some simple matrix calculations.

For day-to-day data science, it won’t be grabbing the R community unless it gets simpler and easier to use, with A LOT better documentation. I’m finally finding some better docs, but the open source community is lazy and doesn’t like to write documentation because we don’t get paid like SAS. The docs you all have now for Julia are for developers. Not the dipshits like me we have in animal science fields or agronomy or life sciences. We are making up a large part of the user group. You guys are all higher-end people coming from many other languages, and I’d venture to say like 99% of you know C++, C, Fortran, or Java (??). Biological science people know bash scripting (maybe), SAS, and R.

Please don’t get mad and go off about how wrong I am and how easy Julia is. I’m speaking for my group in animal science who have tried Julia. They hate it and can’t ever get it to work. I understand it’s “pre 1.0”, but they don’t know that. So many have tried and now tell everyone it sucks. That’s why it’s not going to be gaining soon. So again, my advice is to STOP talking about it anywhere, like R-bloggers, until you guys can get it stable in a few years (I’m still not trusting of 1.0). In a few years, when it’s ready and has the database stuff I saw above and a decent IDE, then come out and tell your story. Right now it’s too early, too much noise is coming from Julia, and it’s still a baby. So just cool it and let the language develop. I’m seeing too much stuff and hype, and it’s leading to people coming over, getting pissed, leaving, and telling everyone how bad Julia is. I’ve seen this many times first hand here, and people I know refuse to try it again because of their experience.

Take this for what it’s worth. Sorry if I offended you. You guys seem quite sensitive. Just trying to make Julia as good as it can be.

pdeffebach · June 15, 2018, 3:26pm

Could you write up some of the specific gripes you see frequently about the data ecosystem? I use R for data science and am consistently frustrated by problems I don’t run into in Julia. DataFrames and DataFramesMeta seem much better designed, to me, than tibbles. We can’t work to improve the ecosystem without more specific examples.

pfitzseb · June 15, 2018, 6:15pm

If you think there’s something we can do about that, then please do elaborate a bit on what you don’t like. From the few videos I’ve seen of RStudio, it does seem like you could have a very similar workflow in Juno.
And if you mainly had issues with stability then I agree that we still have a way to go in regards to that, but there appear to be quite a few people who don’t seem to have that many issues with running Juno.

Also give VSCode a go.

In general you’re probably right though: Julia, its docs, and its tooling are mostly built for developers right now and not for end users. But that’s still great for end users, since they are inevitably going to use the libraries written by early adopters.

DoktorMike · June 15, 2018, 6:20pm

I agree with most parts of your analysis. I disagree that Julia should discourage users for now, but I agree that it’s way too soon for general scientists to jump on the Julia wagon. As of now it’s just too hard to use. I love the idea of Julia and where I, IMHO, believe it’s going, but I’m also a realist, and despite playing around in Julia daily I stay in C, Perl and R for production. I’m convinced Julia will have its day though, since it addresses many of the silly problems we see in other languages like Python and R.

ScottPJones · June 15, 2018, 6:45pm

It seems to me that users break into two camps. Those who are comfortable writing new code and debugging frequently end up being huge fans of Julia (like me), because it solves so many problems that they’d been experiencing with other languages, and sometimes enables things that simply were not possible before (ask @ChrisRackauckas about this, he’s in biology also).

Those who are not should be steered well clear of Julia, hopefully for no more than a year or so.

In my copious spare time, I, along with many others in this community, try to solve those issues that are causing pain now, for ourselves and for whoever comes after.

felix · June 15, 2018, 8:45pm

You probably have quite a unique insight (or many people with similar experience are not articulating it in this way / here). I sympathise in general, but as someone with a different background who hasn’t experienced this (or has had different colleagues), it’s quite hard to concretely imagine the general things you are describing.

Could you e.g. do a side-by-side comparison of one common R workflow and the Julia equivalent (including code)?

That would help people identify how to polish things — even if it’s just one step at a time

ScottPJones · June 15, 2018, 9:45pm

Yes, we need more constructive criticism from people trying out Julia. Those of us who have lived with Julia for a while may have forgotten or worked around the issues that are bothering other people, or people with different use cases.

Yifan_Liu · June 16, 2018, 1:01am

R’s dplyr is the best data wrangling package in my experience, not only because it is concise and efficient, but also because it is consistent with base R style. My two cents: if you have done data wrangling in MATLAB, you will know it is just not a very pleasant experience. Since Julia syntax is so similar to MATLAB, it is very difficult to make data wrangling in Julia as easy as in R and consistent with base Julia coding style at the same time.

Also, dplyr is easy to teach in terms of data science. Actually I really like Julia and I think it is really easy to understand if the user knows MATLAB.

Tamas_Papp · June 16, 2018, 6:18am

I am not sure about this. I think we need people who write code, or at least open issues about problems they have and are willing to participate in their solution to some extent.

Criticism, constructive or not, is rather cheap to produce, but per se produces little value. Also, the low-hanging fruit of language design has probably been picked already, so to give useful suggestions at this point one really has to be heavily invested in Julia.

The viability of a language usually depends on the contributors (in the broad sense). Catering to users who may potentially use Julia just for the warm fuzzy feeling of having a lot of users may be a misallocation of scarce resources.

austin-putz · June 17, 2018, 10:09pm

Sorry to those of you who get upset with me. I will admit I haven’t spent the time to really try to learn Julia hardcore, mostly in free time off and on. But I have tried to read the docs in depth and they are hard to understand most of the time. I finally found some stuff to help me even read function definitions. I hope someone eventually writes a full manual going from R to Julia for data scientists. Maybe this exists and I don’t know. I can only use Julia right now for matrix algebra. The other stuff I still find a bit hard to understand and learn. I’ll wait until the documentation gets a little better to come back to Julia, but for now I’ve mostly left. Eventually I’m sure it will get easier to use with better packages.

We (as data scientists) also need to wait until there are more statistics packages in Julia. Right now it seems a bit light and not that great. I think Bates has moved over to Julia though, which will help with mixed models.

One thing I find hard is recycling things. Like creating a table and then pulling out the second value in the table in Julia. Maybe this is easy and I haven’t looked into it.
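
Assuming “table” here means a frequency table like the one R’s `table()` returns, a package-free sketch of building one and pulling out its second entry could look like this. The `animals` data is made up for illustration.

```julia
# Hypothetical data standing in for a column of animal labels.
animals = ["dog", "cat", "dog", "mouse", "cat", "dog"]

# Count how often each label occurs (Base only, no packages).
counts = Dict{String, Int}()
for a in animals
    counts[a] = get(counts, a, 0) + 1
end

# A Dict has no defined order, so sort the (label => count) pairs by label.
tbl = sort(collect(counts); by = first)

# Pull out the second entry of the sorted table.
second_label, second_count = tbl[2]   # "dog" and 3 for the data above
```

The StatsBase package offers `countmap` for the counting step; the loop above is spelled out only to keep the sketch dependency-free.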

@ScottPJones Yes, I think it might take a year or so to get it stable enough with enough documentation. However, some of us need it to finish a PhD in 6 months or so :). I agree, Julia has improved by leaps and bounds since I started a few years ago. At that time it was still very unstable, but it has gotten a lot better. We (in the biological sciences) appreciate all the work from developers. My point in commenting on Discourse is to make sure computer scientists don’t write a language for computer scientists. If it’s going to be a scientific computing language at all, it should be geared towards being as easy as possible. We just don’t have time to understand all the nuances of the language.

Base matrix algebra I find quite easy. Many other things as well. It’s the things about it being a ‘static’ (ish) language that I don’t understand. One thing is the lack of simple as._____ functions like in R. I think it is because of the static-ish nature of Julia that I can’t just convert anything to anything easily. There are different syntaxes and functions I don’t understand. One guy actually emailed them to me when I asked, but I didn’t find them in the docs yet. Hopefully these things will be explained better to simple R users like myself.

I guess I’m trying to be constructive, but it doesn’t come off that way over this Discourse. I just want to give my perspective as an R user. The people I know who came from C++ find it to be very “easy”, whereas the R/bash user finds it very difficult. All I’m saying is to not forget about the R users if we want Julia to excel in Data Science (back to the original post). If it’s easy for C++ users you can grab them, but I thought the whole goal of Julia was to be an in-between, so R users would be a huge market to grab from. Right now it still seems too difficult for regular R users.
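
On the `as._____` point above, a rough sketch of the Base Julia counterparts may help. The mapping to R’s `as.integer`, `as.numeric`, and `as.character` in the comments is an approximation for orientation, not an exact equivalence.

```julia
# Text to number: parse takes the target type first
# (roughly as.integer / as.numeric applied to a string).
n = parse(Int, "42")
x = parse(Float64, "3.14")

# tryparse returns `nothing` instead of throwing when the text is not a number.
bad = tryparse(Int, "abc")

# Number to text (roughly as.character).
s = string(42)

# Number to number: call the target type as a constructor.
f = Float64(3)        # 3.0
i = Int(3.0)          # 3; Int(3.5) would throw an InexactError
r = round(Int, 3.5)   # lossy conversions have to be requested explicitly

# Whole columns: add a dot to apply the conversion to every element.
nums = parse.(Float64, ["1.5", "2", "3.25"])
```

The main difference from R is that a conversion that would lose information raises an error rather than silently coercing, which is why rounding or `tryparse` has to be spelled out.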

@felix I would do a side-by-side but I can’t figure out the dataframes packages. There are very simple examples I’m finding online, but nothing like the complexity I have with my real data. They are too simple for me to go from that to complex problems. One thing I’ve really been frustrated by is the dataframes package in Julia. Not really because it’s bad or poorly designed, just because it lacks documentation in my opinion. And this is where most open source stuff falls on its face. The rest of us will need a Hadley Wickham to come along for Julia to explain what’s going on. I really struggle to even understand the base docs for Julia. Again, because we come from biology, not from programming and computer science. I’m sure you guys think we are probably stupid, but we only have so much time to devote to programming and learning another language. I’ve finally started to find some really good stuff (one presentation that made me understand some very basic things about Julia not discussed anywhere else before). I should start making a list of things I find difficult. Many who commented are asking me for concrete things. I think it will take someone eventually writing an R to Julia book.

@Yifan_Liu Yes, can’t beat dplyr, although some on here think otherwise. I don’t know any MATLAB, maybe that’s why I struggle so much in Julia. I just take it one day at a time. But it sometimes takes me a whole day to figure out the simplest little thing about Julia. So I give up a lot and start again next week.

@Tamas_Papp But the people who write code don’t always see why it’s unfriendly to the end user (such as R users). For instance, I think some people love data.table in R. But I’ve tried to learn it a few times before and given up. It’s just complex and not straightforward. I’m sure it is straightforward for those who wrote it and thought about it. Criticism is cheap to produce, but if people are complaining, does that not suggest that maybe the people programming it could do something to either make it easier for the end user or write enough documentation so we understand? You do need contributors, but all of you developers will have wasted your time if it’s a letdown for those who try it. We have at least 4 people here in my department who tried it and stopped just because it’s that much more difficult to master than R. Maybe that will change and we can get them back after a stable version and more documentation. Hard to write, I know, when the language changes so rapidly.

@DoktorMike I guess we’ll have to agree to disagree… I tell everyone that asks me to hold off another year or two (maybe more). Otherwise they will just get frustrated and leave. I’m not sure I’ll ever stop using R (we’ll see), but Julia can replace the low level languages we need to write fast enough software for genetic/bioinformatic research. Still not sure if it will replace C/Java/Fortran for the production code/software we have in quantitative genetics.

@pfitzseb Sorry, I’m not sure what to say. I haven’t used it in a while. Maybe Atom has gotten better. I just don’t like the way it looks or anything. I just find TeXstudio/RStudio way better designed and easier to use. I’ve asked people in my department and they agree with me 100%. No one here who has tried Julia likes Atom. They have only had issues as well. Maybe this will get better. But I would suggest starting from scratch and trying to mimic RStudio exactly. I hate Jupyter too. Not sure how people can stay in that environment all day. Good for teaching I guess, but not what I want for my coding day to day. I’ll try VSCode. I’m sure it will get better and better.

I comment on here only to draw developer attention to those of us struggling to learn the “super easy” language of Julia. I’m finding some parts very easy (those you find online). When I want to do something very complex, like write genetics software, I fall on my face and can’t figure out how to do much. So I think it’s oversold as easy, and when you get there it’s disappointing. I found Python many times easier to learn, but gave up because the package manager was a joke (it didn’t seem to have one that worked, from what I could tell). I really hope there are no more issues with packages in Julia or it will also fail. I undervalued CRAN when I started in Julia and Python. It’s maybe one of the best things about R. I have only had 1 or 2 issues ever with downloading packages in R. This is undervalued, I think, by developers who come from C/C++ maybe…

@pdeffebach Maybe I have just gotten used to the weirdness of R. I guess I’ve struggled to complete complex things with dataframes in Julia. There are simple examples online that I follow, and then I cannot figure out more complex ones, such as summing up the number of missing values for each animal and calculating a percentage based on the number of observations for each unit (animals for me). This is quite easy with dplyr in R. I can’t find the docs to do this quite yet in Julia, although admittedly I haven’t looked super hard. But I have looked into the docs for each of those packages and can’t figure out how to do much more complex things than they have listed. I’d just like to see many more types of data wrangling on the website.

pdeffebach · June 17, 2018, 10:43pm

You are correct that there need to be more data-wrangling docs.

However, in the interest of completeness, here is how you do this in DataFramesMeta:

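using DataFramesMeta, DataFrames
# Simulated data: 100 rows, each with a random animal and a size that is sometimes missing
animal = rand(["dog", "cat", "cougar", "mouse"], 100)
size = rand([1.2, 2.5, 3.5, missing], 100)
df = DataFrame(animal = animal, size = size, shape = rand(100))
# Group by animal, then add the share of missing sizes within each group
df2 = @linq df |>
       groupby(:animal) |>
       @transform(pmissing = sum(ismissing.(:size)) / length(:size))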

If you are coming from R, it’s hard to imagine a better, more transparent syntax that more closely resembles dplyr. I find it very intuitive. Perhaps this syntax is hard to find; the docs for this are at the DataFramesMeta GitHub page. But we should definitely have a push for more blogs focused around these features. I have been meaning to make a side-by-side Stata, R, Julia blog post for a while.
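
For readers who want to see what that grouped calculation does step by step, here is a package-free sketch of the same idea (share of missing sizes per animal) in Base Julia. The `records` data and variable names are made up for illustration, and this is not how DataFramesMeta implements it.

```julia
# Hypothetical records: (animal, size) pairs, with some sizes missing.
records = [
    ("dog", 1.2), ("dog", missing), ("cat", 2.5),
    ("cat", 3.5), ("mouse", missing), ("mouse", missing),
]

# Split: collect the sizes observed for each animal.
groups = Dict{String, Vector{Union{Missing, Float64}}}()
for (animal, sz) in records
    push!(get!(groups, animal, Union{Missing, Float64}[]), sz)
end

# Apply and combine: share of missing sizes within each group.
pmissing = Dict(animal => count(ismissing, sizes) / length(sizes)
                for (animal, sizes) in groups)
```

For this data, `pmissing` maps "dog" to 0.5, "cat" to 0.0, and "mouse" to 1.0. The data-frame version expresses the same split, apply, and combine steps in one chained statement.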

Yifan_Liu · June 17, 2018, 11:34pm

DataFramesMeta is the data wrangling tool I choose in Julia, but still not as elegant as dplyr.
