Given that it was only 1.2% to begin with, I suspect the result is prone to large statistical fluctuations. It's also a much younger language than most of the others listed here, especially Python, which seems to be increasing its dominance.
I can tell you from experience that Julia is pretty unpopular in data science, but I suspect there's still plenty of opportunity to change that.
To put things into perspective, there have only been 14 votes for "I have used Julia in the last year for a real project".
IMO this is, at least partly, classic clickbait: make a non-representative small sample poll, add some text on how some languages/toolkits are "losing" and others are "winning", and it will be shared and discussed widely by both categories.
"Losing popularity" implies that Julia previously enjoyed popularity among data science users, but I don't think it has ever occupied more than a small niche in data science.
For the majority of data scientists, I think this has made sense: the language has not reached stability. That will be changing soon, making Julia attractive to a broader community.
Clickbait? I've been following discussions in this discourse pool. This is the kind of reply that makes this community look unfriendly to me.
Generating weakly implemented polls about the popularity of various things is a common technique to generate traffic for sites. I am sorry if I offended you, but I don't think that pointing this out can be construed as unfriendly. Also, I don't think you are responsible for or affiliated with this site, so my comment was not about what you did.
Setting aside the fact that this result may have no statistical significance, I have to point out some facts:
- Python's design dates back to the late 1980s, hence it is far older than Julia, which started in 2012.
- I have been using Julia since v0.2-beta (or something like that). From my humble point of view, the Julia language really became mature enough for starting small projects in v0.4 (why? because that was when I stopped compiling `master` to use in my projects :D). AFAIK, v0.4 was launched in 2015.
- In Brazil, we are starting to see quite good adoption given the language's age. My wife was at a workshop that had a special session called "Tutorial: Julia language for Geophysics scientists." Yesterday, I saw a full programming course (scientific programming) at another Brazilian university that was prepared entirely in Julia.
- We are starting to have toolboxes / packages that are unique to the Julia language. For example, I have been using differential equation solvers for my entire academic life. I have not yet seen an open-source package that can do things as easily as we have in DifferentialEquations.jl.
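For readers who haven't tried it, here is a minimal sketch of what solving an ODE looks like in DifferentialEquations.jl, in the style of its introductory tutorial (the exponential-growth problem itself is just an illustration, not anything from the post above):

```julia
using DifferentialEquations

# Exponential growth: du/dt = 1.01 * u, with u(0) = 0.5 on t ∈ [0, 1]
f(u, p, t) = 1.01 * u
prob = ODEProblem(f, 0.5, (0.0, 1.0))

# Tsit5 is a good general-purpose explicit Runge-Kutta method
sol = solve(prob, Tsit5())

# the solution object supports dense evaluation at any time point
sol(0.5)
```

The problem definition is a plain Julia function, and swapping solvers or tolerances is a one-argument change, which is much of what makes the package feel easy compared to other ecosystems.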
Hence, given all these facts, it is amazing that Julia is even mentioned this early. We are talking about a new language that has been in a production-ready state for less than 3 years (again, from my point of view). Conclusion: the future is bright for Julia, believe me.
EDIT:
- My students in the Rigid Body Dynamics course will receive 1 extra point if their final project is written in Julia. Hence, we have 7 new Julians.
- I will do my part and offer a Julia course at my institution this year.
I think the data this year might have been skewed by RapidMiner actually campaigning for votes.
Next time they do this, maybe a "get out the vote" campaign on Discourse, Slack and Gitter would skew it in our favor.
Also, when I've been at #ODSC (Open Data Science Conference) events, I talked to many people who were interested in learning about Julia as soon as it was released. So I expect a lot of people to start taking a new look at Julia in August, and I just hope that things are in a good state by then for people looking seriously at Julia for the first time. (That's not a criticism; it's just a matter of Pkg3 getting stable, everybody pitching in over the next two months to make sure all the packages that are still being maintained are updated for v1.0, and maybe getting a triaged list of packages that are well-tested on v1.0.)
Can you expand on that, please?
What reasons have people given? Are these things that can be addressed easily (or are already in the process of being addressed)?
If I were to hazard a guess, I'd say: 1) database access, 2) better handling of non-UTF-8 data from databases and CSV files, and 3) better parallel programming support (I'm looking forward to the PARTR stuff, after seeing the talk at the C.A.J.U.N. Meetup; see the video of it at: https://www.youtube.com/watch?v=YdiZa0Y3F3c)
The equilibrium of these games is everyone focusing effort on campaigning, and these efforts more or less cancelling out, with a lot of effort spent on the whole thing as deadweight loss. I would be sad to see "please click here to inflate our votes" messages on forums I visit.
Are the results actually important for anything, besides generating visitors to sites?
Yeah, I second that. Obsessing over these things is not going to help anyone.
Well, here's my two cents on this. "Data science" is a recently made-up term. As far as I can tell, the vast majority of us working in the field aren't actually specialized in anything related to our jobs (although certainly some are, e.g. NLP people), though we may be very highly specialized in terms of our educational backgrounds. In my experience this has led to wildly divergent opinions on even practical matters such as tooling.
At the risk of over-generalizing, I think there is a prevailing attitude in data science that anything Python isn't "good enough" for is not worth doing; you should mostly write scripts and not worry too much about re-using code. If that's your view, I can see why Python seems like the ultimate tool: it has a staggering amount of pre-existing code available for it, and it's a very fine scripting tool. I have to admit that, because of the way I learned programming and computing, those attitudes are extremely frustrating to me, but I also have to admit that they are perfectly valid in the majority of data science roles, and that people with this attitude are getting a huge amount of valuable work done without caring one bit about what I think about how they're doing it.
As data scientists we are also frequently asked to work on data that is presented to us in some truly nightmarish formats, and we are constantly having to deal with awful things like CSVs and SQL (awful mostly because it's actually about 100 different things masquerading as 1 thing). Therefore, there is (very rightfully) also a huge emphasis on tools for doing things like querying databases, and I suspect that a majority of data scientists approaching Julia would be primarily interested in situations like what @Liso described above, where it was very correctly pointed out that we've sort of abandoned the idea of having a universal database API for Julia. This is a real issue. Database support is hugely important (especially if you're a data scientist, but also if you're not) and it really kind of sucks in Julia right now. We might have some great support for some specific databases, and they've done a great job on ODBC.jl, but frankly there's nothing as clean, simple and easy as sqlalchemy in Python, which must have taken a monumental amount of work to get into its current state. If you are new to Julia and don't know where to look for things like ODBC.jl, LibPQ.jl, JDBC.jl or MySQL.jl, things look much worse than they really are. There are also some people who would come in, perhaps learn about the packages I mentioned, but only see that there is no sqlalchemy equivalent and immediately dismiss the whole language.
My counter to all that is basically that it gets the priorities completely backwards, thanks to the existence of things like PyCall and JavaCall. Yes, I need database support, but pulling data from a database isn't really that complicated. If I have to do it through PyCall, and it's a little slow, I don't really care. You know what sometimes is that complicated? MILPs with millions of variables, stochastic constrained optimization problems, POMDPs, solving stochastic differential equations. When I first started using Julia, I had recently gotten really aggravated with the misery of trying to do large MILPs in Python. Everybody around me was using PuLP. It was really, really slow, and even uglier than it was slow. Then I wrote up a problem in JuMP. It was this tiny little thing that fit on one screen, and it looked almost exactly like it did when I wrote it out algebraically in LaTeX. It took 0 effort to convince the guy I work with, who had been working on these things for years longer than I have, that we need to move everything over to JuMP. Now I'm getting ready to try a more general version of those problems that will require a stochastic method like simulated annealing (possibly using Hamiltonian updates like in @Tamas_Papp's package?) and this would have been impossible in Python, because the updates are going to require a huge amount of custom code. In fact, the engineering group at my company attempted something like this once in Python using canned tools and, after being happy with a toy problem, wound up completely abandoning it because they couldn't get it to work on real problems. Why? It was too slow, and too hard to modify. If I can't get it to work in Julia, I can be confident that my tools aren't the problem.
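To give a flavor of the JuMP point for readers who haven't seen it: below is a hypothetical toy mixed-integer problem (not the real MILP from work), written in current JuMP syntax with the open-source GLPK solver. The coefficients are made up for illustration.

```julia
using JuMP, GLPK

model = Model(GLPK.Optimizer)

@variable(model, x >= 0, Int)        # integer decision variable
@variable(model, y >= 0)             # continuous decision variable
@constraint(model, 6x + 8y >= 100)   # a linear constraint
@objective(model, Min, 12x + 20y)    # minimize cost

optimize!(model)
value(x), value(y)                   # inspect the optimal solution
```

The macros let the model read almost line-for-line like the algebraic formulation, which is exactly the "looks like my LaTeX" experience described above.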
Lastly, as I've already talked about extensively elsewhere, some of the simple stuff in Julia is really so nice, and it can be so hard to convince people of that if it's built into the core of their being that they should never write a million-iteration loop. I don't want to do everything with some complicated API; let me just write simple code using `Base`. Can't think of a way to do something that's not all database operations? Fine! Just write some code: use a `Dict`, use a `Vector`, create your own custom struct and put it in a million-element array, just do whatever you want, that's how writing code works. I recently had a data manipulation task that started out very simple and, lo and behold, it turned out I had to do a whole bunch of stuff with quadratic time complexity that would have taken an hour in Python, or I would have had to go hunting for the right package to do it (if that were even possible). In Julia it was all really simple stuff. I use mostly `Base` for the vast majority of what I do, just like in Python you could in principle use the stdlib for most of what you do, but you don't, because it's too slow. I lose numpy the second I want to put a Python object into it; Python is an OO language, it's supposed to be all about writing objects, so then what good is numpy? (Well, lots of good, just not for Python objects.)
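As a hypothetical illustration of the "million-iteration loop" point (the data here is made up, not from the task described above): counting occurrences by key with nothing but `Base`, written as a plain loop.

```julia
# Count how often each key appears in a million-element array,
# using only Base — no external packages.
data = rand(1:100, 1_000_000)

counts = Dict{Int,Int}()
for x in data
    # get(counts, x, 0) returns 0 the first time a key is seen
    counts[x] = get(counts, x, 0) + 1
end
```

In Python you would reach for `collections.Counter` or pandas rather than write this loop, because the naive version is slow; in Julia the naive loop is already the fast version.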
Nope, it's never been an issue (perhaps surprisingly). So yay for that!
The reason I'm so passionate about this, such that I wrote this huge rant on a Sunday afternoon, and why I am "scared" of Python, is that in a way Julia and I are in very similar situations in our data science careers. We grew up doing physics and we want to do something interesting enough that it's more than just feature selection and canned solutions, and if we can't find that, there is no reason for us to be there. When that inevitably occurs, we will have to look elsewhere for jobs. I may not find one, but fortunately, Julia has several already.
3 posts were split to a new topic: Re: moderator action on "Julia losing popularity … (KDnuggets Software Poll)"
That website looks like crap.
Maybe it should be losing popularity…
You're lucky; I've seen issues on GitHub, posts on StackOverflow, questions on Gitter, etc. where people have run into such problems fairly frequently. Often the people don't even realize that's the problem; they just think the data is corrupted.
Maybe it does:
- 2018: 2,052 participants
- 2017: about 2,900 voters
- 2016: 2,895 voters
But that doesn't mean we can be satisfied!
I've replied there now. I do not generally browse Stack Overflow, but I do always respond to tags here on Discourse.
There was not very much interest in maintaining a generic database interface. For my work, it made much more sense to pick an interface (DataStreams.jl) which already supports many types of data output.
If anyone wished, it would be easy to create a database interface and support PostgreSQL using LibPQ.jl. It's intentionally very easy for someone to do so. But unless it is useful for my work, I cannot devote time to building it from the ground up.
Itās also worth noting that the database interfaces which had been written were nothing more than sets of methods for connection, query, fetch, etc. One still needed to write custom connection and SQL for each database.
DBAPI is more than a set of methods, because it has to be a generally accepted API.
I am afraid that the version of DBAPI we have now is obsolete, because `Nullable`, as I understand it, is supported in a different way now.
So it needs more work. I understand that it is not your personal goal.
Your position is fully understandable. There are more people in the same situation. We need to work with data. We could glue in some C/C++ library, but working on API normalization is too much.
Ah yes, what I meant was that it was an interface for packages to implement, but wasn't a framework itself like SQLAlchemy is. It was more like Python's DBAPI 2.0 + a shared namespace.
I agree there is value in having a uniform database API. It just makes things a lot easier for the person who is using it and has to interact with a hundred different types of SQL databases. My solution for the time being is JDBC.jl, though I'm hoping that over time there will be less and less need for me to use anything other than Postgres, and I'll just use LibPQ.
It's definitely defunct; the maintainers told me so when I overhauled JDBC.jl (which no longer uses it). I might be willing to undertake writing a uniform interface (which I think is really the most valuable aspect of sqlalchemy), but I'll only attempt it if I know for sure that everyone is on board (i.e. at minimum ODBC, LibPQ, MySQL, SQLite; I can handle JDBC).