Not sure this is precisely where this discussion belongs, but it seems a good place to start.
Kaggle.com is a neat site for data science. It hosts competitions in data analysis and machine learning, as well as instructional material and learning resources. Users can also post code and descriptive text in Jupyter notebooks and run them as "kernels" on Kaggle's servers.
The site hosts predominantly Python and R code, but in principle it supports Julia code. That said, it's clear Julia isn't actively supported: the Jupyter notebooks don't offer a Julia option, and even scripts that run their example code fail with an error when using DataFrames (see here for the Kaggle discussion).
I'm wondering if it would be worth it for someone at Julia Computing, or someone with more knowledge than me, to reach out to Kaggle and offer some support. They don't currently see enough user demand to justify much work on their end, but it could be a useful avenue for increasing interest in using Julia for data science.
What can the Julia community do to support you in making Julia available as a language for playing with your amazing data sets in your wonderful competitions?
Julia is a young language taking aim at a lot of the strengths of python, matlab and R, with a focus on fast, simple technical computing. It supports notebooks, too!
That's great, thanks! Someone (apparently from the Jupyter project) also responded in that Kaggle thread, suggesting that they file an issue about their notebook problems, so they're hearing about it from multiple fronts.
Thanks for your feedback. I have passed this on to our engineering team to look into. Also, you may be interested in posting this on our feedback forum, as others may like to chime in as well.
For what it's worth, there is a "Getting Started" competition titled "First Steps With Julia". However, as I recall, even when I went through it a year or so ago (around the time of Julia v0.4), its tutorials weren't exactly "canonical" Julia. If someone wanted to provide an updated version of this tutorial, that might be a good place to start.
I think some work would be needed to make the user experience as smooth as the Python one and avoid spending so much time on compilation, for example using PackageCompiler(X) to bake the standard packages into the system image (sysimg). But what counts as standard, and how would that handle different package versions? There might be other reasons I'm missing. The Kaggle team member did post "Supporting multiple languages adds a lot of work for us," in that thread.
I'd wait until Julia has a major ML interface. MLJ looks promising. Once MLJ incorporates Knet, Flux, and options for hyperparameter tuning, I expect much more widespread use of Julia for ML. Kaggle support will naturally follow.
I think that would be cool, but I'm not sure it's something Julia Computing is in a position to spend money on. It might be cheaper to offer a year of free support, or something similar, to the Kaggle DevOps team…
Or maybe a company using Julia (Invenia, looking at you!) could sponsor a competition?
It would be good if we could get as many people as possible to chime in on the thread here:
I met the Kaggle CEO - Anthony Goldbloom - last year and he reiterated what the Kaggle staff said in that thread. They did not see sufficient Julia usage when they had support. After their acquisition by Google, they had to redo a lot of their infrastructure and ended up dropping Julia. When the time is right, and there is significant community demand, he said they would certainly revisit it.
A lot has changed in the last 2 years and the Julia community is at least 4-5x larger now.
From what I know, there are quite a few folks at Kaggle who champion Julia, so it would greatly help for folks here to chime in there, ideally with a somewhat detailed comment on why you would like to see Julia on Kaggle. At the very least, upvote the topic and any comments you agree with.
The 2021 Kaggle Machine Learning and Data Science survey will close on Monday. It would be great for Julia users to mention their use of Julia in that survey. The latest Stack Overflow survey put the Julia community at about a fourth the size of R's. If Kaggle sees the Julia community trending toward the size of R's, that could be a good argument for including Julia in the future.