What can we do to make Julia grow fast?

Tamas_Papp · October 3, 2017, 11:31am

I taught a course using Julia in February-March 2017, and my experience was very different. I found Plots.jl to be eminently usable (if not perfect, but nothing is) and it has improved steadily ever since. There are always issues, but once I open one, I get advice or a solution very quickly.

Plotting is a hard problem. Everyone is aware that Julia should have a good plotting framework, but it is more a matter of a lot of hard work than a simple decision to do it. The way you describe the current state of Julia’s plotting ecosystem doesn’t recognize the enormous amount of (unpaid) work that went into it: it took a lot more than “stitching together” existing stuff.

mkborregaard · October 3, 2017, 11:38am

I’m going to print that quote on a t-shirt.

pkofod · October 3, 2017, 11:57am

What this guy said!

I actually think Optim/NLsolve falls short in some of these places, but, as your point pretty much is, there’s no real reason to. I’ll make sure to test arbitrary precision support. We’ve been using a sort of iterative setup in Optim that allows you to pass callbacks for quite some time now, but maybe we should use the actual iterator interface. Somebody’s got to do it, though. Complex number support is coming very soon.
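To illustrate the difference between passing callbacks and exposing "the actual iterator interface", here is a minimal sketch. It is not Optim's or NLsolve's API: the `GradientDescent` type and its fields are hypothetical, and it assumes the Julia 1.x iteration protocol (`Base.iterate`), which postdates this thread.

```julia
# Hypothetical sketch: a fixed-step gradient descent exposed as a Julia iterator.
# Assumes Julia 1.x, where iteration is defined through Base.iterate.
struct GradientDescent{G,T,S}
    grad::G   # function returning the gradient at a point
    x0::T     # starting point
    step::S   # fixed step size
end

# The sequence of iterates never ends on its own; the caller decides when to stop.
Base.IteratorSize(::Type{<:GradientDescent}) = Base.IsInfinite()

# First call: yield the starting point and use it as the state.
Base.iterate(gd::GradientDescent) = (gd.x0, gd.x0)

# Later calls: take one descent step from the previous iterate.
function Base.iterate(gd::GradientDescent, x)
    xnew = x - gd.step * gd.grad(x)
    return (xnew, xnew)
end

# Usage: minimize (x - 3)^2, whose gradient is 2(x - 3).
gd = GradientDescent(x -> 2 * (x - 3), 0.0, 0.1)
for (i, x) in enumerate(gd)
    # Stopping rules and logging live in the caller's loop, not in a callback.
    if abs(2 * (x - 3)) < 1e-8 || i >= 1000
        println("stopped after $i iterates at x = $x")
        break
    end
end
```

With this shape, generic tools such as `Iterators.take` or `enumerate` work on the solver without any solver-specific hooks.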

ChrisRackauckas · October 3, 2017, 12:00pm

Yeah. My point is that Julia is good at handling generics like this, so there’s no real reason not to support these things. We should make them ubiquitous and put applications of them on display. Performance is one small way to get people to come, but that’s still only a niche group of people who actually care. You need to look to other niches, and the best way to do that is to just have features which you can’t find in other places. Complex numbers, arbitrary precision, and the ability to inject essentially arbitrary code for full control are definitely things we can take full advantage of, and then point to as reasons to use Julia’s packages.
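As a small illustration of what "handling generics" can look like, here is a sketch of a Newton iteration written without any type restrictions. The `newton` function and the test functions are made up for this example and are not taken from any package; the code assumes Julia 1.x.

```julia
# Hypothetical sketch: one generic Newton iteration, reused for several number types.
function newton(f, df, x; iters = 50)
    for _ in 1:iters
        x -= f(x) / df(x)   # works for any type supporting -, / and the calls to f and df
    end
    return x
end

f(x)  = x^2 - 2
df(x) = 2x

r64  = newton(f, df, 1.0)        # Float64 arithmetic
rbig = newton(f, df, big"1.0")   # BigFloat (arbitrary precision) arithmetic

g(x)  = x^2 + 1
dg(x) = 2x

rc = newton(g, dg, 1.0 + 1.0im)  # complex arithmetic; the roots of g are ±im
```

The same method body is compiled separately for each argument type, so supporting `BigFloat` or `Complex` here costs nothing beyond not over-constraining the signature.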

Niall · October 3, 2017, 12:28pm

You are absolutely correct. That description was disrespectful and I apologise to everyone. It arose more from the fact that I’m currently very short of time than from any animosity - sorry!

The fact of the matter is, though, that in order to do any normal everyday graphics work you need more than one package - each has its own strengths. So you really need to use Plots, which links them (very nicely) together. However the result is that you end up installing a whole bunch of useful backends plus Python. Now, on my machine that’s not a problem, but for students working on university computers it’s more complicated, since they work with distributed images of the original installation, and these became very unmanageable.

First, the images were quite unwieldy, making them difficult to update on all machines as we slowly found problems with the installation. Second, there were several aspects that we never got working properly. Unfortunately I can’t remember offhand exactly what issues these were, but I know one was that the only way we could draw graphics was via a browser.

I hope that helps somewhat. Sorry again for my rude explosion.

piever · October 3, 2017, 12:41pm

Actually I believe a good way to make progress in that direction is to have at least one backend (which should be default) that is good enough at more or less everything and easy to install. My understanding is that this is happening little by little with the GR backend which:

doesn’t require python, you just download the binaries
is very fast
can save nice vectorial graphics

But I still think it needs

a little more polish (especially when making plots for a paper)
nice interactive display (which should be coming soonish, there was already a preview in the JuliaCon videos)

It’s really important to emphasize that a good plotting package really takes a lot of work as the number of possible features/bugs is just too big.

Personally, I think that the experience of somebody who, like you or @Tamas_Papp, takes the gamble and teaches a course in Julia is extremely valuable because you end up:

figuring out what are the current limitations from a student’s perspective
probably doing some PRs to fix the most urgent problems

Tamas_Papp · October 3, 2017, 3:51pm

I did not consider it a gamble, rather an opportunity to teach something that is valuable to the students. It is my impression that many of the students kept on using Julia after the course, despite, of course, constantly grumbling about plots and the constant need to restart the kernel (this was before #265 was fixed). But this situation may be special to economics, where some prominent people are promoting Julia.

Actually, Plots.jl + Interact.jl allowed us to program some “experiments” about numerical methods, allowing students to explore various options visually. I think they found that pretty impressive (I know I did, it was very easy to set up). So even half a year ago I would not have complained about plots, it was an effective teaching tool. Back then we had to restart all the time because of #265, which, combined with the slow loading, was really painful (insert favourite Four Yorkshiremen reference here).

The single reason that makes me hesitate to push Julia universally at the moment is not plots, but the “dataframes” ecosystem. My impression is that something very nice will eventually evolve, once named tuples are in and the interface stabilizes, but in the meantime large changes should be expected, and anything specific I would teach today would be obsolete very quickly.

Per · October 4, 2017, 6:14am

I think a very good way to popularize a language is to provide lots of small examples that people find with Google. This is the best form of search engine optimization: People who searched for something seemingly unrelated to Julia will land on a page that solves their problem and introduces them to Julia as a side effect.

Let’s say you want to do a task X. You google it, but don’t find anything useful. So you write 20 lines of Julia code and the task is done. Now, if you would spend an hour putting your code in a small package, adding a couple of tests and a bit of documentation (making sure that the search terms that you googled are prominently included in README.md) and uploading it to github, then everyone who googles those terms in the future will find Julia.

This is something that anyone can do. You don’t have to be involved in a large project or be familiar with lots of pre-existing code. Julia makes it particularly easy to create a new project and upload it to github.

As a bonus: The person who is most likely to benefit from the extra work that you put in is you. Five years from now, when faced with the exact same task, you will google it and find your own code and documentation.

kristoffer.carlsson · October 4, 2017, 1:01pm

I would argue packages are not the correct place to put 20 line snippets in.
In my opinion, asking a stackoverflow question and answering it yourself might be a better strategy.

jonathanBieler · October 4, 2017, 1:13pm

Another option is adding examples in the documentation (as in the Matlab docs, e.g. the “Cumulative product - MATLAB cumprod” page).

mkborregaard · October 4, 2017, 2:36pm

What do you propose as a guideline for making a package?

Per · October 4, 2017, 2:36pm

I would argue that the number of lines is irrelevant. The relevant question is whether those lines of code do something useful. (Also, by the time docstrings, error messages and tests are written, the original twenty lines will have turned into somewhere between fifty and a hundred lines of code.)

There’s nothing fundamentally “expensive” about writing a package. There’s no finite pool of package names, or limited space in github. Nor does it require more work than copy-pasting the code into stackoverflow (assuming that you do an equally good job documenting the code there as you would have in a package.)

When using somebody else’s code, I much prefer copy-pasting a URL into Pkg.clone(...) compared to copy-pasting code into a new file. That way I automatically have a reference to the original source, and I can easily get updates if the code is improved upon.

kristoffer.carlsson · October 4, 2017, 2:48pm

So we are on the same page, what I am thinking about are questions like:

How do I open a file, extract all the words that start with X and then resave the file without those words

It is unlikely that you would google how to do this, find the package ExtractWordsThatDontStartWithXAndResaveTheFile.jl, add that as a dependency to your package, perhaps create a Pull Request to improve the package, wait for that to get merged, ask the author to tag it in METADATA, be careful with adding upper bounds to your package so it doesn’t accidentally break, etc.

On stackoverflow (which is literally made for stuff like this), you could get to the source in one click from google, see how many upvotes it has and if it has any comments from other users who have tried it and use it straight away.

Per · October 4, 2017, 3:26pm

I’d probably name that package StreamingEditor.jl and it’d provide a modify_file function that takes a file name and a function, and applies that function to every line in the named file.

The “extract all the words that start with X”-part is only half a line of code, so I would probably not put it in the package itself, but maybe in the documentation as an example. The package might later expand to do a lot of the things that sed does (if it turns out I need those things.)
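A minimal sketch of what such a `modify_file` function might look like follows. StreamingEditor.jl is only a proposed name in this thread, so everything here is hypothetical, including the choice to put the function argument first (which allows `do`-block syntax) and the `drop_x_words` helper. The code assumes Julia 1.x.

```julia
# Hypothetical sketch of the proposed `modify_file`; not an existing package.
# Applies `f` to every line of the file and writes the results back in place.
function modify_file(f, filename::AbstractString)
    lines = readlines(filename)      # read everything first; line endings are stripped
    open(filename, "w") do io
        for line in lines
            println(io, f(line))     # one transformed line per original line
        end
    end
    return filename
end

# The task from the discussion: remove all words that start with "X".
# Note: split/join collapses runs of whitespace into single spaces.
drop_x_words(line) = join(filter(w -> !startswith(w, "X"), split(line)), " ")
```

A call would then be `modify_file(drop_x_words, "notes.txt")`, or with a `do` block: `modify_file("notes.txt") do line; uppercase(line); end`. The file name `notes.txt` is a placeholder.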

Eventually there might be a bunch of questions on stackoverflow with two-line answers, where the first line is using StreamingEditor, and in my opinion this will be much more useful than 20-line answers.

(You can use code from a MIT-licenced package without interacting with the original author in any way. Just make your own clone - it doesn’t have to be public - and modify to your heart’s content. You can merge in improvements later if you like, and you are under no obligation to give any of your own improvements back.)

jmgnve · October 4, 2017, 3:32pm

Would it help people learn the language more quickly if a few up-to-date tutorials about important topics were hosted directly on the julialang homepage, complementary to the docs? In particular, tutorials (1) explaining how to actually use powerful Julia concepts through a bunch of simple examples, (2) explaining how to avoid or handle common pitfalls that are Julia-specific, and (3) showing central packages…

I think that would be a good glue between the quite simple first part of the docs and the rather technical second part. Would that help people without a solid computer background (like me) get over the intermediate step more easily (where I am stuck right now, for whatever reason…)? I would gladly try to contribute if there is a natural place where people put such educational material. The docs, blog posts and stackoverflow are clearly useful. However, tutorials are currently not maintained… perhaps mostly because Julia is changing rapidly right now… but would it help to host some tutorials in a more “official” place?

chakravala · October 4, 2017, 3:35pm

I’d say that a package should only be made if you actually have a coherent set of methods that make sense as a package with documentation and an API.

If it’s just a snippet of code, either make it a plain repository or a Gist, or post the code on Discourse or Stack Overflow. It doesn’t have to be a package if you’re hosting it on github; you can have a repo with your examples and explanations specific to your topic, or use Jupyter notebooks for that. The only reason you’d make it into an actual package is if you have some sort of coherent API that actually makes sense to use as a package. If it’s just an example of how you do something, then it’s not really a package but simply a .jl script file.

Also, you do not need Pkg.clone to pull a repository; you can simply use git clone.

Per · October 4, 2017, 3:40pm

My guess (and I’m only speculating) is that the core developers are pretty busy working on 0.7 at the moment, and are planning to make up-to-date tutorials after it has been released. Those tutorials would then also be valid for 1.0.

Per · October 4, 2017, 4:22pm

I agree completely, which is why I said that you’ll have to invest another hour of work in the API after writing the 20 lines of code.

But 20 lines of code and nothing else is not a good answer on stackoverflow either. You have to provide an “API” either way. On stackoverflow the “API” might consist of instructions like “replace ‘filename’ by the name of your file in the below code snippet”, whereas in a package there would be a function that takes filename as an argument.

(A coherent set of methods can consist of only one method.)

True, but it’s also no more work creating a package than any other type of repo. In fact, I can’t think of a way of creating an empty repo and adding the MIT licence to it that requires less work than PkgDev.generate("MyPackage", "MIT").

The question is then: What is more useful? I’d argue that if you actually want to run the code, then a package is strictly more useful than snippets of code, because you can use it with the package manager in addition to all the things that you could do with a random snippet of code (such as copy-pasting.)

For code that I don’t want to use, but only read and learn from, I do appreciate Jupyter notebooks.

How is git clone any simpler than Pkg.clone?

chakravala · October 4, 2017, 4:29pm

My post is more about providing some guiding principles; I’m not trying to tell you what to do. However, if your code consists of only one method, then whether it should be its own package should be thought about more carefully, since it might just be a function that could also be added to another package, where it makes more sense. It just depends on what it is.

When you initialize a repo on github, the first thing it asks you is whether you want to add a license, so by just creating a repo you already have the license. That is just as little work as PkgDev.

If you are using julia in the terminal, then it is just as fast to use git clone as it is to use Pkg.clone, since the shell is immediately available via the ; key. For terminal users, both are just as fast.

EDIT:

and git clone actually has the potential to be faster, since it bypasses the julia package manager

lobingera · October 4, 2017, 4:34pm

a) Have you ever wondered why a simple Pkg.status() can take more than a minute?
b) “Sorry, we had to truncate this directory to 1,000 files. 655 entries were omitted from the list.”

and the there-is-no-cost-of-small-packages attitude also created
c) “How one developer just broke Node, Babel and thousands of projects in 11 lines of JavaScript” (The Register)

There is actually a cost of fragmentation, and it’s already visible in Julia. I would not recommend going further in this direction.
