I’m hoping to work on Julia documentation this year, so feedback like this is useful. Docstrings are much easier to work on than the manual since there’s not much additional context and textual flow to consider, but your post makes me realize that the manual is more likely to be seen by beginners and might actually have more impact, at least in the initial stages.
By chance, I came across JuliaNotes today. It has concise, practical answers for common questions that people beginning to work with Julia have. It’s by no means comprehensive, but seems like a very handy, easy to use reference for beginners.
That would be a bit of cheating since I already knew about the Optim.jl package, and from there this is easy to find. However, googling for ‘minimum of unconstrained multivariable function julia’ does return the relevant page in Optim’s documentation (though, for some reason, an older version) as the first link.
How do you feel the Optim docs page compares to the MATLAB one? One thing I note is that MATLAB makes better use of subsections, with many in-page links making going back and forth between parts of the page easier. Also, I really like the “Topics” section at the end, which links to both conceptually related pages and the language features people might not be familiar with (Anonymous functions in this case).
I think we’ve all come across packages that are hard to make sense of, so no worries, your point comes across.
I’m glad you brought up MATLAB because one thing I really like about its documentation is that it’s always practically oriented and immediately usable. You could be in the middle of coding, look up a function, and quickly be comfortable using it in your code.
In Julia, you never know when you’re gonna need to dive into a rabbit hole because you wanted to look up one function. To be fair, things are inherently more complicated in some ways because of the type hierarchy, dispatch, etc., but there’s definitely room for improvement in this direction.
Another advantage MATLAB has is that the documentation is centralized in one place, making it easier to find reliably. With Julia, you have to rely on search engines finding the right place. The JuliaHub docs search helps a lot, but still sometimes when you don’t know the exact phrase to search for (and sometimes even when you do), things can be hard to find. I imagine it’s even more so if you are a beginner.
I don’t know what the solution for this one is. A custom search engine that uses a mainstream engine underneath, perhaps - something like Brave Goggles with an index of Julia doc pages.
Well, I have less insight than you do. I’ve coded in just about every language under the sun, and C is my favorite: it’s simple and one knows what one gets. But I do science, not coding.
Here is the documentation example I should have suggested to you in the first place. Try googling “Julia simulate student-t random numbers”. You end up at “Random Numbers · The Julia Language”, and from there, how do you figure out how to get Student-t RNs?
The only redeeming feature of Matlab is documentation. As you suggest, that is the benefit of being produced by a well-funded commercial entity that uses documentation to sell it. I had a Matlab salesperson call on us and that was central to the pitch. Julia (and R and Python) will never get the same quality documentation. Even if the money was there, too many strong-willed people will stop it.
I really hoped to solve the 2 language problem with Julia. My team uses it for 2 large projects and we ended up with a 3 language problem: Python+Julia+R. Python dominates data pipelines, and R covers all the specialized libraries (like dynlm) and plotting. We now run Julia from Python. Still no regrets.
What frustrates me is that some parts of Julia are so fantastic, and then we run into a buggy and poorly documented library. We spent hours on Pkg and in the end gave up because we could not make sense of it, and now only use modules. Packages are very easy in R and Python and I am sure they are very easy in Julia. We just could not figure it out.
So I don’t know the answer to your question either. Is it possible to ask Julia package writers to always include sample code like R does? And even better, always have R-style vignettes?
Your (implied) suggestion of a central entry point to documentation is very good.
However, things are improving rapidly, the Julia developers are fantastic, the community is inclusive and pleasant. I am looking forward to the day when the 3 language problem → Julia.
Huh, I googled “julia student-t” and got to Univariate Distributions which is where Distributions.TDist is documented.
It’s true that this is not in the Julia manual itself, but that’s inherent to not having a single monolithic library, so I’m not sure what you are asking for here.
I think part of the issue here is that in an extremely well-established language like Python, there are large organizations with deep pockets that pour resources into a few high-profile packages in key problem areas. […] it is easier to identify a well-resourced solution.
That being said, there’s no question that more documentation could be written, and better documentation, and I agree that having more examples in the documentation is desirable.
You put your finger on the problem. I was looking for student-t random numbers. You found the distribution page, not the random number page.
After spending some time, one figures out that one needs to use both the distribution page and the random number page together. Google only takes you to one. So you end up with half of the solution.
These pages don’t link to each other, even though both are needed for student-t random numbers (and for most other distributions).
It would be so much easier if a) these 2 pages linked to each other and b) there was some language to the effect of “this is how you sample a student-t”.
In R it is rt(), found immediately. And if you need quantiles it’s qt(), and dt() for density.
p.s. The Julia t RNG is much faster than the R t RNG. I don’t know if I should be worried or happy.
I agree with you, I am one that understands things by examples, and for a new user of Julia, in particular, the implicit knowledge of multiple dispatch that those pages assume may be a high barrier. It is not absolutely obvious from those pages that this is the way to obtain a sample of any of those distributions. It should just be written everywhere:
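For instance, a minimal sketch of the kind of example that could sit on those pages, assuming the Distributions.jl package is installed (the degrees of freedom, sample size, and seed are arbitrary choices for illustration):

```julia
using Random          # standard library: rand, Random.seed!
using Distributions   # provides TDist and the other distribution types

Random.seed!(1)       # optional, only to make the draws reproducible

d = TDist(5)          # Student-t distribution with 5 degrees of freedom
x = rand(d)           # one draw (a Float64)
xs = rand(d, 3)       # a vector of 3 draws
```

The same `rand(distribution, n)` call works for the other distributions on the Univariate Distributions page, which is the part that is easy to miss when you land on only one of the two pages.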
I really think that it’s worthwhile if you’re considering Julia to do some work on “getting” the basic Julia architecture… Multiple dispatch is really really important for Julia, and creating structs that represent a “thing” that you use to identify which operation you want done by a function is super common.
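A toy sketch of that idea, using only Base Julia. The `Coin` and `Die` types and the `draw` function are hypothetical names made up for this illustration, not part of any package:

```julia
# Each struct is a "thing" whose type tells Julia which method to run.
struct Coin
    p::Float64    # probability of heads
end

struct Die
    sides::Int    # number of faces
end

# One function name, one method per type.
draw(c::Coin) = rand() < c.p ? :heads : :tails
draw(d::Die) = rand(1:d.sides)

# Written once against the generic function, this works for both types.
draw(x, n::Integer) = [draw(x) for _ in 1:n]

flips = draw(Coin(0.5), 3)   # vector of 3 Symbols
rolls = draw(Die(6), 3)      # vector of 3 Ints between 1 and 6
```

Distributions.jl follows the same pattern, with distribution objects in place of `Coin` and `Die` and `rand` in place of `draw`.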
If you get this basic idea, then
rand(TDist(2),3)
is a pretty obvious thing to do, it means “generate random numbers with the distribution TDist(2) and give me a vector of 3 of them” … this idiom is more or less the same kind of thing as
rt, rgamma, rnorm, rbeta, rexp
in R. R uses the name of the distribution in the function name and prefixes it with r to get random numbers, or q to get quantiles, or p to get cumulative probabilities, or d to get density…
And to get different distributions you simply hand it a different distribution object…
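As a rough sketch of how the R prefixes line up with Julia, assuming Distributions.jl is installed (the parameter values are arbitrary):

```julia
using Distributions

d = TDist(2)                 # Student-t with 2 degrees of freedom

samples = rand(d, 3)         # like rt(3, df = 2)
q = quantile(d, 0.975)       # like qt(0.975, df = 2)
p = cdf(d, 0.0)              # like pt(0, df = 2)
f = pdf(d, 1.0)              # like dt(1, df = 2)

# Same functions, different distribution objects.
for dist in (Normal(0, 1), Gamma(2, 1), Beta(2, 3), Exponential(1))
    println(typeof(dist), ": ", rand(dist, 3))
end
```

The function names stay fixed and only the first argument changes, which is the reverse of R, where the distribution is baked into the function name.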
This is such a central idea in Julia, that before you can really criticize Julia you should really have learned this basic idea. On the other hand, I think it’s entirely right to criticize tutorials and introductory material if it doesn’t hammer home this idea very early on in the examples/learning process.
Yes, it is an order of magnitude better - but I agree that it can be difficult to navigate, in part because the docs explain the details but not user workflows. The Manifest.toml is a true snapshot of everything, and given Julia’s superb installation of binaries it is very dependable relative to Python (even with virtual environments I find it very difficult to have good reproducibility).
Some notes showing the basics are here: 2. Introductory Examples — Quantitative Economics with Julia. A good heuristic is that unless you are writing a package which will be used by downstream code, you pretty much always want both a Project.toml and a Manifest.toml in your GitHub repo, and you should always have one pair associated with every project. For beginners, keeping almost nothing in your default (@v1.7) environment outside of development tools will make things much easier and more reproducible.
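A minimal sketch of that per-project workflow using the Pkg API. The temporary directory stands in for a real project folder, and Distributions is just an example package; adding it needs network access:

```julia
using Pkg

# Hypothetical project folder; a temp directory keeps this sketch self-contained.
project_dir = mktempdir()

# Make this folder the active environment instead of the default one.
Pkg.activate(project_dir)

# Adding a package records it in Project.toml and pins the exact
# versions of everything it depends on in Manifest.toml.
Pkg.add("Distributions")

# Show what the active environment contains.
Pkg.status()

# On another machine, after cloning a repo that holds both files:
#   Pkg.activate("path/to/project")
#   Pkg.instantiate()   # installs exactly what Manifest.toml lists
```

Committing both files is what makes `Pkg.instantiate()` reproduce the same package versions elsewhere.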
Not surprising. Despite the hard work of many, Julia has a long way to go for things like manipulating data - but also why use Julia for those purposes? R is tough to beat for the things it is best at, as is Python for things like cookie-cutter deep learning. I even think Stata is the right choice for many situations (especially given that it has a half-decent online package management system and network effects for applied microeconomists).
In general I like your suggestions in the writeup about some languages, including Julia, being the best language in certain circumstances, but I am not sure what criteria lead to the conclusion that R is the best overall language. That is a statement on the sorts of applications you are using more so than the language itself. For almost every use case I have, R would be a non-starter, and the choice is MATLAB, Python, or Fortran.
Agree that there are a lot of use cases where R is a non-starter. That said, we have a project where estimation takes hours on a multi-core machine and R is as fast as Julia and slightly slower than C.
But the real benefit of R is a) that it has so many high quality libraries and b) it’s much easier to use (via RStudio and good documentation) than the alternatives.
Well no, I have only filed issues when I find bugs. And it is not obvious to me how to frame this issue. (Is it an issue with distributions, random numbers, or Julia docs in general?)
But yes, ranting on Discourse about documentation is not optimal either.
And if you are unsure in which package to file an issue, do it in one of the packages, and if the package authors complain, do it in the other one. Or just in both. Not much work.
Note that users don’t get to the language because their goal is to learn a language, nor because they are studying computer science concepts in the abstract.
The language is fine and the concept is powerful; it is the user guides that should just provide examples. Explaining in the abstract what multiple dispatch is doesn’t help at all before one (I’m talking about me) has seen a few examples.