Choosing a numerical programming language for economic research: Julia,

I’m hoping to work on Julia documentation this year, so feedback like this is useful. Docstrings are much easier to work on than the manual since there’s not much additional context and textual flow to consider, but your post makes me realize that the manual is more likely to be seen by beginners and might actually have more impact, at least in the initial stages.

By chance, I came across JuliaNotes today. It has concise, practical answers for common questions that people beginning to work with Julia have. It’s by no means comprehensive, but seems like a very handy, easy to use reference for beginners.

4 Likes

It’s really great that you are planning to work on Julia documentation.

Here is an example that to my mind is the right way to write function documentation (and should be the first link when searching for it)

Then, for extra points, try to figure out how to do the same in Julia.

That would be a bit of cheating since I already knew about the Optim.jl package, and from there this is easy to find. However, googling for ‘minimum of unconstrained multivariable function julia’ does return the relevant page in Optim’s documentation (though, for some reason, an older version) as the first link.
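
For anyone following along, here is a minimal sketch of what that Optim page leads to. The Rosenbrock test function, the starting point, and the choice of `BFGS()` are my own assumptions for illustration, not something taken from the MATLAB page; gradients are approximated by finite differences because none is supplied.

```julia
using Optim

# Rosenbrock function: a standard test problem whose minimum is at (1, 1).
# (Chosen here only for illustration.)
rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

x0 = [0.0, 0.0]                               # starting guess (an assumption)
result = optimize(rosenbrock, x0, BFGS())     # no gradient given: finite differences

xmin = Optim.minimizer(result)                # location of the minimum found
fmin = Optim.minimum(result)                  # objective value at that point
```

Swapping `BFGS()` for `NelderMead()` gives a derivative-free search with the same call.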

How do you feel the Optim docs page compares to the MATLAB one? One thing I note is that MATLAB makes better use of subsections, with many in-page links making going back and forth between parts of the page easier. Also, I really like the “Topics” section at the end, which links to both conceptually related pages and the language features people might not be familiar with (Anonymous functions in this case).

1 Like

Mea culpa. I should have picked a better example. I like that page and had not seen it. Apologies to the community.

Google delivered me here, Packages

And I actually used NLopt.jl (GitHub - JuliaOpt/NLopt.jl: Package to call the NLopt nonlinear-optimization library from the Julia language) the last time I needed optimisation.

I should have used Pkg as the example of documentation I cannot understand.

I think we’ve all come across packages that are hard to make sense of, so no worries, your point comes across.

I’m glad you brought up MATLAB because one thing I really like about its documentation is that it’s always practically oriented and immediately usable. You could be in the middle of coding, look up a function, and quickly be comfortable using it in your code.

In Julia, you never know when you’re gonna need to dive into a rabbit hole because you wanted to look up one function. To be fair, things are inherently more complicated in some ways because of the type hierarchy, dispatch, etc., but there’s definitely room for improvement in this direction.


Another advantage MATLAB has is that the documentation is centralized in one place, making it easier to find reliably. With Julia, you have to rely on search engines finding the right place. The JuliaHub docs search helps a lot, but still sometimes when you don’t know the exact phrase to search for (and sometimes even when you do), things can be hard to find. I imagine it’s even more so if you are a beginner.

I don’t know what the solution for this one is. A custom search engine that uses a mainstream engine underneath, perhaps - something like Brave Goggles with an index of Julia doc pages.

5 Likes

Well, I have less insight than you do. I’ve coded in just about every language under the sun, and C is my favorite: it’s simple and one knows what one gets. But I do science, not coding.

Here is the documentation example I should have suggested to you in the first place. Try googling “Julia simulate student-t random numbers”. You end up here: Random Numbers · The Julia Language. How do you figure out from that page how to get Student-t random numbers?

The only redeeming feature of MATLAB is its documentation. As you suggest, that is the benefit of being produced by a well-funded commercial entity that uses documentation to sell the product. I had a MATLAB salesperson call on us, and that was central to the pitch. Julia (and R and Python) will never get documentation of the same quality. Even if the money were there, too many strong-willed people would stop it.

I really hoped to solve the 2-language problem with Julia. My team uses it for 2 large projects, and we ended up with a 3-language problem: Python + Julia + R. Python dominates the data pipelines, and R covers all the specialized libraries (like dynlm) and plotting. We now run Julia from Python. Still no regrets.

What frustrates me is that some parts of Julia are so fantastic, and then we run into a buggy and poorly documented library. We spent hours on Pkg and in the end gave up because we could not make sense of it, and now only use modules. Packages are very easy in R and Python and I am sure they are very easy in Julia. We just could not figure it out.

So I don’t know the answer to your question either. Is it possible to ask Julia package writers to always include sample code, as R does? And, even better, to always have R-style vignettes?

Your (implied) suggestion of a central entry point to documentation is very good.

However, things are improving rapidly: the Julia developers are fantastic, and the community is inclusive and pleasant. I am looking forward to the day when the 3-language problem → Julia.

best, js

Huh, I googled “julia student-t” and got to Univariate Distributions which is where Distributions.TDist is documented.

It’s true that this is not in the Julia manual itself, but that’s inherent to not having a single monolithic library, so I’m not sure what you are asking for here.

As I wrote in another thread

I think part of the issue here is that in an extremely well-established language like Python, there are large organizations with deep pockets that pour resources into a few high-profile packages in key problem areas. […] it is easier to identify a well-resourced solution.

That being said, there’s no question that more documentation could be written, and better documentation, and I agree that having more examples in the documentation is desirable.

7 Likes

You put your finger on the problem. I was looking for student-t random numbers. You found the distribution page not the random number page.

After spending some time, one figures out that one needs to use both the distribution page and the random number page together. Google only takes you to one. So you end up with 1/2 of the solution.

These pages don’t link to each other, even though both are needed for student-t random numbers (and for most other distributions).

It would be so much easier if a) these 2 pages linked to each other and b) there was some language to the effect of “this is how you sample a student-t”.

In R it is rt(), found immediately. And if you need quantiles it’s qt(), and dt() for the density.

P.S. The Julia t RNG is much faster than the R t RNG. I don’t know if I should be worried or happy.

1 Like

I agree with you, I am one that understands things by examples, and for a new user of Julia, in particular, the implicit knowledge of multiple dispatch that those pages assume may be a high barrier. It is not absolutely obvious from those pages that this is the way to obtain a sample of any of those distributions. It should just be written everywhere:

julia> using Distributions

julia> rand(TDist(1), 3)
3-element Vector{Float64}:
 -3.671459303633777
 -0.7487087877100667
 -1.254954738247079

7 Likes

Do you besides just calling Pkg.pin over all dependencies?

I really think that it’s worthwhile if you’re considering Julia to do some work on “getting” the basic Julia architecture… Multiple dispatch is really really important for Julia, and creating structs that represent a “thing” that you use to identify which operation you want done by a function is super common.

If you get this basic idea, then

rand(TDist(2),3)

is a pretty obvious thing to do: it means “generate random numbers with the distribution TDist(2) and give me a vector of 3 of them”. This idiom is more or less the same kind of thing as

rt, rgamma, rnorm, rbeta, rexp

in R. R puts the name of the distribution in the function name and prefixes it with r to get random numbers, q to get quantiles, p to get cumulative probabilities, or d to get the density.

Julia uses things like

pdf(TDist(1),2)
logpdf(TDist(1),2)
quantile(TDist(1),0.2)
cdf(TDist(2),1.0)

And to get different distributions you simply hand it a different distribution object…
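
To make the correspondence with R concrete, here is a small side-by-side sketch using Distributions.jl. The degrees of freedom, the evaluation points, and the Gamma parameters are arbitrary choices for illustration; the R calls in the comments are the usual base-R equivalents.

```julia
using Distributions
using Random

Random.seed!(1234)            # fix the seed so the draws are reproducible

d = TDist(5)                  # Student-t with 5 degrees of freedom

draws = rand(d, 3)            # R: rt(3, df = 5)
dens  = pdf(d, 0.0)           # R: dt(0, df = 5)
prob  = cdf(d, 0.0)           # R: pt(0, df = 5)
q     = quantile(d, 0.975)    # R: qt(0.975, df = 5)

# The same functions work unchanged on any other distribution object.
g = Gamma(2.0, 1.5)           # shape 2, scale 1.5
gdraws = rand(g, 3)           # R: rgamma(3, shape = 2, scale = 1.5)
gq     = quantile(g, 0.5)     # R: qgamma(0.5, shape = 2, scale = 1.5)
```

So where R needs a new function name per distribution and per operation, here only the object passed in changes.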

This is such a central idea in Julia, that before you can really criticize Julia you should really have learned this basic idea. On the other hand, I think it’s entirely right to criticize tutorials and introductory material if it doesn’t hammer home this idea very early on in the examples/learning process.

4 Likes

Yes, it is an order of magnitude better, but I agree that it can be difficult to navigate, in part because the docs explain the details but not user workflows. The Manifest.toml is a true snapshot of everything, and given Julia’s superb installation of binaries it is very dependable relative to Python (even with virtual environments, I find it very difficult to get good reproducibility).

Some notes are in chapter 2, “Introductory Examples”, of Quantitative Economics with Julia, which shows some of the basics. A good heuristic is that unless you are writing a package that will be used by downstream code, you pretty much always want both a Project.toml and a Manifest.toml in your GitHub repo, and you should always have one such pair associated with every project. For beginners, keeping almost nothing in the default (v1.7) environment apart from development tools will make things much easier and more reproducible.
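
As a rough sketch of that per-project workflow: the temporary directory below is only a stand-in for your own repo folder, and `Example` is the small demo package from the registry, used here as a placeholder for whatever you actually depend on. It needs network access to download the package.

```julia
using Pkg

# Hypothetical project folder; in practice this would be your repo directory.
project_dir = mktempdir()

# Make this folder the active environment instead of the default one.
Pkg.activate(project_dir)

# Adding a package writes it to Project.toml and records the exact
# resolved versions of everything in Manifest.toml.
Pkg.add("Example")

# On a fresh clone of the repo, this installs exactly what Manifest.toml lists.
# Here it has nothing left to do, because the package was just added.
Pkg.instantiate()

Pkg.status()                  # show what the active environment contains
```

Committing both files alongside the code is what lets a collaborator reproduce the same package versions with `Pkg.activate` followed by `Pkg.instantiate`.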

The broader issue is that Optim may not be the package people want to use at this point. Documentation is necessary but not sufficient to compete; package quality and completeness also matter. Optim is great, but it doesn’t have anything close to the maintenance support of its competitors in Python/MATLAB/C/etc. Others such as NLopt might be a better baseline choice, and if Optimization.jl (GitHub - SciML/Optimization.jl: Mathematical Optimization in Julia. Local, global, gradient-based and derivative-free. Linear, Quadratic, Convex, Mixed-Integer, and Nonlinear Optimization in one simple, fast, and differentiable interface) builds out, then it could be a good place for consolidated docs.

Not surprising. Despite the hard work of many, Julia has a long way to go for things like manipulating data - but also why use Julia for those purposes? R is tough to beat for the things it is best at, as is Python for things like cookie-cutter deep learning. I even think Stata is the right choice for many situations (especially given that it has a half-decent online package management system and network effects for applied microeconomists).

In general I like the suggestions in your writeup that some languages, including Julia, are the best language in certain circumstances, but I am not sure what criteria lead to the conclusion that R is the best overall language. That is a statement about the sorts of applications you are using more than about the language itself. For almost every use case I have, R would be a non-starter, and the choice is MATLAB, Python, or Fortran.

Agree that there are a lot of use cases where R is a non-starter. That said, we have a project where estimation takes hours on a multi-core machine, and there R is as fast as Julia and slightly slower than C.
But the real benefit of R is that a) it has so many high-quality libraries and b) it’s much easier to use (via RStudio and good documentation) than the alternatives.

2 Likes

Sorry, I was not criticising multiple dispatch, I see the benefit.

All I said was that in this particular case it is really hard to find out how to do something important and simple.

Even if technically distributions and random numbers belong in a separate part of the code base, they belong together in the documentation.

And for the distributions docs not to link to the random numbers docs, and vice versa, cannot be a good idea.

Did you create an issue in Issues · JuliaStats/Distributions.jl · GitHub and in Issues · JuliaRandom/RandomNumbers.jl · GitHub so that these issues can be fixed?

2 Likes

Well, no. I have only filed issues when I find bugs, and it is not obvious to me how to frame this one. (Is it an issue with distributions, random numbers, or the Julia docs in general?)

But yes, ranting on Discourse about documentation is not optimal either.

1 Like

Please, also file issues about the documentation. Even better are pull requests. I filed for example this issue: Improve documentation of filter() · Issue #69 · sl-solution/InMemoryDatasets.jl · GitHub and this resulted in a very fruitful discussion and hopefully soon in a pull request.

And if you are unsure which package to file an issue in, file it in one of them, and if the package authors complain, file it in the other one. Or just in both. It’s not much work.

Note that users don’t come to the language because their goal is to learn a language, nor because they are studying computer science concepts in the abstract.

The language is fine and the concept is powerful, it is the user guides that should just provide examples. Explaining in abstract what multiple dispatch is doesn’t help at all before one (I’m talking about me) has seen a few examples.
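
The kind of example I have in mind is as small as this. The `Circle` and `Rectangle` types and the `area` function are made up for illustration and come from no package; the point is only that one function name gets one method per type, which is the same pattern as `rand` taking a distribution object.

```julia
# Hypothetical types, for illustration only.
struct Circle
    radius::Float64
end

struct Rectangle
    width::Float64
    height::Float64
end

# One function name, one method per type.
# Julia selects the method from the type of the argument.
area(s::Circle) = π * s.radius^2
area(s::Rectangle) = s.width * s.height

shapes = [Circle(1.0), Rectangle(2.0, 3.0)]
areas = [area(s) for s in shapes]     # each element dispatches to its own method
```

Adding a new shape later means adding one struct and one `area` method, without touching the existing code.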

6 Likes

Will do.

1 Like

As a fully separate issue, this was the first time I saw InMemoryDatasets. What is the benefit compared to DataFrames? GitHub - sl-solution/InMemoryDatasets.jl: Multithreaded package for working with tabular data in Julia is very careful not to compare the two, and I can see why. I would love to see a dataframe-type object that sacrifices performance for flexibility.