Choosing a numerical programming language for economic research: Julia,

Note, however, that the Distributions docs list what you can do with the package on the landing page, and the first item listed is “sampling from a distribution”.

Next comes the “getting started” section, which shows (using a standard normal distribution as the example) how to create a distribution object, get its mean, draw random numbers from it (!) and calculate quantiles.
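Those getting-started steps can be sketched as follows. This follows the standard-normal example the docs use; the number of draws and the 0.975 quantile are arbitrary choices for illustration, and the values returned by `rand` will differ from run to run.

```julia
using Distributions

d = Normal()            # standard normal distribution object (μ = 0, σ = 1)
m = mean(d)             # mean of the distribution: 0.0
x = rand(d, 5)          # vector of five random draws from d
q = quantile(d, 0.975)  # 97.5% quantile, ≈ 1.96
```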

https://juliastats.org/Distributions.jl/stable/starting/

So I would argue that anyone who manages to work out that a distribution is implemented in the Distributions package should be able to work out in under 5 minutes how to do the things most users are likely looking for in the package.

Of course, any package documentation is incomplete in the sense that it doesn’t show every specific use case of every user on every page, but conciseness is also a value in and of itself. In your case, the main issue seems to be that you expected the docstring of every distribution to include examples of how to apply methods to it. I’m ambivalent on this: personally, I don’t think taking 2 minutes to read the getting started section of a package I’m using is unreasonable, but on the other hand, adding a simple rand example to the docstring isn’t exactly overkill. You could just open a PR to add those two lines and see what the package maintainers say.

Well, the same page also has a “Sampling (Random number generation)” section, describing how to use rand with distributions.

You need to scroll a bit, but it’s on the same page.
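As a rough illustration of what that sampling section covers, here is a minimal sketch. Julia’s Random standard library only has built-in samplers for uniform, standard normal and standard exponential draws (`rand`, `randn`, `randexp`), so anything else goes through a Distributions object. The seed, the distributions and their parameters below are arbitrary choices for the example.

```julia
using Distributions, Random

rng = MersenneTwister(1234)              # seeded RNG so results are reproducible

g = Gamma(2.0, 3.0)                      # shape 2, scale 3
draws = rand(rng, g, 1000)               # vector of 1000 Gamma draws

M = rand(rng, Normal(5.0, 2.0), 3, 4)    # 3×4 matrix of N(5, 2²) draws

buf = Vector{Float64}(undef, 10)
rand!(rng, Exponential(1.5), buf)        # fill an existing array in place (scale 1.5)
```

Passing the RNG explicitly is optional; without it, the global RNG is used.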

Since documentation is on people’s minds again, I will note that people who are unhappy with the documentation can step up and provide a lot of value. For example, if you want to see workflows documented, a good thing to do would be to write up 10-100 specific problems you’d like to see solved and then ask others to produce the solutions. This forum is very reliable at producing solutions, but I seldom see anyone systematically collate questions and solutions into usable documentation.

Indeed, the getting started section is very good. The issue (in this case, then) is just that we live in a world where Google decides what people read first. It is not obvious to someone who lands on that page that there is a much better place to read the documentation. IMHO, redundancy in explaining things is hardly ever bad.

To add to this, the getting started page gets to this point immediately.

Is there a “best practices for documentation”?

Sometimes package developers forget to do enough hand-holding.
E.g., I added some details to the Julia VSCode README:
1. Install Julia (link)
2. Install VSCode (link)
3. Open VSCode…
Hand-holding makes a difference.

Would it not be good to add a brief sentence to julia/index.md at master · JuliaLang/julia · GitHub mentioning that random numbers from distributions other than those covered there (uniform, normal, etc.) can be obtained with the Distributions package? I would be happy to make a PR, but I’m not certain what the policy is on mentioning packages in the main Julia documentation.

EDIT: I went ahead and made a PR: Random numbers from other distributions by mcreel · Pull Request #46342 · JuliaLang/julia · GitHub

For sure. There are many tasks where R is superior and I wouldn’t even consider Julia (though I might choose Stata or Python). My point is that we should all be careful about making unconditional statements ranking languages. They should always be conditional on a particular class of tasks, as you do in your conclusions when talking about Julia but not R.

There are many tasks where Julia (or C++, Python, Rust, R, Matlab, Fortran, …) doesn’t compare favorably to the others. When people trying to choose a language read about how painful Julia is for a particular task and how great R is, it can conflate a whole bunch of different use cases. Similarly, if people hear how awesome Julia is and try it just for vanilla data cleaning and linear regressions, they are going to come away with a pretty low opinion of the language and ecosystem.

Unconditional statements ranking languages say more about which tasks are most common in the author’s field (i.e., they are some weighting of task-conditional rankings).

Well, I would say that Julia is superior to all the other languages you mentioned when it comes to the effort needed to write a new, performant package.

Therefore I think that the package ecosystem of Julia will evolve much faster than the ecosystem of any other language…

My GSoC student had this problem a couple of months ago, and I encouraged them to open an issue:

which has already been fixed.

I’m missing how this package is related to anything.

Linear regression is child’s play in Julia, no? ‘Data cleaning’, on the other hand, is unfamiliar territory for me. What sort of tools appear to be missing?
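For a sense of what “child’s play” means here, a minimal OLS sketch with GLM.jl on simulated data. The data-generating process (intercept 1.0, slope 2.0, noise standard deviation 0.5) and the column names are assumptions made up for the example.

```julia
using DataFrames, GLM, Random

rng = MersenneTwister(42)
n = 200
x = randn(rng, n)
y = 1.0 .+ 2.0 .* x .+ 0.5 .* randn(rng, n)   # simulated linear relationship plus noise
df = DataFrame(x = x, y = y)

model = lm(@formula(y ~ x), df)   # OLS with an intercept
b = coef(model)                   # estimates should be close to [1.0, 2.0]
```

Printing `model` shows the usual coefficient table with standard errors and p-values.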

Or are you missing documentation at Random Numbers · The Julia Language?

Then you should file an issue at Issues · JuliaLang/julia · GitHub

That’s not what I’m missing. My point is that the package you mentioned isn’t relevant at all.

I thought you were missing a reference to Distributions in the documentation of a random numbers package…

An often overlooked benefit of Julia is that most of its functionality is written in Julia itself. Anyone who works with the language can therefore also contribute to its further development, which should bring even more dynamism and quality to the libraries in the future.

I don’t even remotely think this is true. CSV, DataFrames and DataFramesMeta, plus either GLM or Turing, are basically far superior to anything in the more common languages. And I spent 20 years doing that stuff in R.

The difference is that I have a computer science background, and a lot of people who do that sort of stuff don’t. Many have learned some incantations but not the fundamentals. Those people will struggle whatever tool they use once they get outside their incantation scope.

It depends very much on what exactly you were doing before Julia.

I do most of my data work in STATA.

Some data cleaning tasks (generating variables) are much easier in STATA.
Some tasks are much easier in DataFrames.jl.
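For comparison, a Stata-style `generate` / `bysort …: egen` step might look roughly like this in DataFrames.jl. This is only a sketch; the data and the column names (`id`, `wage`) are hypothetical.

```julia
using DataFrames, Statistics

df = DataFrame(id = [1, 1, 2, 2], wage = [10.0, 12.0, 20.0, 18.0])

# Stata: gen logwage = log(wage)
df.logwage = log.(df.wage)

# Stata: gen highwage = wage > 15
df.highwage = df.wage .> 15

# Stata: bysort id: egen meanwage = mean(wage)
# transform! on a grouped data frame adds the column to the parent df,
# repeating each group's mean on every row of that group
transform!(groupby(df, :id), :wage => mean => :meanwage)
```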

Some econometrics tasks are much easier in FixedEffectsModels.jl, some in STATA.

PS: ALL of the things that are easier (for me) in STATA can be eventually implemented in Julia.

Can you give an example of something that is very easy in STATA that you struggle with in Julia?

At risk of digressing:

I’m curious how long that is expected to take. Most applied micro people appear to use R or Stata. I’m wondering how far Julia is from those for panel data and causal inference econometrics, or whatever other uses draw these people to R/Stata. Are there people working on these things with the goal of making Julia competitive with R/Stata in these areas?
