Choosing a numerical programming language for economic research: Julia,

nilshg · August 14, 2022, 10:51am

Note however that the Distributions docs list things you can do with the package on the landing page, where the first point listed is “sampling from a distribution”.

Then there’s the “getting started” section next, which shows (using the example of a standard normal distribution) how to create a distribution object, get its mean, draw random numbers from it (!) and calculate quantiles.

https://juliastats.org/Distributions.jl/stable/starting/
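For reference, the steps described in that section look roughly like the sketch below. It assumes the Distributions package is installed; the seed and the sample size are arbitrary choices for illustration, not something taken from the docs.

```julia
using Distributions, Random

Random.seed!(123)               # arbitrary seed, only to make the draws reproducible

d = Normal()                    # standard normal: mean 0, standard deviation 1
m = mean(d)                     # mean of the distribution object
x = rand(d, 100)                # vector of 100 random draws
q = quantile.(d, [0.5, 0.95])   # median and 95th percentile
```

The same `rand`, `mean` and `quantile` calls apply to the other univariate distributions in the package, which is why the docstrings of individual distributions can stay short.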

So I would argue that anyone who manages to work out that a distribution is implemented in the Distributions package should be able to work out in under 5 minutes how to achieve the things that most users are likely looking for in the package.

Of course any package documentation is incomplete in the sense that it doesn’t necessarily show every specific use case of every user on every page of the documentation, but conciseness is also a value in and of itself. In your case it seems your main issue was that you were expecting the docstring of every distribution to include examples of how to apply methods to that distribution. I’m ambivalent on this (personally I think taking 2 minutes to look at the getting started section to understand the package I’m using isn’t unreasonable; on the other hand, adding a simple rand example to the docstring isn’t exactly overkill), but you could just open a PR to add those two lines and see what the package maintainers say.

DNF · August 14, 2022, 11:25am

Well, on the same page you have: Sampling (Random number generation), describing how to use rand and distributions together.

You need to scroll a bit, but it’s on the same page.

johnmyleswhite · August 14, 2022, 11:28am

Since documentation is on people’s minds again, I will note that people who are unhappy with documentation can actually step up to provide a lot of value. For example, if you want to see workflows documented, a good thing to do would be to write up 10-100 specific problems you’d like to see solved and then ask others to produce the solutions. This forum is very reliable at producing solutions, but I seldom see people systematically drive collating questions and solutions into usable documentation.

lmiq · August 14, 2022, 11:48am

Indeed, the getting started section is very good. The issue (in this case then) is just that we live in a world where Google drives what people read first. It is not obvious to someone who reaches that page that there is a much better place to read the documentation. IMHO, redundancy in explaining things is hardly ever bad.

jishnub · August 14, 2022, 12:08pm

To add to this, the getting started page gets to this point immediately.

Albert_Zevelev · August 14, 2022, 2:10pm

Is there a “best practices for documentation”?

Sometimes package developers forget to do more hand-holding.
E.g., I added some details to the Julia VSCode readme:
1 install Julia (link)
2 install VSCode (link)
3 open VSCode…
Handholding makes a difference.

mcreel · August 14, 2022, 2:25pm

Would it not be good to add a brief sentence to julia/index.md at master · JuliaLang/julia · GitHub mentioning that random numbers from distributions other than those mentioned (e.g., uniform, normal, etc.) can be obtained by using the Distributions package? I would be happy to make a PR, but I’m just not certain what is the policy about mentioning packages in the main Julia documentation.

EDIT: I went ahead and made a PR: Random numbers from other distributions by mcreel · Pull Request #46342 · JuliaLang/julia · GitHub
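A minimal sketch of the distinction the proposed sentence would draw: uniform, normal and exponential draws come from Base and the Random standard library, while other distributions go through Distributions. The seed and the `Gamma(2.0, 1.0)` parameters are arbitrary assumptions for illustration.

```julia
using Random          # standard library: randexp, MersenneTwister
using Distributions   # external package: Gamma and many others

rng = MersenneTwister(42)            # arbitrary seed for reproducibility

u = rand(rng)                        # uniform on [0, 1)
z = randn(rng, 3)                    # three standard normal draws
e = randexp(rng)                     # exponential with rate 1

# Anything beyond those needs a distribution object from Distributions
g = rand(rng, Gamma(2.0, 1.0), 3)    # three Gamma(shape 2, scale 1) draws
```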

jlperla · August 14, 2022, 4:46pm

For sure. There are many tasks where R is superior and I wouldn’t even consider Julia (though I might choose Stata or Python). My point is that we should all be careful making unconditional statements ranking languages. They should always be conditional on a particular class of tasks, as you do in your conclusions talking about Julia but not R.

There are many tasks where Julia (or C++, Python, Rust, R, Matlab, Fortran, …) doesn’t compare favorably to the others. When people trying to choose a language read about how painful Julia is for a particular task and how great R is, it can conflate a whole bunch of different use cases. Similarly, if people hear how awesome Julia is and try it just for vanilla data cleaning and linear regressions, they are going to come out of it with a pretty low opinion of the language and ecosystem.

Unconditional statements ranking languages are more a statement about how frequent particular tasks are in the author’s field (i.e., some weighting of task-conditional rankings).

ufechner7 · August 14, 2022, 5:14pm

Well, I would say that Julia is superior to all the other languages you mentioned when it comes to the effort that is needed to write a new, performant package.

Therefore I think that the package ecosystem of Julia will evolve much faster than the ecosystem of any other language…

giordano · August 14, 2022, 5:32pm

My GSoC student had this problem a couple of months ago and I encouraged them to open an issue:

github.com/JuliaStats/Distributions.jl

Docs not clear on usage of Poisson Object

opened 07:54AM - 14 Jun 22 UTC

closed 07:16PM - 16 Jun 22 UTC

Aman-Pandey-afk

https://github.com/JuliaStats/Distributions.jl/blob/f889f9e56b0243d770c195b3eee8…baef4880bd2e/src/univariate/discrete/poisson.jl#L1-L22

Hi, I think the docs should be more clear on the Poisson Object that is generated when Poisson($\lambda$) is executed. In Python we use numpy.random.poisson and give it an argument to generate an iterable over the Poisson Distribution, but here the function just takes one argument and the returned Poisson object has only one field ($\lambda$ itself). I later found out that the object can be used as an argument for rand to generate the iterable. A newcomer might get stuck as there is no help regarding the nature of the Poisson Object in Docs.

I think a line like this in the docs might work: `The Poisson Object is of type UnivariateDistribution which can be passed to the rand() function as an argument along with an integer N, to generate an iterable of size N.` I will make a pull request if the issue is considerable.

which has been already fixed.
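The usage the issue describes can be sketched as follows, assuming the Distributions package; the rate `3.0` and the sample size `5` are arbitrary illustration values.

```julia
using Distributions

d = Poisson(3.0)       # distribution object; it only stores the rate λ
p = params(d)          # tuple of parameters, here just λ
draws = rand(d, 5)     # five integer-valued draws, roughly numpy.random.poisson(3.0, 5)
```

So the object returned by `Poisson(λ)` is a description of the distribution rather than a sampler, and `rand` does the sampling.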

I’m missing how this package is related to anything.

DNF · August 14, 2022, 5:49pm

Linear regression is child’s play in Julia, no? ‘Data cleaning’ otoh is unfamiliar territory for me. What sort of tools appear to be missing?
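As a rough illustration of that claim, an ordinary least squares fit with the GLM and DataFrames packages might look like the sketch below. The data are made up for the example.

```julia
using DataFrames, GLM

# Made-up data, roughly y = 2x
df = DataFrame(x = [1.0, 2.0, 3.0, 4.0, 5.0],
               y = [2.1, 3.9, 6.2, 7.8, 10.1])

model = lm(@formula(y ~ x), df)   # OLS with an intercept
b = coef(model)                   # [intercept, slope]
```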

ufechner7 · August 14, 2022, 6:30pm

Or do you miss documentation at Random Numbers · The Julia Language ?

Then you have to file an issue at Issues · JuliaLang/julia · GitHub …

giordano · August 14, 2022, 6:58pm

I don’t miss that. My point is that package you mentioned isn’t relevant at all.

ufechner7 · August 14, 2022, 7:01pm

I thought you were missing a reference to Distributions in the documentation of a random numbers package…

strickek · August 14, 2022, 7:29pm

An undisclosed benefit of Julia is that most of Julia’s functionality is written in Julia. Anyone who works with the Julia language can also contribute to its further development. This should bring even more dynamism and quality to the libraries in the future.

dlakelan · August 15, 2022, 1:28am

I don’t even remotely think this is true. CSV, DataFrames and DataFramesMeta, and either GLM or Turing are basically far superior to anything in any of the more common languages. And I spent 20 yrs doing that stuff in R.

The difference is that I have computer science background and a lot of people who do that sort of stuff don’t. A lot of people have learned some incantations but not really fundamentals. Those people will struggle no matter what they do when they get outside their incantation scope.

Albert_Zevelev · August 15, 2022, 3:03am

It depends very much on what exactly you were doing before Julia.

I do most of my data work in STATA.

Some data cleaning tasks (generating variables) are much easier in STATA.
Some tasks are much easier in DataFrames.jl.

Some econometrics tasks are much easier in FixedEffectModels.jl, some in STATA.

PS: ALL of the things that are easier (for me) in STATA can be eventually implemented in Julia.
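To make the "generating variables" comparison concrete, here is a small sketch of a few common Stata steps written with DataFrames.jl. The column names and values are hypothetical, and the Stata lines in the comments are only approximate equivalents.

```julia
using DataFrames, Statistics

# Hypothetical panel with one missing income value
df = DataFrame(id     = [1, 1, 2, 2],
               year   = [2020, 2021, 2020, 2021],
               income = [100.0, 110.0, missing, 95.0])

# Stata: drop if missing(income)
clean = dropmissing(df, :income)

# Stata: gen log_income = log(income)
clean.log_income = log.(clean.income)

# Stata: bysort id: egen mean_income = mean(income)
clean = transform(groupby(clean, :id), :income => mean => :mean_income)
```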

dlakelan · August 15, 2022, 3:05am

Can you give an example of something that is very easy in STATA that you struggle with in Julia?

Albert_Zevelev · August 15, 2022, 3:15am

At risk of digressing:

github.com/kleinschmidt/RegressionFormulae.jl

Great Idea

opened 06:41PM - 28 Apr 20 UTC

azev77

Hi @kleinschmidt, I think the Julia ecosystem would benefit from something like… this! If we wanna do serious stats it should be easy to automatically generate all interactions (up to order n) etc. Some things I find particularly useful in my other stats packages outside Julia:

1. "i.x1" makes x1 into a [factor variable](https://www.stata.com/features/overview/factor-variables/) in a formula. Suppose x1 takes the values: 1.2, 5, 6.4
`reg y x1`: treats x1 as continuous & returns 1 coef (assuming no intercept)
`reg y i.x1` creates 3 dummies for each level of x1 & returns 3 coefficients (if there is an intercept it randomly drops one level unless the user chooses which level to drop)
2. `i.x1#(c.x2 i.x3)`
Interacts all dummies of x1 w/ x2 (continuous)
Interacts all dummies of x1 w/ all dummies of x3
3. Leads & Lags. Suppose D is at the state-year level.
`L.D`: creates a 1 year lag of D
`L(4).D`: creates a 4 year lag of D
`F(4).D`: creates a 4 year lead of D. $D_{t+4}$
`reg y F(-1 0 1 2).D` estimates: $y_t = b_{-1} x_{t-1} + b_{0} x_{t} + b_{1} x_{t+1} + b_{2} x_{t+2}$

If Julia is to be "[as easy for statistics as R](https://julialang.org/blog/2012/02/why-we-created-julia/)" these features should be in StatsModels. I'd love to help if I can.
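For comparison, part of items 1 and 2 can already be approximated with the existing formula syntax. The sketch below assumes the GLM, DataFrames and CategoricalArrays packages and uses made-up data; marking `x1` as categorical plays the role of `i.x1`, and `x1 * x2` expands to main effects plus the interaction, roughly like Stata's `i.x1##c.x2`. It does not cover the lead and lag operators from item 3.

```julia
using DataFrames, CategoricalArrays, GLM

# Made-up data: x1 takes the three values from the issue, x2 is continuous
df = DataFrame(
    y  = [1.1, 2.0, 2.9, 3.2, 5.1, 6.8, 0.9, 0.4, 0.2],
    x1 = categorical([1.2, 1.2, 1.2, 5.0, 5.0, 5.0, 6.4, 6.4, 6.4]),
    x2 = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
)

# x1 * x2 is shorthand for x1 + x2 + x1 & x2
m = lm(@formula(y ~ x1 * x2), df)

# Intercept, two x1 dummies (one level is the baseline), x2, two interaction terms
names_out = coefnames(m)
```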

samerb · August 15, 2022, 3:34am

I’m curious how long it is expected that that will take. Most applied micro people appear to use R or Stata. I’m wondering how far away Julia is from those for panel data and causal inference econometrics or whatever other uses draw these people to R/Stata. Are there people working on these things with the goal of making Julia compete with R/Stata in these areas?
