Why do some functions not output the result directly?

Well yes, it is actual output, and I’m guessing that what makes Julia so powerful is that one object can be used in so many different ways, which is why I’m saying it makes sense once you understand it.

In R, you could just do ?rbinom and the documentation would be good enough. Or you could Google it and quickly see a live example of someone using it in a way analogous to what you want. And in R there are already so many tutorials for different things that it’s easy to triangulate on what’s going on.

It seems to be a bit different for Julia. I did eventually find out through Google how you can use it with rand, but it was a bit opaque. I get the impression that I have to actually read the manual, but sometimes the manual feels like it was written by people who already understand what’s going on. This isn’t to say that R’s documentation is very good, only that its community already has a treasure trove of things to Google through.


My first language was R, quite a few years ago. It helps that in R basically everything is a dataframe! In Julia… anything can be anything. I agree with you that the learning curve of Julia is steeper. However, where people usually blame a lack of documentation, I tend to disagree (I started at 0.4, when documentation was really lacking). I think it’s simply because there’s much more to know; the possibilities are much greater, and the “span” that Julia has from high level to low level is much wider.

I do suggest reading the manual. And I suggest reading it cover to cover, because you’re likely to discover a huge number of things (even general programming things that have nothing to do with Julia) that you’ve never heard of. There are also a number of “Julia by example” sources online which can help you get started. Other than that, I suggest reading source code. When you don’t know what something like Binomial does, run

@edit Binomial(1)

in the REPL and Julia will take you to the source code (there is also @less). Digging around in well-established repos will accelerate your learning 10-fold.
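If opening an editor is more than you need, two related tools can narrow things down first: @which reports the method a call would dispatch to, and methods lists every method of a function. Both come from the InteractiveUtils standard library, which the REPL loads automatically. The rand(1:6) call below is only an illustrative stand-in for whatever call you are curious about.

```julia
using InteractiveUtils   # already loaded in the REPL; needed in scripts

# Which method does this particular call dispatch to?
m = @which rand(1:6)
println(m)               # prints the signature plus the file and line it lives at

# How many methods does rand have in this session?
println(length(methods(rand)))
```

The file and line that @which prints are the same place @edit would open.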


The difference between R and Julia here is that Julia represents the distribution itself as a first-class thing, whereas in R what is documented as “The Binomial Distribution” is merely a bunch of functions that do various binomial-related things. What connects the functions dbinom, pbinom, qbinom, rbinom in R? Absolutely nothing, except a naming convention and the fact that they’re listed together in the documentation.
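To make the contrast concrete, here is a rough sketch of how those four R functions map onto a single Julia object. It assumes Distributions.jl is installed; the parameters 10 and 0.5 are arbitrary, and the R calls in the comments are only the approximate equivalents.

```julia
using Distributions

d = Binomial(10, 0.5)    # the distribution itself; nothing is sampled yet

pdf(d, 5)                # like dbinom(5, 10, 0.5)
cdf(d, 5)                # like pbinom(5, 10, 0.5)
quantile(d, 0.5)         # like qbinom(0.5, 10, 0.5)
rand(d, 3)               # like rbinom(3, 10, 0.5): three random draws
```

Because d is an ordinary value, evaluating Binomial(10, 0.5) on its own shows a description of the distribution rather than random numbers; the numbers only appear once you hand d to rand.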

Why is this a problem? For some kinds of code it’s not. But as soon as you want to do something that’s generic over distributions, it becomes almost essential to have a first-class representation of distributions as objects. To make the issue concrete, I challenge you to write a function in R that takes any distribution, samples a number of points from it and then produces a Q-Q plot comparing the sample quantiles with the theoretical quantiles. If you try, you’ll see that the fact that R doesn’t have a way of representing the distribution itself rapidly becomes a problem. Here is such a generic function in Julia:

using Distributions, UnicodePlots

function qqplot(
    d :: UnivariateDistribution;
    samples :: Integer = 10^6,
    points :: Integer = 99,
)
    # evenly spaced probabilities strictly between 0 and 1
    quantiles = (0:1/(points+1):1)[2:end-1]
    reference = quantile(d, quantiles)    # theoretical quantiles
    values = rand(d, samples)             # draw the sample
    sample = quantile(values, quantiles)  # empirical quantiles
    lineplot(reference, sample)
end

This simple, clear definition works for any kind of univariate distribution:

julia> qqplot(Normal())
   3 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
  -3 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│

julia> qqplot(Beta())
   1 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡤⠂│
   0 │⣀⠤⠋⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│

julia> qqplot(Binomial(100, 0.5))
  70 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
  30 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│

If we want to see how a low number of sample points can skew the QQ line, that’s easy too:

julia> for d in [Normal(), Beta(), Binomial(100, 0.5)]
           display(qqplot(d, samples=100))
       end

qqplot(Normal{Float64}(μ=0.0, σ=1.0)):

    3 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│
   -2 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│

qqplot(Beta{Float64}(α=1.0, β=1.0)):

   1 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠄│
   0 │⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│

qqplot(Binomial{Float64}(n=100, p=0.5)):

   60 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠤⠂⠀⠀⠀⠀⠀⠀⠀│
   30 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│

You see what I did there? I looped over a vector of distribution objects and did the same thing for each one in a for loop. How would you do that in R?

This approach of having a simple (usually immutable) structure that represents the essence of some thing, which you can then do various things with, is quintessentially Julian and part of why the language allows people to write clear, highly generic code so easily. The definition of qqplot doesn’t care about the details of a distribution; it doesn’t have to worry about what parameters each one takes. All it needs to know is how to do two things:

  1. Given a distribution object, how to ask for a sample of it, and
  2. Given a distribution object, how to ask for its theoretical quantiles.

And both are easy to do without knowing anything about the distribution or its parameters.
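As a further sketch of the same idea, here is another function built on just those two operations. The name quantile_gap and its defaults are hypothetical, made up for illustration and not part of Distributions.jl; it assumes Distributions.jl is installed. It reports the largest gap between sample and theoretical quantiles, and it runs unchanged on distributions that qqplot was never tried on above.

```julia
using Distributions

# Hypothetical helper (not part of Distributions.jl): the largest gap
# between the sample quantiles and the theoretical quantiles of d.
function quantile_gap(d::UnivariateDistribution; samples::Integer = 10^5)
    probs = 0.1:0.1:0.9
    values = rand(d, samples)                    # operation 1: sample from d
    theory = [quantile(d, p) for p in probs]     # operation 2: theoretical quantiles
    maximum(abs.(quantile(values, probs) .- theory))
end

# The same code handles very different distributions.
for d in [Normal(), Exponential(2.0), truncated(Normal(), -1.0, 1.0)]
    println(d, " => ", quantile_gap(d))
end
```

The printed gaps vary from run to run because the samples are random, but nothing in quantile_gap mentions means, rates, or truncation bounds.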


And some of them don’t even have an end, like Iterators.countfrom(0), which counts forever.
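A small sketch of how such an endless iterator is typically consumed, assuming “them” here refers to iterators: since it never finishes by itself, you either take a finite prefix or stop at the first element that satisfies some condition. The cutoff values below are arbitrary.

```julia
counter = Iterators.countfrom(0)     # 0, 1, 2, ... with no last element

# Take a finite prefix before collecting.
first_five = collect(Iterators.take(counter, 5))
println(first_five)                  # [0, 1, 2, 3, 4]

# Or lazily search for the first element matching a condition.
first_big = first(Iterators.filter(n -> n^2 > 50, counter))
println(first_big)                   # 8, since 7^2 = 49 and 8^2 = 64
```

Both calls return immediately because the iterator is lazy: values are only produced as they are asked for.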