Why is hist not defined after loading StatsBase?

question

#1

So this might be a really stupid question and I have the feeling that it will be. However, I have to ask it because it doesn’t make sense to me.

I have installed and loaded Julia v0.6.2 on my Ubuntu Linux 17.10 box. I run the following code in the REPL:

using StatsBase
?hist

and I get this as output

Couldn’t find hist
Perhaps you meant hist, Chisq, hash, hcat, FDist, TDist, fit, Cint or Dict
No documentation found.

Binding hist does not exist.

So it’s telling me hist couldn’t be found but instead recommends to me that I might be looking for hist?

Follow up question:

I’m trying to achieve the julia version of the following executed in R

plot(hist(rnorm(100)))

What is the Julia way of achieving this?


#2

I think this is because hist is exported by StatsBase, but not defined within the module.


#3

In StatsBase you use fit(Histogram, randn(100)), but this does not produce the actual plot (it only calculates the bins and weights).

The simplest way to get the plot is:

using StatPlots
histogram(randn(100))

#4

AFAIK hist was removed from StatsBase (fit should be used).


#5

Thank you. I’ll give that a whirl!

So if this is the way to get a Histogram then what does the hist function do? Is it something completely different?


#6

Right, I just meant that the export hist should be removed :slight_smile:


#7

right - I misread your post :smile:


#8

It gives you the histogram data but does not plot it:

julia> fit(Histogram, randn(100))
StatsBase.Histogram{Int64,1,Tuple{StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}
edges:
  -3.0:1.0:4.0
weights: [3, 11, 39, 33, 11, 2, 1]
closed: left
isdensity: false

(and the hist export in StatsBase is a bug that should be fixed, as @fredrikekre indicated)
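
If you want to work with the fitted result directly rather than plot it, here is a minimal sketch of reading the bins back out. It assumes the `edges` and `weights` fields that appear in the printout above; the names `bin_edges` and `counts` are just for illustration.

```julia
using StatsBase

# Fit a histogram to 100 standard-normal samples (no plotting involved)
h = fit(Histogram, randn(100))

# `edges` is a tuple with one entry per dimension; a 1-D histogram has one
bin_edges = h.edges[1]

# `weights` holds the number of samples that fell into each bin
counts = h.weights

println("edges:  ", bin_edges)
println("counts: ", counts)
println("total:  ", sum(counts))
```

Since the automatically chosen edges span the data, the counts should add up to the sample size, and there should be one more edge than there are bins.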


#9

Thanks for the help. :+1:


#10

To be exact, you just need Plots. StatPlots is a superset of Plots that adds DataFrames support and some extra recipes useful for plotting data, and for most users it’s recommended to just use StatPlots rather than Plots anyway.
If you do want to fit a histogram with StatsBase and then plot it, you can do that too.

using Plots, StatsBase
h = fit(Histogram, randn(10000))
plot(h)

But the Plots histogram call is slightly more fully featured, in that you can use the :scott, :rice, :fd (Freedman-Diaconis, the default) and :sturges algorithms to get the bin breaks (StatPlots also offers :wand, which gives a better fit to a density function but does not align breaks neatly with integer values). StatsBase only uses :sturges.
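
To give a feel for what those rules compute, here is a rough sketch using the textbook formulas for each one. This is not the code Plots or StatsBase actually run (both may round the breaks to neat values), the function names are made up for illustration, and it assumes Julia 1.x, where `std` and `quantile` live in the Statistics standard library.

```julia
using Statistics  # std and quantile (standard library in Julia 1.x)

# Textbook rules that give a number of bins from the sample size n
sturges_bins(n) = ceil(Int, log2(n)) + 1
rice_bins(n) = ceil(Int, 2 * cbrt(n))

# Textbook rules that give a bin width from the data itself
scott_width(x) = 3.49 * std(x) / cbrt(length(x))
fd_width(x) = 2 * (quantile(x, 0.75) - quantile(x, 0.25)) / cbrt(length(x))

x = randn(10_000)
println("Sturges bins:           ", sturges_bins(length(x)))
println("Rice bins:              ", rice_bins(length(x)))
println("Scott bin width:        ", scott_width(x))
println("Freedman-Diaconis width: ", fd_width(x))
```

Sturges and Rice depend only on the number of samples, while Scott and Freedman-Diaconis also look at the spread of the data, which is why they tend to behave differently on heavy-tailed samples.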


#11

Very informative. Thanks for the explanation. Indeed I’m searching for the differences in Julian vs R thinking. After 20 years of C++ and R I’m a little set in my ways. Loving what I’ve seen so far though.