Query on histc and histogram counts


#1

Dear Julia Users,

I want to count the number of times a zip code appears in a vector and found the following topic:

I ran the following code:

using StatsBase
using DataFrames
df1 = readtable("Test2.csv")

Out[4]:
irn
1	43752
2	43752
3	43752
4	43752
5	43752
6	43752
7	43752
8	43752
9	43752
10	43752
11	43752
12	43752
13	43752
14	43752
15	43752
16	43752
17	43752
18	43752
19	43752
20	43752
21	43752
22	43752
23	43752
24	43752
25	43752
26	43752
27	43752
28	43752
29	43752
30	43752
⋮	⋮
results = fit(Histogram, df1[:,1])
StatsBase.Histogram{Int64,1,Tuple{FloatRange{Float64}}}
edges:
  43500.0:500.0:50500.0
weights: [3411,3197,882,640,16,2573,0,799,0,0,0,0,0,729]
closed: right

However, the above does not give me the correct counts, which I verified in MATLAB.
I then ran the following code:

results = fit(Histogram, df1[:,1], unique(df1[:,1]))

StatsBase.Histogram{Int64,1,Tuple{DataArrays.DataArray{Int64,1}}}
edges:
  [43752,43851,44008,44081,44107,44214,44230,44271,44289,44313  …  47340,47365,47373,47381,47399,50419,50427,50435,50450,50468]
weights: [133,214,345,818,215,31,299,154,92,862  …  43,65,358,128,37,87,321,221,95,5]
closed: right

This code does give me the correct counts for all zip codes but the first one, i.e. 43752, which should have a count of 3278.

I’m at a loss as to why it doesn’t produce the correct counts for the first zip code.

If anyone can provide help it would be greatly appreciated.

Sincerely,
Donald Lacombe


#2

Histograms are poorly suited to (and overkill for) counting discrete values: they are designed for continuous variables. Use the counting functions from StatsBase or the FreqTables package.
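
A likely explanation for the missing first zip code, sketched here with made-up data rather than the original file: the output above reports `closed: right`, so each bin includes its upper edge but not its lower one, and n edges define only n-1 bins. Values equal to the smallest edge then fall outside every bin. The names `codes` and `edges` are illustrative, and `closed = :right` is passed explicitly because the default may differ between StatsBase versions.

```julia
using StatsBase

# Made-up data: three distinct codes, the smallest one being the most frequent
codes = [1, 1, 1, 2, 2, 3]
edges = sort(unique(codes))    # [1, 2, 3] gives two bins, not three

# Right-closed bins are (1, 2] and (2, 3]; the three 1s land in neither
h = fit(Histogram, codes, edges, closed = :right)
h.weights                      # counts for codes 2 and 3 only

# countmap has no bins, so every distinct value gets its own count
counts = countmap(codes)
counts[1]                      # the count for the smallest code is kept
```

If this is what happened, the weights in the second attempt would line up with the second zip code onward, not with the first.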


#3

I don’t think a histogram is what you want. A zip code is a categorical variable, even though it is disguised as a numeric one.

Try:

using StatsBase
countmap(mynumbers)

For example:

using StatsBase
mynumbers = [1,1,1,2,2,2,2,2,2]
countmap(mynumbers)

#4

Thank you for the information regarding the countmap function.

If you do not mind, I have another question.

I did the following and the results are below:

using StatsBase
mynumbers = [1,1,1,2,2,2,2,2,2]
result = countmap(mynumbers)

Dict{Int64,Int64} with 2 entries:
2 => 6
1 => 3

I then sorted the results as follows:

for key in sort(collect(keys(result)))
    println("$key => $(result[key])")
end

Which resulted in the following:

1 => 3
2 => 6

Is there a way in which I can assign the numbers 3 and 6 into a vector? I need these numbers for other calculations. I tried the following:

v = values(result)

However, I need them to be sorted in the exact order as above.

Thank you again for your assistance.


#5
julia> [result[key] for key in sort(collect(keys(result)))]
2-element Array{Int64,1}:
 3
 6
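
If the sorted keys are also needed alongside the counts, a minimal sketch using only Base Julia could look like the following. The `Dict` literal stands in for the `countmap` result above, and the variable names are illustrative.

```julia
# Stand-in for the countmap result from the earlier post
result = Dict(2 => 6, 1 => 3)

# Sort the keys once, then read the counts in that same order
sorted_keys = sort(collect(keys(result)))
counts = [result[k] for k in sorted_keys]

# Alternative: sort the key => count pairs by key, then take the counts
pairs_sorted = sort(collect(result), by = first)
counts_alt = last.(pairs_sorted)
```

Both `counts` and `counts_alt` are plain `Vector{Int64}` values, so they can be used directly in later calculations, and `sorted_keys[i]` is the value whose count is `counts[i]`.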

#6

I want to thank you for the quick reply to my query!

This is exactly the functionality that I was looking for and will enable me to complete the code I have been working on.

Again, thank you and all of the members of the Julia community.

Sincerely,
Donald