Dear Julia Users,
I want to count the number of times a zip code appears in a vector and found the following topic:
I ran the following code:
using StatsBase
using DataFrames
df1 = readtable("Test2.csv")
Out[4]:
irn
1 43752
2 43752
3 43752
4 43752
5 43752
6 43752
7 43752
8 43752
9 43752
10 43752
11 43752
12 43752
13 43752
14 43752
15 43752
16 43752
17 43752
18 43752
19 43752
20 43752
21 43752
22 43752
23 43752
24 43752
25 43752
26 43752
27 43752
28 43752
29 43752
30 43752
⋮ ⋮
results = fit(Histogram, df1[:,1])
StatsBase.Histogram{Int64,1,Tuple{FloatRange{Float64}}}
edges:
43500.0:500.0:50500.0
weights: [3411,3197,882,640,16,2573,0,799,0,0,0,0,0,729]
closed: right
However, these weights are not the per-zip-code counts I expect; I checked the correct counts in MATLAB.
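To make the target concrete, here is a minimal sketch of the tally I am after: one exact count per distinct value, with no binning. The zip codes are taken from my data, but the vector `zips`, its repetition counts, and the helper name `tally` are made up for illustration. As far as I can tell, StatsBase also has a `countmap` function that returns this kind of dictionary directly.

```julia
# Toy data: the zip codes appear in my file, but the repetition
# counts here are hypothetical and chosen only for illustration.
zips = [43752, 43752, 43752, 43851, 44008, 44008]

# Tally exact occurrences of each distinct value (no binning involved).
# `tally` is a hypothetical helper name, not a library function.
function tally(xs)
    counts = Dict{Int,Int}()
    for x in xs
        counts[x] = get(counts, x, 0) + 1
    end
    return counts
end

counts = tally(zips)

# Print the counts in sorted key order.
for z in sort(collect(keys(counts)))
    println(z, " => ", counts[z])
end
```

On the toy vector this prints 3 for 43752, 1 for 43851, and 2 for 44008, which is the kind of per-code result I expected from the histogram.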
I then ran the following code:
results = fit(Histogram, df1[:,1], unique(df1[:,1]))
StatsBase.Histogram{Int64,1,Tuple{DataArrays.DataArray{Int64,1}}}
edges:
[43752,43851,44008,44081,44107,44214,44230,44271,44289,44313 … 47340,47365,47373,47381,47399,50419,50427,50435,50450,50468]
weights: [133,214,345,818,215,31,299,154,92,862 … 43,65,358,128,37,87,321,221,95,5]
closed: right
This code gives me the correct counts for every zip code except the first one, 43752, which should have a count of 3278.
I’m at a loss as to why it doesn’t produce the correct counts for the first zip code.
If anyone can provide help it would be greatly appreciated.
Sincerely,
Donald Lacombe