Thanks for the input. The data can be found in the zip file here: https://www2.census.gov/programs-surveys/acs/data/pums/2017/5-Year/csv_pus.zip
It is the first .csv file in the .zip archive (psam_pusa.csv). The code I'm using to load it looks like this, and it only takes a few minutes to run:
using JuliaDB
acs = loadtable("psam_pusa.csv", type_detect_rows=200)
This results in the following output:
Table with 4691835 rows, 286 columns:
Columns:
# colname type
─────────────────────────────────────
1 RT String
2 SERIALNO Int64
3 DIVISION Int64
4 SPORDER Int64
5 PUMA Int64
6 REGION Int64
7 ST Int64
8 ADJINC Int64
9 PWGTP Int64
10 AGEP Int64
11 CIT Int64
12 CITWP Union{Missing, Int64}
13 COW Union{Missing, Int64}
14 DDRS Union{Missing, Int64}
15 DEAR Int64
16 DEYE Int64
17 DOUT Union{Missing, Int64}
18 DPHY Union{Missing, Int64}
19 DRAT Union{Missing, Int64}
20 DRATX Union{Missing, Int64}
21 DREM Union{Missing, Int64}
22 ENG Union{Missing, Int64}
23 FER Union{Missing, Int64}
24 GCL Union{Missing, Int64}
25 GCM Union{Missing, Int64}
26 GCR Union{Missing, Int64}
27 HINS1 Int64
28 HINS2 Int64
29 HINS3 Int64
30 HINS4 Int64
31 HINS5 Int64
32 HINS6 Int64
33 HINS7 Int64
34 INTP Union{Missing, Int64}
35 JWMNP Union{Missing, Int64}
36 JWRIP Union{Missing, Int64}
37 JWTR Union{Missing, Int64}
38 LANX Union{Missing, Int64}
39 MAR Int64
40 MARHD Union{Missing, Int64}
41 MARHM Union{Missing, Int64}
42 MARHT Union{Missing, Int64}
43 MARHW Union{Missing, Int64}
44 MARHYP Union{Missing, Int64}
45 MIG Union{Missing, Int64}
46 MIL Union{Missing, Int64}
47 MLPA Union{Missing, Int64}
48 MLPB Union{Missing, Int64}
49 MLPCD Union{Missing, Int64}
50 MLPE Union{Missing, Int64}
51 MLPFG Union{Missing, Int64}
52 MLPH Union{Missing, Int64}
53 MLPI Union{Missing, Int64}
54 MLPJ Union{Missing, Int64}
55 MLPK Union{Missing, Int64}
56 NWAB Union{Missing, Int64}
57 NWAV Union{Missing, Int64}
58 NWLA Union{Missing, Int64}
59 NWLK Union{Missing, Int64}
60 NWRE Union{Missing, Int64}
61 OIP Union{Missing, Int64}
62 PAP Union{Missing, Int64}
63 RELP Int64
64 RETP Union{Missing, Int64}
65 SCH Union{Missing, Int64}
66 SCHG Union{Missing, Int64}
67 SCHL Union{Missing, Int64}
68 SEMP Union{Missing, Int64}
69 SEX Int64
70 SSIP Union{Missing, Int64}
71 SSP Union{Missing, Int64}
72 WAGP Union{Missing, Int64}
73 WKHP Union{Missing, Int64}
74 WKL Union{Missing, Int64}
75 WKW Union{Missing, Int64}
76 WRK Union{Missing, Int64}
77 YOEP Union{Missing, Int64}
78 ANC Int64
79 ANC1P Int64
80 ANC2P Int64
81 DECADE Union{Missing, Int64}
82 DIS Int64
83 DRIVESP Union{Missing, Int64}
84 ESP Union{Missing, Int64}
85 ESR Union{Missing, Int64}
86 FOD1P Union{Missing, Int64}
87 FOD2P Union{Missing, Int64}
88 HICOV Int64
89 HISP Int64
90 INDP Union{Missing, Int64}
91 JWAP Union{Missing, Int64}
92 JWDP Union{Missing, Int64}
93 LANP Union{Missing, Int64}
94 MIGPUMA Union{Missing, Int64}
95 MIGSP Union{Missing, Int64}
96 MSP Union{Missing, Int64}
97 NAICSP String
98 NATIVITY Int64
99 NOP Union{Missing, Int64}
100 OC Union{Missing, Int64}
101 OCCP Union{Missing, Int64}
102 PAOC Union{Missing, Int64}
103 PERNP Union{Missing, Int64}
104 PINCP Union{Missing, Int64}
105 POBP Int64
106 POVPIP Union{Missing, Int64}
107 POWPUMA Union{Missing, Int64}
108 POWSP Union{Missing, Int64}
109 PRIVCOV Int64
110 PUBCOV Int64
111 QTRBIR Int64
112 RAC1P Int64
113 RAC2P Int64
114 RAC3P Int64
115 RACAIAN Int64
116 RACASN Int64
117 RACBLK Int64
118 RACNH Int64
119 RACNUM Int64
120 RACPI Int64
121 RACSOR Int64
122 RACWHT Int64
123 RC Union{Missing, Int64}
124 SCIENGP Union{Missing, Int64}
125 SCIENGRLP Union{Missing, Int64}
126 SFN Union{Missing, Int64}
127 SFR Union{Missing, Int64}
128 SOCP String
129 VPS Union{Missing, Int64}
130 WAOB Int64
131 FAGEP Int64
132 FANCP Int64
133 FCITP Int64
134 FCITWP Int64
135 FCOWP Int64
136 FDDRSP Int64
137 FDEARP Int64
138 FDEYEP Int64
139 FDISP Int64
140 FDOUTP Int64
141 FDPHYP Int64
142 FDRATP Int64
143 FDRATXP Int64
144 FDREMP Int64
145 FENGP Int64
146 FESRP Int64
147 FFERP Int64
148 FFODP Int64
149 FGCLP Int64
150 FGCMP Int64
151 FGCRP Int64
152 FHICOVP Int64
153 FHINS1P Int64
154 FHINS2P Int64
155 FHINS3C Union{Missing, Int64}
156 FHINS3P Int64
157 FHINS4C Union{Missing, Int64}
158 FHINS4P Int64
159 FHINS5C Union{Missing, Int64}
160 FHINS5P Int64
161 FHINS6P Int64
162 FHINS7P Int64
163 FHISP Int64
164 FINDP Int64
165 FINTP Int64
166 FJWDP Int64
167 FJWMNP Int64
168 FJWRIP Int64
169 FJWTRP Int64
170 FLANP Int64
171 FLANXP Int64
172 FMARP Int64
173 FMARHDP Int64
174 FMARHMP Int64
175 FMARHTP Int64
176 FMARHWP Int64
177 FMARHYP Int64
178 FMIGP Int64
179 FMIGSP Int64
180 FMILPP Int64
181 FMILSP Int64
182 FOCCP Int64
183 FOIP Int64
184 FPAP Int64
185 FPERNP Int64
186 FPINCP Int64
187 FPOBP Int64
188 FPOWSP Int64
189 FPRIVCOVP Int64
190 FPUBCOVP Int64
191 FRACP Int64
192 FRELP Int64
193 FRETP Int64
194 FSCHGP Int64
195 FSCHLP Int64
196 FSCHP Int64
197 FSEMP Int64
198 FSEXP Int64
199 FSSIP Int64
200 FSSP Int64
201 FWAGP Int64
202 FWKHP Int64
203 FWKLP Int64
204 FWKWP Int64
205 FWRKP Int64
206 FYOEP Int64
207 PWGTP1 Int64
208 PWGTP2 Int64
209 PWGTP3 Int64
210 PWGTP4 Int64
211 PWGTP5 Int64
212 PWGTP6 Int64
213 PWGTP7 Int64
214 PWGTP8 Int64
215 PWGTP9 Int64
216 PWGTP10 Int64
217 PWGTP11 Int64
218 PWGTP12 Int64
219 PWGTP13 Int64
220 PWGTP14 Int64
221 PWGTP15 Int64
222 PWGTP16 Int64
223 PWGTP17 Int64
224 PWGTP18 Int64
225 PWGTP19 Int64
226 PWGTP20 Int64
227 PWGTP21 Int64
228 PWGTP22 Int64
229 PWGTP23 Int64
230 PWGTP24 Int64
231 PWGTP25 Int64
232 PWGTP26 Int64
233 PWGTP27 Int64
234 PWGTP28 Int64
235 PWGTP29 Int64
236 PWGTP30 Int64
237 PWGTP31 Int64
238 PWGTP32 Int64
239 PWGTP33 Int64
240 PWGTP34 Int64
241 PWGTP35 Int64
242 PWGTP36 Int64
243 PWGTP37 Int64
244 PWGTP38 Int64
245 PWGTP39 Int64
246 PWGTP40 Int64
247 PWGTP41 Int64
248 PWGTP42 Int64
249 PWGTP43 Int64
250 PWGTP44 Int64
251 PWGTP45 Int64
252 PWGTP46 Int64
253 PWGTP47 Int64
254 PWGTP48 Int64
255 PWGTP49 Int64
256 PWGTP50 Int64
257 PWGTP51 Int64
258 PWGTP52 Int64
259 PWGTP53 Int64
260 PWGTP54 Int64
261 PWGTP55 Int64
262 PWGTP56 Int64
263 PWGTP57 Int64
264 PWGTP58 Int64
265 PWGTP59 Int64
266 PWGTP60 Int64
267 PWGTP61 Int64
268 PWGTP62 Int64
269 PWGTP63 Int64
270 PWGTP64 Int64
271 PWGTP65 Int64
272 PWGTP66 Int64
273 PWGTP67 Int64
274 PWGTP68 Int64
275 PWGTP69 Int64
276 PWGTP70 Int64
277 PWGTP71 Int64
278 PWGTP72 Int64
279 PWGTP73 Int64
280 PWGTP74 Int64
281 PWGTP75 Int64
282 PWGTP76 Int64
283 PWGTP77 Int64
284 PWGTP78 Int64
285 PWGTP79 Int64
286 PWGTP80 Int64
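As a rough back-of-envelope check on the memory usage, here is a small calculation based only on the table shape printed above. It assumes, for simplicity, that every column is a dense 8-byte Int64. In reality the Union{Missing, Int64} and String columns need somewhat more, so treat this as an order-of-magnitude figure only:

```julia
# Back-of-envelope estimate of the in-memory size of the loaded table.
# Assumption: every column is a dense 8-byte Int64; Union{Missing, Int64}
# and String columns actually take more space than this.
nrows = 4_691_835
ncols = 286
cells = nrows * ncols            # 1_341_864_810 values in total
bytes = cells * sizeof(Int64)    # 8 bytes per value
println("cells = ", cells)
println("approx. size = ", round(bytes / 1e9; digits = 1), " GB")
```

That is roughly 10 GB, the same order of magnitude as the saved file. So it seems plausible that the whole table has to sit in RAM at once when loaded this way.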
If I save this table with the save function, the output file is 8.5 GB (the original .csv is 4.3 GB). Then I try to apply a simple filter such as this:
filtered = filter(row -> row.WKHP > 40, acs)
I let this run for about 10 to 15 minutes and it never finished. I also noticed that my memory usage skyrockets as soon as I start loading the data, and it doesn't drop much after loading completes. I tried using the Lazy package, as shown in the docs, but got the same result.
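One detail that may be worth ruling out: the column listing shows WKHP as Union{Missing, Int64}. In Julia, `missing > 40` evaluates to `missing` rather than `false`, and a predicate that returns `missing` cannot be used as a Bool. The sketch below illustrates this with Base `filter` on a plain Vector, not JuliaDB. The rows are made-up illustration data, and whether JuliaDB's `filter` behaves identically is an assumption here:

```julia
# Hypothetical example rows standing in for the WKHP column (made-up data).
rows = [(WKHP = 50,), (WKHP = missing,), (WKHP = 35,), (WKHP = 41,)]

# `missing > 40` evaluates to `missing`, not `false`, so a plain
# `row -> row.WKHP > 40` predicate throws a TypeError on missing rows.
@show missing > 40

# coalesce(x, false) replaces a `missing` comparison result with `false`,
# so rows with a missing WKHP are simply dropped.
keep(row) = coalesce(row.WKHP > 40, false)

filtered = filter(keep, rows)
@show filtered
```

This only addresses the missing-value semantics. It does not by itself explain the slowness or the memory usage.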