Review inefficient pipeline?

Hi, I’m new to Julia and practicing data ingestion. Could anyone review this pipeline I created and tell me whether there’s a more efficient way to do it? I ultimately want a JuliaDB table, but in between, the data passes from an HTTP GET response to a JSON string to a Dict to a DataFrame to a Table.

https://github.com/andtheWings/IndyCrash/blob/master/IndyCrashJl.ipynb

It seems like it would be better to construct the DataFrame column by column rather than row by row, as recommended in the "Getting Started" section of the DataFrames.jl documentation.

Perhaps you could do something like this:

julia> records = data["result"]["records"];

julia> df = DataFrame()
0×0 DataFrame


julia> for field in keys(first(records))
         df[!, Symbol(field)] = [r[field] for r in records]
       end

julia> df
275×109 DataFrame. Omitted printing of 91 columns
│ Row │ TRAVDIRCDE │ OCCUPSNMB │ CONSTRUCTIND │ RESULTDRUGIND │ EJECTTRAPDESCR         │ GENDERCDE │ COUNTY_STATE │ CITYDESCR                     │ INJNATURECDE │ VEHMAKETXT │ INDEXING_NUMBER │ PROPDAMAGECDE │ WEATHERDESCR │ TOWEDIND │ DEERNMB │ INJSTATUSDESCR     │ INJLOCCDESCR        │ MANNERCOLLDESCR              │
│     │ String     │ Nothing   │ String       │ String        │ String                 │ String    │ String       │ String                        │ String       │ String     │ String          │ Union…        │ String       │ String   │ String  │ String             │ String              │ String                       │
├─────┼────────────┼───────────┼──────────────┼───────────────┼────────────────────────┼───────────┼──────────────┼───────────────────────────────┼──────────────┼────────────┼─────────────────┼───────────────┼──────────────┼──────────┼─────────┼────────────────────┼─────────────────────┼──────────────────────────────┤
│ 1   │ W          │           │ N            │               │                        │ M         │ 49           │ Indianapolis                  │ 12           │            │ 903036115001    │               │ Clear        │          │ 0       │ Incapacitating     │ Knee/Lower Leg/Foot │ Other (Explain in Narrative) │
...
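
One caveat with the loop above: it takes the field names from the first record only, and any `nothing` values stay in the columns as `Nothing`. If some records might lack a field, a slightly more defensive variant could look like the sketch below. The two small `Dict`s are made-up stand-ins for `data["result"]["records"]`, not real rows from the API, and converting `nothing` to `missing` is my assumption about what you want downstream.

```julia
using DataFrames

# Hypothetical stand-in for data["result"]["records"]; not real API output.
records = [
    Dict{String,Any}("TRAVDIRCDE" => "W", "DEERNMB" => "0", "OCCUPSNMB" => nothing),
    Dict{String,Any}("TRAVDIRCDE" => "E", "DEERNMB" => "1"),
]

# Collect every field name that appears in any record, not just the first one.
fields = sort!(collect(union(keys.(records)...)))

df = DataFrame()
for field in fields
    # get(...) returns nothing when a record lacks the field;
    # something(x, missing) then turns nothing into missing.
    df[!, Symbol(field)] = [something(get(r, field, nothing), missing) for r in records]
end
```

Each column is still built in one pass over the records, so this keeps the column-by-column approach while tolerating records whose keys differ.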

Thanks! I had to build up the loop piece by piece to understand what it was doing, hence the delayed response. Works like a charm!

Next, I’m going to try and do the same steps while iterating over a series of 10 datasets to pull from the API and make a JuliaDB table that consolidates all of them.
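
For the consolidation step, here is a rough sketch of one way it might go, assuming each dataset has already been turned into its own DataFrame. The two small frames and the `dataset` column are hypothetical placeholders for illustration, not real data from the API.

```julia
using DataFrames

# Hypothetical per-dataset frames; in practice each would come from one API pull.
first_batch  = DataFrame(TRAVDIRCDE = ["W", "E"], DEERNMB = ["0", "1"])
second_batch = DataFrame(TRAVDIRCDE = ["N"], WEATHERDESCR = ["Clear"])

frames = [first_batch, second_batch]

# Tag each frame with the index of the dataset it came from.
for (i, df) in enumerate(frames)
    df[!, :dataset] = fill(i, nrow(df))
end

# Stack all frames; cols = :union keeps every column seen in any frame
# and fills the gaps with missing.
combined = reduce(vcat, frames; cols = :union)
```

Using `cols = :union` means the datasets do not need identical columns, and the conversion to a JuliaDB table can then happen once on `combined` rather than once per dataset.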