I’m using Julia 1.4.
I want to use JuliaDB.jl to read a bunch of CSVs and combine them into one big DataFrame. Here’s the issue: when reading the CSVs, I want every column to be parsed as String, and the CSVs don’t all have the same number of columns or the same column names.
Here’s what I’ve tried:
using CSV
using DataFrames # just for creating the example DataFrames
using JuliaDB
df1 = DataFrame(
    [['a', 'b', 'c'], [1, 2, 3]],
    ["name", "id"]
)
df2 = DataFrame(
    [['d', 'e', 'f'], [4, 5, 6], [11, 22, 33]],
    ["name", "id", "other"]
)
# For simplicity, I will read just two CSVs, but imagine 20+.
#
# Assume these CSVs are the only files returned by `readdir()`
# below.
CSV.write("df1.csv", df1)
CSV.write("df2.csv", df2)
# This works only if every CSV has the same number of columns with
# exactly the same names. But I need it to work for CSVs with
# differing numbers of columns and column names. Also, this gets
# unwieldy if there are many columns.
df = loadtable(readdir(); colparsers=Dict(:name=>String, :id=>String))
# This doesn't work
df = loadtable(readdir(); colparsers=String)
# MethodError: no method matching iterate(::Type{String})
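One way to make the Dict approach less unwieldy might be to build the Dict from the CSV headers instead of typing it by hand. This is a minimal sketch using only Base Julia. The helpers `header_names` and `string_parsers` are hypothetical names, the header is assumed to be plain comma-separated text with no quoted commas, and I have not verified that `loadtable` accepts a `colparsers` Dict containing keys that are missing from some of the files.

```julia
# Hypothetical helper: return the header names of a CSV file as Symbols.
# Assumes a plain comma-separated header line with no quoted commas.
header_names(path) = Symbol.(strip.(split(readline(path), ',')))

# Hypothetical helper: map every column name seen in any file to String.
function string_parsers(paths)
    parsers = Dict{Symbol,DataType}()
    for path in paths
        for name in header_names(path)
            parsers[name] = String
        end
    end
    return parsers
end

# Recreate the two example files in a temporary directory.
dir = mktempdir()
write(joinpath(dir, "df1.csv"), "name,id\na,1\nb,2\nc,3\n")
write(joinpath(dir, "df2.csv"), "name,id,other\nd,4,11\ne,5,22\nf,6,33\n")

# Keep only the CSV files, with full paths.
paths = filter(p -> endswith(p, ".csv"), readdir(dir; join=true))
parsers = string_parsers(paths)
```

The resulting `parsers` could then be tried as `loadtable(paths; colparsers=parsers)`, though whether that copes with the differing columns is exactly the open question.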
Here’s how I’d do it in R:
library(purrr) # Need dplyr installed for `map_dfr()` to work
# Assume list.files() returns just the two above-specified CSVs
df = map_dfr(list.files(), read.csv, colClasses = "character")
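For comparison, here is a rough Julia analogue of the R call that bypasses JuliaDB and uses only CSV.jl and DataFrames.jl. It is a sketch, not a JuliaDB solution. It assumes a CSV.jl release in which the `types` keyword accepts a single type for all columns (older releases need a Vector or Dict, and newer ones may require a Julia version later than 1.4) and a DataFrames.jl release that supports `cols=:union` in `vcat`.

```julia
using CSV
using DataFrames

# Recreate the two example files in a temporary directory.
dir = mktempdir()
write(joinpath(dir, "df1.csv"), "name,id\na,1\nb,2\nc,3\n")
write(joinpath(dir, "df2.csv"), "name,id,other\nd,4,11\ne,5,22\nf,6,33\n")

# Keep only the CSV files, with full paths (readdir returns them sorted).
paths = filter(p -> endswith(p, ".csv"), readdir(dir; join=true))

# Parse every column of every file as String.
# Assumption: this CSV.jl version accepts a single type for `types`.
dfs = [CSV.read(p, DataFrame; types=String) for p in paths]

# Stack the tables; columns absent from a file are filled with `missing`.
df = vcat(dfs...; cols=:union)
```

Like `map_dfr()`, this keeps the union of all columns, so rows from df1.csv get `missing` in the `other` column.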