Index data from DelimitedFiles by name

I know there is CSV.jl, but I am wondering what the easiest way to access the data by their column names would be if I just use DelimitedFiles.jl.

My first approach would look like this, but maybe there is a more elegant solution.

using DelimitedFiles
data,header = readdlm(path,header=true)
coldata = data[:, vec(header .== "column name")]

You can convert the header and columns to a Dict. For example, given this data:

shell> more test.dat
A B C
1 2 3
1 2 3
1 2 3
julia> data, header = readdlm("./test.dat",header=true)
([1.0 2.0 3.0; 1.0 2.0 3.0; 1.0 2.0 3.0], AbstractString["A" "B" "C"])

julia> d = Dict(header[i] => data[:,i] for i in 1:length(header))
Dict{SubString{String},Array{Float64,1}} with 3 entries:
  "B" => [2.0, 2.0, 2.0]
  "A" => [1.0, 1.0, 1.0]
  "C" => [3.0, 3.0, 3.0]

julia> d["A"]
3-element Array{Float64,1}:
 1.0
 1.0
 1.0

That copies the data. If that is not desirable, you can use a dictionary that only associates the header names with the column indices:

julia> hind = Dict(header[i] => i for i in 1:length(header))
Dict{SubString{String},Int64} with 3 entries:
  "B" => 2
  "A" => 1
  "C" => 3

julia> data[:,hind["A"]]
3-element Array{Float64,1}:
 1.0
 1.0
 1.0

Or, if you only need to do this once in a while (the header is a 1×N matrix, so `vec` is used to make `findfirst` return a plain column index):

julia> data[:, findfirst(==("A"), vec(header))]
3-element Array{Float64,1}:
 1.0
 1.0
 1.0
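
If you look columns up in several places, the one-off lookup above could be wrapped in a small function. This is only a sketch: `column` is a made-up helper name, not part of DelimitedFiles, and the data is read from an in-memory buffer holding the same contents as test.dat so that the snippet runs on its own.

```julia
using DelimitedFiles

# Same contents as test.dat, read from an in-memory buffer instead of a file
data, header = readdlm(IOBuffer("A B C\n1 2 3\n1 2 3\n1 2 3\n"), header=true)

# Hypothetical helper: return a copy of the column called `name`,
# or throw a KeyError if the header has no such entry.
function column(data::AbstractMatrix, header::AbstractArray, name::AbstractString)
    j = findfirst(==(name), vec(header))  # vec: search the 1×N header as a vector
    j === nothing && throw(KeyError(name))
    return data[:, j]
end

colB = column(data, header, "B")
```

The explicit `nothing` check means a misspelled column name fails with a `KeyError` naming the column, rather than with an indexing error.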



The Dict option looks very nice.
Using views should remove the need to copy the data:

 Dict(header[i]=>view(data,:,i) for i in 1:length(header))

or a named tuple (although the code looks a bit messy):

(;((Symbol.(header[i]),view(data,:,i)) for i in 1:length(header))...)
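
A quick way to see what the views buy you is to build the named tuple and write through one of its columns. The snippet below is a sketch under a couple of assumptions: the data is read from an in-memory buffer with the same contents as test.dat, `cols` is just an illustrative name, and the named tuple is built from `Symbol => view` pairs, which is a slight variant of the tuple form above.

```julia
using DelimitedFiles

# Same contents as test.dat, read from an in-memory buffer instead of a file
data, header = readdlm(IOBuffer("A B C\n1 2 3\n1 2 3\n1 2 3\n"), header=true)

# Named tuple of column views, keyed by the header names converted to Symbols
cols = (; (Symbol(header[i]) => view(data, :, i) for i in 1:length(header))...)

cols.A            # columns are available as fields
cols.B[1] = 10.0  # a view shares memory with `data`, so this writes into the matrix
data[1, 2]        # now 10.0
```

Because nothing is copied, changes made through `cols` show up in `data` and vice versa. If you want independent columns, the copying Dict version is the safer choice.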

indeed