How to select a DataFrame column by name?

Hello,
I have a DataFrame and I am selecting rows based on a condition. How can I select a column by its name?
Here I can get column X, but the result is a vector of strings instead of a numeric vector. If I write the column name without quotes it does not work, which I expected, because no variable X is defined.
What is the correct way?
Thank you

julia> df[df.Bact .== "A. vinelandii", :]
18×3 DataFrame
 Row │ Bact           X                  Y                
     │ String         String             String           
─────┼────────────────────────────────────────────────────
   1 │ A. vinelandii  0.050561797752809  1314042485.29793
   2 │ A. vinelandii  0.252808988764045  1224431687.48933
   3 │ A. vinelandii  0.502808988764045  1038388615.87881
   4 │ A. vinelandii  0.932584269662921  657651643.874837
   5 │ A. vinelandii  1.19662921348315   403011067.968018
   6 │ A. vinelandii  1.89606741573034   147820324.502811
   7 │ A. vinelandii  2.21629213483146   100946201.037493
   8 │ A. vinelandii  2.82865168539326   80524951.2089802
   9 │ A. vinelandii  0.039325842696629  79770165.0793892
  10 │ A. vinelandii  0.235955056179775  99530230.5872667
  11 │ A. vinelandii  0.5                232302118.140757
  12 │ A. vinelandii  0.943820224719101  615695604.542686
  13 │ A. vinelandii  1.19662921348315   949522176.82357
  14 │ A. vinelandii  1.37921348314607   1236017272.81974
  15 │ A. vinelandii  1.69101123595506   1450621774.81555
  16 │ A. vinelandii  1.88202247191011   1471259098.32493
  17 │ A. vinelandii  2.1938202247191    1370926953.01064
  18 │ A. vinelandii  2.79213483146067   1464347573.09895

julia> df[df.Bact .== "A. vinelandii", "X"]
18-element Vector{String}:
 "0.050561797752809"
 "0.252808988764045"
 "0.502808988764045"
 "0.932584269662921"
 "1.19662921348315"
 "1.89606741573034"
 "2.21629213483146"
 "2.82865168539326"
 "0.039325842696629"
 "0.235955056179775"
 "0.5"
 "0.943820224719101"
 "1.19662921348315"
 "1.37921348314607"
 "1.69101123595506"
 "1.88202247191011"
 "2.1938202247191"
 "2.79213483146067"

julia> df[df.Bact .== "A. vinelandii", X]
ERROR: UndefVarError: X not defined
Stacktrace:
 [1] top-level scope
   @ none:1

PS: I see that the DataFrame already contains only strings, so is the way I selected the column correct? If so, I then have to convert the array back to numbers. Is there a way to tell Julia to read the numbers from the spreadsheet directly?
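If a column has already been read as strings, one way to convert it is to broadcast `parse` over it. This is a minimal sketch, assuming every entry is a valid number written with `.` as the decimal separator; the small DataFrame is a hypothetical stand-in for the real data:

```julia
using DataFrames

# Hypothetical stand-in for the all-string DataFrame from the question
df = DataFrame(Bact = ["A. vinelandii", "A. vinelandii", "E. coli"],
               X = ["0.050561797752809", "0.5", "1.2"],
               Y = ["1314042485.29793", "232302118.140757", "3.0"])

# Broadcast parse over each string column and replace the column in place;
# parse throws an ArgumentError if any entry is not a valid number
df.X = parse.(Float64, df.X)
df.Y = parse.(Float64, df.Y)

# The same row/column selection now returns a Vector{Float64}
sel = df[df.Bact .== "A. vinelandii", "X"]
```

If some entries might be malformed, `tryparse.(Float64, df.X)` returns `nothing` for those entries instead of throwing. Fixing the parsing at load time, as discussed below, avoids this conversion step entirely.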

How are you creating this DataFrame? If you’re loading it from a CSV file with CSV.jl, it should automatically detect any columns of numerical data and parse them as the appropriate type.
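To see the type detection at work, a small tab-separated sample can be read from an in-memory buffer. This sketch assumes CSV.jl and DataFrames.jl are installed; the sample rows are made up, modeled on the data above:

```julia
using CSV, DataFrames

# Made-up tab-separated sample; '.' is the decimal separator
data = """
Bact\tX\tY
A. vinelandii\t0.050561797752809\t1314042485.29793
A. vinelandii\t0.5\t232302118.140757
"""

# No type hints: CSV.jl infers each column's type from its values
df = CSV.read(IOBuffer(data), DataFrame; delim='\t')

eltype(df.X)   # Float64
eltype(df.Y)   # Float64
```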

I am loading with:

using DataFrames, CSV
df = CSV.read("Data.tsv", DataFrame;
    delim='\t', missingstring="NA", decimal=',', copycols=true)

It’s possible that columns X and Y in the source TSV file contain entries that can’t be parsed as numbers. If you force CSV.jl to parse a column as a float, it should report any bad rows in a warning message.

df = CSV.read(
    "Data.tsv", DataFrame;
    delim='\t',
    missingstring="NA",
    decimal=',',
    types=Dict(:X => Float64, :Y => Float64)
)
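A sketch of what forcing the types does when a cell is bad, using a made-up sample with one non-numeric value (`n/a`, chosen only for illustration). With CSV.jl's default non-strict parsing (`strict=false`), the failing cell should become `missing` and a warning identifying the bad cell should be printed:

```julia
using CSV, DataFrames

# Made-up sample: the X value in the second data row is not a number
data = """
Bact\tX\tY
A. vinelandii\t0.5\t232302118.140757
A. vinelandii\tn/a\t99530230.5872667
"""

# Force both numeric columns to Float64; unparseable cells become missing
df = CSV.read(IOBuffer(data), DataFrame;
              delim='\t',
              types=Dict(:X => Float64, :Y => Float64))

df.X   # first entry 0.5, second entry missing (plus a parsing warning on stderr)
```

Passing `strict=true` instead should make CSV.jl throw an error at the first bad value rather than replacing it with `missing`.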

Select a column with df[:, :col] or df[:, "col"].
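For the question's case, the row condition and the column name can go in the same indexing call. The column name has to be a `Symbol` literal (`:X`) or a string (`"X"`). A bare `X` is looked up as a variable, which is what produced the `UndefVarError` in the question. A sketch with a hypothetical, already-numeric DataFrame:

```julia
using DataFrames

# Hypothetical small DataFrame with numeric columns
df = DataFrame(Bact = ["A. vinelandii", "E. coli", "A. vinelandii"],
               X = [0.05, 0.7, 0.25],
               Y = [1.3e9, 2.0e8, 1.2e9])

rows = df.Bact .== "A. vinelandii"   # BitVector row mask

x1 = df[rows, :X]      # column given as a Symbol
x2 = df[rows, "X"]     # column given as a String
x3 = df.X[rows]        # property access, then row filtering

col = df[:, :X]        # whole column, copied
stored = df[!, :X]     # whole column, the stored vector itself (no copy)

x1 == x2 == x3         # true: all three are [0.05, 0.25]
```

`df[:, :X]` is the safer choice if the result will be modified, while `df[!, :X]` (like `df.X`) avoids a copy but shares memory with the DataFrame.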


It looks like the file uses . as the decimal separator, so passing decimal = ',' to CSV.read is probably what makes the parsing of floating-point numbers fail. Did you try passing decimal = '.' or omitting that setting altogether?
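The effect of `decimal` can be checked on made-up samples. With `decimal=','`, values written with `.` should fail to parse as floats, so the column falls back to a string type, matching the all-string DataFrame in the question. With the default (`.`), the same column is read as `Float64`, and a file that really uses commas parses correctly with `decimal=','`:

```julia
using CSV, DataFrames

# Made-up samples: the same values written with '.' and with ',' as decimal mark
dot_data   = "Bact\tX\nA. vinelandii\t0.5\nA. vinelandii\t1.19662921348315\n"
comma_data = "Bact\tX\nA. vinelandii\t0,5\nA. vinelandii\t1,19662921348315\n"

# Mismatch: the file uses '.', but the parser is told to expect ','
wrong = CSV.read(IOBuffer(dot_data), DataFrame; delim='\t', decimal=',')

# Default decimal separator '.'
right = CSV.read(IOBuffer(dot_data), DataFrame; delim='\t')

# Comma-decimal file with the matching setting
commas = CSV.read(IOBuffer(comma_data), DataFrame; delim='\t', decimal=',')

eltype(wrong.X)    # a string type (String or an InlineStrings type, depending on the CSV.jl version)
eltype(right.X)    # Float64
eltype(commas.X)   # Float64
```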


Oops, the ',' was a typo. Thanks!