Data files with 'quotes' around the column names


#1

I have data files, given to me by someone else, with the following format:

'col1','col2','col3'
1,2,3
4,5,6

And so on. CSV reads them just fine, and leaves me with a DataFrame that looks like

|Row     | 'col1'  | 'col2'  | 'col3' |
_______________________________________
| 1      | 1.      | 2.      | 3.     |
| 2      | 4.      | 5.      | 6.     |

and so on. But I can't access the columns: the column names contain single quotes, and a single quote is not valid inside a symbol literal such as :col1.

julia> data[:col1]
ERROR: KeyError: key :col1 not found
julia> data[:'col1']
ERROR: syntax: invalid character literal

Escaping the single-quote with a “\” also doesn’t work. What’s the magic incantation here?


#2

Two options:

  • You could specify ' as the quotechar when reading, which removes the quotes during parsing: CSV.read(file; quotechar="'")
  • If you want to keep the single quotes for whatever reason, the right way to access the columns in the DataFrame is data[Symbol("'col1'")]

#3

Neither of these appears to work:

julia> data = CSV.read(file; quotechar="'")
ERROR: TypeError: in Type, in typeassert, expected Union{Char, UInt8}, got String

and

julia> data[Symbol("'col1'")]
ERROR: KeyError: key Symbol("'col1'") not found

#4

Oops, my bad, the 1st option should be CSV.read(file; quotechar='\''). Not sure what’s going on with the 2nd case. Can you post the output of propertynames(data)?
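
For anyone puzzled by the TypeError, here is a minimal sketch of the difference between the two literals, using only Base Julia. The variable names are made up for illustration. The Union{Char, UInt8} check mirrors the type named in the error message; it is not taken from the CSV source code.

```julia
# Single quotes delimit a Char literal; double quotes delimit a String literal.
quote_as_char   = '\''   # the single-quote character, escaped inside a Char literal
quote_as_string = "'"    # a one-character String, which is a different type

println(typeof(quote_as_char))     # Char
println(typeof(quote_as_string))   # String

# The error said: expected Union{Char, UInt8}, got String.
# Only the Char satisfies that type.
println(quote_as_char isa Union{Char, UInt8})     # true
println(quote_as_string isa Union{Char, UInt8})   # false
```

So a quotechar keyword that expects a Char (or a byte) needs the '\'' form, not a string holding the same character.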


#5

julia> CSV.read(file; quotechar='\'')
ERROR: LoadError: TypeError: in Type, in typeassert, expected Union{Char, UInt8}, got String

julia> propertynames(data)
4-element Array{Symbol,1}:
 Symbol("'frame_num'")
 Symbol(" 'col1'")
 Symbol(" 'col2'")
 Symbol(" 'col3'")

#6

Oh, wait; I made a mistake.

julia> CSV.read(file; quotechar="\'")
ERROR: LoadError: TypeError: in Type, in typeassert, expected Union{Char, UInt8}, got String

but

julia> CSV.read(file; quotechar='\'')

succeeds! Thanks.


#7

Glad the quotechar option worked. As for the DataFrame indexing, you can always do data[x] where x is one of the symbols in propertynames(data). In your case the full column name was " 'col1'", with a single space before the single-quote character, so data[Symbol(" 'col1'")] would have matched where data[Symbol("'col1'")] did not.
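
To make the leading-space point concrete, here is a small sketch in Base Julia only (no CSV or DataFrames needed). The symbol list is copied from the propertynames(data) output earlier in the thread. clean_name is a hypothetical helper written for this illustration, not part of any package.

```julia
# Column names exactly as reported by propertynames(data) above.
names_in_file = [Symbol("'frame_num'"), Symbol(" 'col1'"),
                 Symbol(" 'col2'"), Symbol(" 'col3'")]

# The leading space is part of the name, so these are two different symbols.
println(Symbol("'col1'") in names_in_file)    # false: no leading space
println(Symbol(" 'col1'") in names_in_file)   # true: matches the stored name

# Hypothetical helper: strip surrounding spaces and single quotes from a name.
clean_name(s::Symbol) = Symbol(strip(String(s), [' ', '\'']))

cleaned = clean_name.(names_in_file)
println(cleaned)   # [:frame_num, :col1, :col2, :col3]
```

If the file had been read without quotechar, cleaned names like these could be used to rename the columns afterwards, though fixing it at read time as above is simpler.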