I am trying to load a (small) CSV file into a DataFrame without saving it to disk, but first I need to convert the tabs into spaces, as the file is a bit “dirty”.
I have tried this, but it doesn’t work:
```julia
using HTTP, CSV, DataFrames

urlData = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original"
data = CSV.File(replace(IOBuffer(String(HTTP.get(urlData).body)), '\t' => ' '),
                delim=' ', missingstring="NA", ignorerepeated=true, header=false) |> DataFrame
```
The part that is not working is the `replace` call…
sijo
Here’s one way:
```julia
using HTTP, CSV, DataFrames
using Pipe: @pipe

urlData = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original"
# Swap every tab byte for a space in the downloaded bytes, then parse.
@pipe HTTP.get(urlData).body |>
    replace!(_, UInt8('\t') => UInt8(' ')) |>
    CSV.File(_, delim=' ', missingstring="NA", ignorerepeated=true, header=false) |>
    DataFrame
```
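The key step above is `replace!` on the raw response bytes: `HTTP.get(urlData).body` is (by default) a `Vector{UInt8}`, so each tab byte can be swapped for a space byte in place before CSV sees the data. A minimal offline sketch of just that step, assuming a made-up sample line (not copied from the real file) in place of the HTTP response:

```julia
# Hypothetical sample resembling one record: space-separated numbers,
# then a tab before the quoted car name.
raw = "18.0   8.   307.0\t\"chevrolet chevelle malibu\"\n"

# Stand-in for HTTP.get(urlData).body, which is a byte vector.
body = Vector{UInt8}(raw)

# Replace every tab byte with a space byte, mutating `body` in place.
# This works byte for byte because both characters are single-byte ASCII.
replace!(body, UInt8('\t') => UInt8(' '))

# String(body) takes ownership of the bytes, so `body` should not be reused afterwards.
cleaned = String(body)
println(cleaned)
```

In the real pipeline the mutated byte vector goes straight to `CSV.File`, which accepts a `Vector{UInt8}` as its source.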
(I don’t think you can call `replace` on an `IOBuffer`.)
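If you would rather keep the `IOBuffer` from the original attempt, one option is to do the replacement on the `String` first and only then wrap the result in an `IOBuffer`, since `replace` with a `Char => Char` pair is defined for strings. A minimal sketch, assuming a made-up sample line standing in for the downloaded text:

```julia
# Hypothetical stand-in for String(HTTP.get(urlData).body).
text = "15.0   8.   350.0\t\"buick skylark 320\"\n"

# replace works on strings: swap each tab for a space.
cleaned = replace(text, '\t' => ' ')

# Wrap the cleaned string in an IOBuffer; this buffer is what could be
# handed to CSV.File instead of the one in the original attempt.
buf = IOBuffer(cleaned)
println(read(buf, String))
```

With the real data this should amount to moving `replace` inside the `IOBuffer` call, e.g. `CSV.File(IOBuffer(replace(String(HTTP.get(urlData).body), '\t' => ' ')), delim=' ', missingstring="NA", ignorerepeated=true, header=false) |> DataFrame` (untested against the live URL).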
It’s weird that `delim='\t'` doesn’t work, so maybe consider filing an issue.
sijo
It looks like the data has space-separated columns except for the last one which is tab-separated.
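One quick way to confirm a mixed layout like this is to count the tabs on each line and compare how many fields you get when splitting on the tab versus on spaces. If each line has a single tab, `delim='\t'` alone would yield only two fields per line, which would be consistent with that attempt not working. A minimal sketch on made-up sample lines (an assumption modeled on the layout described, with fewer numeric columns than the real file):

```julia
# Hypothetical sample lines with the layout described above
# (only three numeric columns shown, not the real file's full width).
sample = """
18.0   8.   307.0\t"chevrolet chevelle malibu"
NA     4.   133.0\t"citroen ds-21 pallas"
"""

lines = collect(eachline(IOBuffer(sample)))

# Number of tab characters on each line.
tabcounts = [count(==('\t'), l) for l in lines]

# Fields per line if the tab were the only delimiter (delim='\t').
tabfields = [length(split(l, '\t')) for l in lines]

# Fields in the part before the tab, split on runs of whitespace.
spacefields = [length(split(first(split(l, '\t')))) for l in lines]

println("tabs per line:        ", tabcounts)
println("fields with tab only: ", tabfields)
println("space-separated cols: ", spacefields)
```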
Exactly.
This is why I first need to replace the tab that separates the last column… unless CSV can now take multiple field delimiters, but I am not aware of that…
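If CSV cannot take more than one delimiter, a different option from the tab replacement used above is to split each line yourself in two stages: first at the tab, then the numeric part on runs of whitespace. Because the name is split off first, spaces inside it stay intact. A minimal Base-only sketch; the sample lines and the `parse_line` helper are hypothetical, and `"NA"` is mapped to `missing` to mirror `missingstring="NA"`:

```julia
# Hypothetical sample lines: space-separated numbers, a tab, then a quoted name.
sample = """
18.0   8.   307.0\t"chevrolet chevelle malibu"
NA     4.   133.0\t"citroen ds-21 pallas"
"""

# Hypothetical helper: split one line into numeric fields and a name.
function parse_line(line::AbstractString)
    # Stage 1: split once at the tab.
    numpart, namepart = split(line, '\t'; limit=2)
    # Stage 2: split() with no delimiter splits on runs of whitespace
    # and drops empty fields; "NA" becomes missing.
    nums = [f == "NA" ? missing : parse(Float64, f) for f in split(numpart)]
    # Drop the surrounding double quotes from the name.
    name = String(strip(namepart, '"'))
    return nums, name
end

rows = [parse_line(l) for l in eachline(IOBuffer(sample))]
for (nums, name) in rows
    println(name, " => ", nums)
end
```

The parsed pairs could then be collected into columns for a `DataFrame`; the trade-off is giving up CSV’s type inference and options in exchange for full control over the two delimiters.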