How to replace some characters in an HTTP --> CSV --> DataFrame workflow?

I am trying to load a (small) CSV file into a DataFrame without saving it to disk, but I first need to convert the tabs to spaces, as the file is a bit “dirty”.

I have tried this but it doesn’t work:

urlData = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original"
data    = CSV.File(replace(IOBuffer(String(HTTP.get(urlData).body)),'\t' => ' '), delim=' ',missingstring="NA", ignorerepeated=true, header=false) |> DataFrame

The part that is not working is the replace call…

Here’s one way:

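using CSV, DataFrames, HTTP
using Pipe: @pipe

urlData = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original"

# Download the raw bytes, turn every tab byte into a space in place,
# then parse the cleaned bytes directly (nothing is written to disk).
data = @pipe HTTP.get(urlData).body |>
    replace!(_, UInt8('\t') => UInt8(' ')) |>
    CSV.File(_, delim=' ', missingstring="NA", ignorerepeated=true, header=false) |>
    DataFrame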

(I don’t think you can call replace on an IOBuffer.)
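If you would rather work with text than bytes, the replacement can also be done on the String before it is wrapped in an IOBuffer. The snippet below is only a sketch using Base Julia: `body` is a made-up stand-in for the downloaded bytes (not the real file), and the variable names are mine.

```julia
# Hypothetical sample standing in for HTTP.get(urlData).body (not the real data):
# space-separated fields with a tab before the last one.
body = Vector{UInt8}("18.0   8.   307.0\t\"chevrolet chevelle malibu\"\n")

# Option 1: convert to a String first; `replace` on a String returns a new String.
# `copy` is needed because String(::Vector{UInt8}) takes ownership of the
# vector and leaves it empty.
text = replace(String(copy(body)), '\t' => ' ')

# Option 2: replace the tab bytes in place on the byte vector.
replace!(body, UInt8('\t') => UInt8(' '))

# Both routes end up with the same content.
same = text == String(copy(body))
```

Either `IOBuffer(text)` or the modified `body` should then be usable as the first argument to CSV.File with the same keyword arguments as in the question.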

It’s weird that delim='\t' doesn’t work. So maybe consider filing an issue.

It looks like the data has space-separated columns, except for the last one, which is tab-separated.
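A quick way to check this kind of mixed layout is to look at a single line and see where the tab sits. This is a rough sketch in Base Julia; `line` is a hypothetical example imitating the layout described here, not a line copied from the file, and the indexing assumes ASCII content.

```julia
# Hypothetical line imitating the described layout (not copied from the file).
line = "18.0   8.   307.0   130.0\t\"chevrolet chevelle malibu\""

# How many tabs does the line contain?
ntabs = count(==('\t'), line)

# Position of the first tab (would be `nothing` if there were none).
tabpos = findfirst(==('\t'), line)

# Everything before the tab is space-separated; `split` with no delimiter
# splits on whitespace and drops empty pieces.
numeric = split(line[1:tabpos-1])

# Everything after the tab is the quoted last column.
name = strip(line[tabpos+1:end], '"')
```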

Exactly :wink:

This is why I first need to replace the tab that separates the last column… unless CSV can now accept multiple field delimiters, but I am not aware of that…