For a CSV file, I am using a combination of Pipe, HTTP and CSV to download the file, optionally modify it, and import it as a DataFrame without writing any temporary file to disk:
using Pipe, HTTP, CSV, DataFrames
urlData = "https://github.com/sylvaticus/IntroSPMLJuliaCourse/raw/main/lessonsSources/02_-_JULIA2_-_Scientific_programming_with_Julia/data.csv"
data = @pipe HTTP.get(urlData).body |>
    replace!(_, UInt8(';') => UInt8(' ')) |> # modify the raw bytes here if needed before importing
    CSV.File(_, delim=' ') |>
    DataFrame;
How can I use the same general approach (without saving a temporary file to disk) when the file has been compressed as zip or tar/gz (assuming the archive contains a single file)?
For example:
urlDataZ = "https://github.com/sylvaticus/IntroSPMLJuliaCourse/raw/main/lessonsSources/02_-_JULIA2_-_Scientific_programming_with_Julia/data.zip"
urlDataT = "https://github.com/sylvaticus/IntroSPMLJuliaCourse/raw/main/lessonsSources/02_-_JULIA2_-_Scientific_programming_with_Julia/data.tgz"
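To make the goal concrete for the zip case, here is a rough sketch of the kind of thing I have in mind. It assumes the ZipFile.jl package, whose `ZipFile.Reader` can be constructed from an in-memory `IO`; the helper names `first_zip_entry` and `zipped_csv_to_df` are made up for illustration, and I have not checked this against the actual archive. It does not cover the tar/gz case.

```julia
using HTTP, CSV, DataFrames, ZipFile

# Hypothetical helper: return the raw bytes of the first entry of a zip
# archive that is held entirely in memory (no temporary file on disk).
function first_zip_entry(zipbytes::Vector{UInt8})
    reader = ZipFile.Reader(IOBuffer(zipbytes))
    try
        return read(reader.files[1])
    finally
        close(reader)
    end
end

# Hypothetical helper: same modification step as in the plain CSV pipeline,
# applied to the extracted bytes before parsing.
function zipped_csv_to_df(zipbytes::Vector{UInt8})
    csvbytes = first_zip_entry(zipbytes)
    replace!(csvbytes, UInt8(';') => UInt8(' '))
    return DataFrame(CSV.File(csvbytes, delim=' '))
end

# Intended usage (requires network access):
# data = zipped_csv_to_df(HTTP.get(urlDataZ).body)
```

The extraction is kept in a separate function so the rest stays identical to the uncompressed pipeline; only the step that produces the CSV bytes changes.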
Cross-posted on Stack Overflow: "How to use Pipe/HTTP/CSV to download, extract and import a zipped or tgz csv file from internet?"