Is it possible in CSV.jl, DelimitedFiles.jl, or another package to read text files that contain both spaces and tabs in the header and/or data sections? An example is provided below. NB: the hidden spaces and tabs should still be there after copy and paste; I also added a row with a missing value. Col1 Col2 Col3 1 …

readdlm does so by default?

    julia> using DelimitedFiles

    julia> readdlm("/tmp/testfile.txt")
    3×3 Matrix{Any}:
      "Col1"   "Col2"    "Col3"
     1        1        12
     2        1        13
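If every data column is numeric, a sketch along the following lines may avoid the `Matrix{Any}`: `readdlm` accepts an element type and a `header=true` option. The temporary file and its contents are made up for illustration, and rows with missing cells would need a different approach.

```julia
using DelimitedFiles

# Hypothetical sample: columns separated by a mix of tabs and spaces.
path, io = mktemp()
write(io, "Col1 \tCol2\t Col3\n1\t 1  12\n2 \t1\t13\n")
close(io)

# With no delimiter given, readdlm splits on runs of whitespace.
# Passing a type parses the data cells as Int; header=true returns
# the first row separately from the data.
data, header = readdlm(path, Int; header=true)
```

Here `data` is a `Matrix{Int}` and `header` is a one-row matrix of column names.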

@GunnarFarneback, thanks for your response. The result is a matrix of Any that needs further parsing before the convenience tools in DataFrames and DataFramesMeta can be used to handle missings, etc. (my input example was too simple). The following code (adapted from here) converts the matrix of Any i…

Gunnar, this is really brilliant. No clue how you managed to implement the replacement operation using the IOBuffer. The docs mention such intermediate operations only vaguely. Your example should be part of the docs, IMHO. Thanks again.

There’s nothing deep going on here. If split into intermediate results:

    a = read(file);                            # Read a full file into a UInt8 vector
    b = replace(a, UInt8('\t') => UInt8(' '))  # Replace tabs by spaces
    c = IOBuffer(b)                            # IOBuffer may optionally opera…
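The three steps can be tried on their own, without CSV.jl, using only Base. This is a minimal sketch; the temporary file and its contents are invented for the example.

```julia
# Hypothetical sample file mixing tabs and spaces.
path, io = mktemp()
write(io, "Col1\t Col2 \tCol3\n1 \t1\t 12\n")
close(io)

a = read(path)                             # Vector{UInt8} holding the raw bytes
b = replace(a, UInt8('\t') => UInt8(' '))  # new vector with tabs turned into spaces
c = IOBuffer(b)                            # in-memory stream over those bytes

# Any function that accepts an IO can now consume `c` as if it were a file.
first_line = readline(c)
```

After this, `first_line` contains only spaces between the column names, which is what lets a single-delimiter parser with repeated delimiters ignored handle the data.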

So, the file is actually read “twice”? Once into IOBuffer memory and then again from the memory buffer with CSV.read()?

Unless you’re sure the file is ASCII, you should probably do

    a = read(file, String)
    b = replace(a, '\t' => ' ')
    c = IOBuffer(b)
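A small sketch of the string-based variant on non-ASCII input, again with an invented temporary file. For UTF-8 text the byte-level and character-level replacements happen to give the same result, because the tab byte 0x09 never occurs inside a multi-byte UTF-8 sequence; the character-level version simply does not rely on that reasoning.

```julia
# Hypothetical sample with a non-ASCII header, stored as UTF-8.
path, io = mktemp()
write(io, "Größe\tCol2\n1\t2\n")
close(io)

# Character-level replacement on a String.
s = replace(read(path, String), '\t' => ' ')

# Byte-level replacement on the raw bytes, for comparison.
bytes = replace(read(path), UInt8('\t') => UInt8(' '))

# Compare the bytes of the String against the byte-level result.
same = codeunits(s) == bytes
```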

@cjdoris, how do you plug it into the one-liner above? I get this error: ERROR: MethodError: no method matching findnext(::UInt8, ::String, ::Int64)

Can you show us what you tried?

It reads it more than twice, since replace also generates an intermediate array. If you want to avoid that, it is better to use map!

    a = read(file)
    map!(c -> c == UInt8('\t') ? UInt8(' ') : c, a, a)
    df = CSV.read(IOBuffer(a), header=1, delim=" ", ignorerepeated=true, type=Int64, DataFrame)
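The in-place idea can be checked with Base alone. The sketch below uses an invented temporary file; it also shows `replace!`, the mutating counterpart of `replace` in Base, as a possible alternative to `map!`.

```julia
# Hypothetical sample file mixing tabs and spaces.
path, io = mktemp()
write(io, "Col1\tCol2 \tCol3\n1\t1\t12\n")
close(io)

a = read(path)   # the only byte buffer allocated for the file contents
n = length(a)

# Overwrite tab bytes in place; source and destination are the same array.
map!(c -> c == UInt8('\t') ? UInt8(' ') : c, a, a)

# replace! also mutates its argument; here it finds nothing left to change.
replace!(a, UInt8('\t') => UInt8(' '))

# Copy before converting, because String(a) would take ownership of `a`.
text = String(copy(a))
```

Since `a` is modified rather than copied, `IOBuffer(a)` can then be handed to the parser without a second array for the replaced bytes.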

@cjdoris, sorry, I messed it up on my end. It works perfectly.

Reading data text files delimited with both spaces & tabs


GunnarFarneback July 18, 2021, 12:34pm 4

You can do the replacement in Julia without too much fuss:

df = CSV.read(IOBuffer(replace(read("/tmp/testfile.txt"), UInt8('\t') => UInt8(' '))), header=1, delim=" ", ignorerepeated=true, type=Int64, DataFrame)
