Split .txt file into several DataFrames

Hello :grin:,
I would like to load a .txt file and split it into several DataFrames.
The file has the following general structure:

#Name:
#Id:
#Dosename:
#RoiName:brain
#Roi volume
#Unit: Gy
0.000 100.000
0.290 89.0
0.580 67.8
0.870 55.0
1.161 43.1
1.451 21.3

#RoiName:neck
#River volume 
#Unit: Gy
0.000 100.000
0.081 89.1
0.162 68.3
0.243 56.9

The idea would be to split the file at each “#RoiName:” and make several DataFrames:

brain_df =
x y
0.000 100.000
0.290 89.0
0.580 67.8
0.870 55.0
1.161 43.1
1.451 21.3

neck_df =
x y
0.000 100.000
0.081 89.1
0.162 68.3
0.243 56.9

I tried to load my .txt file with CSV.jl and then convert it with DataFrames.jl, but I have no idea how to “split” it as described above.

Thanks in advance! :grin:

using DataFrames
using CSV
input = """
#Name:
#Id:
#Dosename:
#RoiName:brain
#Roi volume
#Unit: Gy
0.000 100.000
0.290 89.0
0.580 67.8
0.870 55.0
1.161 43.1
1.451 21.3

#RoiName:neck
#River volume 
#Unit: Gy
0.000 100.000
0.081 89.1
0.162 68.3
0.243 56.9
"""
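io = IOBuffer(input)
dfs = DataFrame[]    # one DataFrame per block of data rows
buffer = String[]    # data rows of the block currently being read
for line in eachline(io)
    if !startswith(line, "#")    # skip the "#..." header lines
        if isempty(line)
            # a blank line ends the current block
            if !isempty(buffer)
                push!(dfs, CSV.read(IOBuffer(join(buffer, "\n")), DataFrame, header=["x", "y"]))
                empty!(buffer)
            end
        else
            push!(buffer, line)
        end
    end
end
# flush the last block if the input does not end with a blank line
if !isempty(buffer)
    push!(dfs, CSV.read(IOBuffer(join(buffer, "\n")), DataFrame, header=["x", "y"]))
    empty!(buffer)
end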
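
Note that the loop above collects the blocks in file order but drops the ROI names. If the names are needed, something along these lines might work. It is only a sketch in plain Julia (no CSV.jl), and `split_rois` and `sample` are made-up names for illustration, not something from a package:

```julia
# Hypothetical helper: group the numeric rows under the name found on
# each "#RoiName:" line. Assumes every data row has exactly two numbers.
function split_rois(text::AbstractString)
    rois = Dict{String,Vector{Tuple{Float64,Float64}}}()
    current = nothing                          # no "#RoiName:" seen yet
    for raw in eachline(IOBuffer(text))
        line = strip(raw)
        if startswith(line, "#RoiName:")
            # everything after the first ':' is the ROI name
            current = String(strip(split(line, ':'; limit=2)[2]))
            rois[current] = Tuple{Float64,Float64}[]
        elseif isempty(line) || startswith(line, "#") || current === nothing
            continue                           # blank line, other header, or preamble
        else
            x, y = parse.(Float64, split(line))
            push!(rois[current], (x, y))
        end
    end
    return rois
end

# Shortened version of the file from the question
sample = """
#Name:
#RoiName:brain
#Unit: Gy
0.000 100.000
0.290 89.0

#RoiName:neck
#Unit: Gy
0.000 100.000
0.081 89.1
"""

rois = split_rois(sample)
```

With DataFrames.jl loaded, each entry could then be wrapped as `DataFrame(x = first.(rows), y = last.(rows))`, where `rows` is one of the vectors in `rois`.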

Alternatively, using readuntil:

julia> open("sample.txt") do f
           df = Dict{String,DataFrame}()
           readuntil(f, "#RoiName:")
           while !eof(f)
               df[readline(f)] = CSV.read(IOBuffer(readuntil(f, "#RoiName:")), DataFrame; 
                   comment="#", header=["x", "y"])
           end
           df
       end

Dict{String, DataFrame} with 2 entries:
  "brain" => 6×2 DataFrame…
  "neck"  => 4×2 DataFrame…
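
For anyone unsure how `readuntil` and `readline` share the stream here, a small illustration on an in-memory buffer may help. The text in the buffer is invented for the example; only Base functions are used:

```julia
io = IOBuffer("header\n#RoiName:brain\n0.0 100.0\n#RoiName:neck\n0.0 90.0\n")

# readuntil returns the text before the marker and consumes the marker itself
preamble = readuntil(io, "#RoiName:")
# readline then picks up the rest of that line, i.e. the ROI name
name1 = readline(io)
# rows up to (not including) the next marker
body1 = readuntil(io, "#RoiName:")
name2 = readline(io)
# no marker left, so this reads to the end of the stream
body2 = readuntil(io, "#RoiName:")
```

So each name has to be read before the `readuntil` call that fetches its rows, otherwise the two get out of step.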

@stillyslalom, to run your nice code in Julia 1.8.5, I need to split the inner loop assignment as follows:

str = readline(f)
df[str] =  ...

Is this a new feature in Julia 1.9?

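Nope, just an erroneous simplification on my part: the index expression on the LHS gets evaluated after the RHS, so readuntil has already consumed the section by the time readline is called. This works: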

julia> open("sample.txt") do f
           df = Dict{String,DataFrame}()
           readuntil(f, "#RoiName:")
           while !eof(f)
               name, rest = readline(f), IOBuffer(readuntil(f, "#RoiName:"))
               df[name] = CSV.read(rest, DataFrame; comment="#", header=["x", "y"])
           end
           df
       end
Dict{String, DataFrame} with 2 entries:
  "brain" => 6×2 DataFrame…
  "neck"  => 4×2 DataFrame…
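
The evaluation-order point (the LHS index being evaluated after the RHS) can be checked with a tiny experiment. The helper names `index` and `value` are made up for the demonstration; each one records when it is called:

```julia
calls = String[]                                  # records the order of calls
index() = (push!(calls, "index"); "key")          # stands in for readline(f)
value() = (push!(calls, "value"); 1)              # stands in for CSV.read(...)

d = Dict{String,Int}()
d[index()] = value()    # indexed assignment with side effects on both sides

# calls now holds the order in which the two sides ran:
# "value" (the RHS) first, then "index"
```

This is why binding the name to a variable first, as in the corrected version, avoids the problem.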

Thank you very much for this solution!