Suppose I have the following CSV files.
using CSV, DataFrames, Glob
df1 = DataFrame(A = 1:5, B = ["M", "F", "F", "M", "F"])
df2 = DataFrame(A = 11:15, B = ["A", "B", "C", "D", "E"])
df3 = DataFrame(A = 16:21, B = ["A", "B", "C", "D", "E", "T"])
mkpath("folder")  # make sure the target directory exists
CSV.write("folder/df-1.csv", df1)
CSV.write("folder/df-2.csv", df2)
CSV.write("folder/df-3.csv", df3)
fls = glob("*.csv", "folder")
I want to combine them into a new DataFrame while creating a new column that contains just the first part of the filename, i.e. the new column would just be something like
 "df"
 "df"
 "df"
 "df"
 ...   
As @nilshg points out here, I can create a new column with the filenames using the source kwarg:
DF1 = CSV.read(fls, DataFrame; source = "nwcol" => split.(basename.(fls),'-'), pool = false)
16×3 DataFrame
 Row │ A      B        nwcol
     │ Int64  String1  Array…
─────┼──────────────────────────────────────────────────
   1 │     1  M        SubString{String}["df", "1", "20…
   2 │     2  F        SubString{String}["df", "1", "20…
   3 │     3  F        SubString{String}["df", "1", "20…
   4 │     4  M        SubString{String}["df", "1", "20…
   5 │     5  F        SubString{String}["df", "1", "20…
   6 │    11  A        SubString{String}["df", "2", "20…
   7 │    12  B        SubString{String}["df", "2", "20…
   8 │    13  C        SubString{String}["df", "2", "20…
   9 │    14  D        SubString{String}["df", "2", "20…
  10 │    15  E        SubString{String}["df", "2", "20…
  11 │    16  A        SubString{String}["df", "3", "20…
  12 │    17  B        SubString{String}["df", "3", "20…
  13 │    18  C        SubString{String}["df", "3", "20…
  14 │    19  D        SubString{String}["df", "3", "20…
  15 │    20  E        SubString{String}["df", "3", "20…
  16 │    21  T        SubString{String}["df", "3", "20…
However, if I try to create the column by taking the first element of each split result via first.(split.(basename.(fls),'-')), the following error is returned, even though pool = false.
DF2 = CSV.read(fls, DataFrame; source = "nwcol" => first.(split.(basename.(fls),'-')), pool = false)
ERROR: BoundsError: attempt to access 1-element Vector{SubString{String}} at index [3]
Stacktrace:
 [1] setindex!
   @ ./array.jl:1021 [inlined]
 [2] setindex!
   @ ./multidimensional.jl:698 [inlined]
 [3] _invert(d::Dict{SubString{String}, UInt32})
   @ PooledArrays ~/.julia/packages/PooledArrays/Vy2X0/src/PooledArrays.jl:26
 [4] PooledArray
   @ ~/.julia/packages/PooledArrays/Vy2X0/src/PooledArrays.jl:87 [inlined]
 [5] CSV.File(sources::Vector{String}; source::Pair{String, Vector{SubString{String}}}, kw::@Kwargs{pool::Bool})
   @ CSV ~/.julia/packages/CSV/tmZyn/src/file.jl:941
 [6] File
   @ ~/.julia/packages/CSV/tmZyn/src/file.jl:901 [inlined]
 [7] read(source::Vector{String}, sink::Type; copycols::Bool, kwargs::@Kwargs{source::Pair{String, Vector{…}}, pool::Bool})
   @ CSV ~/.julia/packages/CSV/tmZyn/src/CSV.jl:117
 [8] top-level scope
   @ REPL[321]:1
Some type information was truncated. Use `show(err)` to see complete types.
I realize it is trivial to just set DF1.nwcol = first.(DF1.nwcol) afterwards, but I am trying to understand why the source kwarg returns an error. Is first.() causing a PooledArray to be used, and is there a way to turn that off?
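
For reference, the post-processing workaround mentioned above can be sketched as a self-contained script. It is only a sketch under a few assumptions: the files are written to a temporary directory rather than "folder", readdir with join = true stands in for glob so that Glob is not required, and source is given as a plain column name so that the values stored are the distinct file paths. The stack trace above passes through PooledArrays with a Dict keyed on the supplied values, so the three identical "df" labels may be what trips it up; that reading of the internals is a guess, not something confirmed here.

```julia
using CSV, DataFrames

# Recreate the three example files in a temporary directory (assumption:
# a temp dir instead of "folder", so the script leaves nothing behind).
dir = mktempdir()
CSV.write(joinpath(dir, "df-1.csv"), DataFrame(A = 1:5, B = ["M", "F", "F", "M", "F"]))
CSV.write(joinpath(dir, "df-2.csv"), DataFrame(A = 11:15, B = ["A", "B", "C", "D", "E"]))
CSV.write(joinpath(dir, "df-3.csv"), DataFrame(A = 16:21, B = ["A", "B", "C", "D", "E", "T"]))

# Collect the CSV paths; readdir returns them sorted.
fls = filter(endswith(".csv"), readdir(dir; join = true))

# The labels passed to `source` in the failing call: three identical values.
labels = first.(split.(basename.(fls), '-'))

# Let CSV.jl store the (distinct) file paths in the new column ...
df = CSV.read(fls, DataFrame; source = "nwcol")

# ... then shorten each path to the part of the file name before the first '-'.
df.nwcol = String.(first.(split.(basename.(df.nwcol), '-')))
```

Converting with String.(...) at the end also leaves the column holding plain String values rather than SubString{String}.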