Julia CSV.read stopped working

jjdegruijter · April 29, 2022, 9:35am

I use CSV.read very often, but yesterday it stopped working. From another thread on this issue I understand that it may be caused by a package incompatibility, but I haven’t seen a solution yet.
I use julia-1.6.5 via Visual Studio Code version 1.66.2 on macOS Monterey.

At the beginning of my script I have:
import Pkg
using Pkg
Pkg.add("DataFrames")
Pkg.add("CSV")
using DataFrames
using CSV
include("ospatsmas")
The latter refers to my function starting with:
Data=CSV.read("NowleyT.txt", DataFrame; delim=',', header=false)

“NowleyT.txt” is a data file that I have used often. The first line is:
1511805.54010775,-3631401.61267572,134,14.0385102411102,1.58792636637882
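
As a quick sanity check that this line is well-formed for delim=',', it can be split and parsed with Base functions alone. This is only a sketch for inspecting a single line, not a replacement for CSV.read; the variable names are illustrative.

```julia
# First line of NowleyT.txt, as posted above
line = "1511805.54010775,-3631401.61267572,134,14.0385102411102,1.58792636637882"

# Split on the comma delimiter and parse every field as a Float64
fields = parse.(Float64, split(line, ','))

println(length(fields), " fields: ", fields)
```

If the line parses into five numbers, the delimiter and the data themselves are unlikely to be the cause.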

At this point Julia presents a large technical text in ‘context.jl’ that doesn’t make sense to me.

I would be very grateful if somebody could lead me out of this dead end.

bkamins · April 29, 2022, 9:43am

Can you please share what the status command in package manager mode produces, so that we can see the versions of the packages you have installed?
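
For reference, the same listing can also be produced from a script or from the REPL in normal mode. A minimal sketch, assuming the default environment is the active one:

```julia
using Pkg

# Same output as typing `]status` (or `]st`) in package manager mode
Pkg.status()

# Path of the Project.toml that the status above refers to
println(Base.active_project())
```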

CC @quinnj

jjdegruijter · April 29, 2022, 9:54am

Status ~/.julia/environments/v1.6/Project.toml
[336ed68f] CSV v0.10.2
[a93c6f00] DataFrames v1.3.2
[31c24e10] Distributions v0.25.53
[10745b16] Statistics

nilshg · April 29, 2022, 9:55am

While Bogumil is probably right in guessing that this is a version issue (I feel like we had a discussion on here before about using project-specific environments!?), when you say

At this point Julia presents a large technical text in ‘context.jl’ that doesn’t make sense to me.

what you are probably referring to is the error message together with the so-called “stacktrace”, which shows the chain of function calls triggered by your call to CSV.read and where exactly it errored. While it’s fair to say that these can look daunting to a beginner (work is ongoing to improve this), they are a key bit of information to provide on here if you want people to be able to help you, so I’d encourage you to copy/paste them into the questions you ask on the forum.
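
As a minimal sketch of how to get that text in a form that is easy to paste, the error and its stacktrace can be captured with try/catch. The helper name error_report is hypothetical, and parse(Int, "abc") is only a stand-in for the failing call; neither comes from the original script.

```julia
# Hypothetical helper: run a function and return the error message plus
# stacktrace as one String, ready to paste into a forum post.
function error_report(f)
    try
        f()
        return "no error"
    catch err
        # sprint collects everything showerror writes into a String
        return sprint(showerror, err, catch_backtrace())
    end
end

# Stand-in for the failing call; replace it with the CSV.read call in question.
report = error_report(() -> parse(Int, "abc"))
println(report)
```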

nilshg · April 29, 2022, 9:56am

Try ]up. None of CSV, DataFrames, and Distributions is on the latest version.

jjdegruijter · April 29, 2022, 10:41am

Internal structure used to track information for a single column in a delimited file.

Fields:

type: always a single, concrete type; no Union{T, Missing}; missingness is tracked in anymissing field; this field is mutable; it may start as one type and get “promoted” to another while parsing; two special types exist: NeedsTypeDetection, which specifies that we need to try and detect what type this column’s values are and HardMissing which means the column type is definitely Missing and we don’t need to detect anything; to get the “final type” of a column after parsing, call CSV.coltype(col), which takes into account anymissing
anymissing: whether any missing values have been encountered while parsing; if a user provided a type like Union{Int, Missing}, we’ll set this to true, or when missing values are encountered while parsing
userprovidedtype: whether the column type was provided by the user or not; this affects whether we’ll promote a column’s type while parsing, or emit a warning/error depending on strict keyword arg
willdrop: whether we’ll drop this column from the final columnset; computed from select/drop keyword arguments; this will result in a column type of HardMissing while parsing, where an efficient parser is used to “skip” a field w/o allocating any parsed value
pool: computed from pool keyword argument; true is 1.0, false is 0.0, everything else is Float64(pool); once computed, this field isn’t mutated at all while parsing; it’s used in type detection to determine whether a column will be pooled or not once a type is detected;
columnspecificpool: if pool was provided via Vector or Dict by user, then true, other false; if false, then only string column types will attempt pooling
column: the actual column vector to hold parsed values; field is typed as AbstractVector and while parsing, we do switches on col.type to assert the column type to make code concretely typed
lock: in multithreaded parsing, we have a top-level set of Vector{Column}, then each threaded parsing task makes its own copy to parse its own chunk; when synchronizing column types/pooled refs, the task-local Column will lock(col.lock) to make changes to the parent Column; each task-local Column shares the same lock of the top-level Column
position: for transposed reading, the current column position
endposition: for transposed reading, the expected ending position for this column
“”"
mutable struct Column

fields that are copied per task when parsing

type::Type
anymissing::Bool
userprovidedtype::Bool
willdrop::Bool
pool::Union{Float64, Tuple{Float64, Int}}
columnspecificpool::Bool

lazily/manually initialized fields

column::AbstractVector

per top-level column fields (don’t need to copy per task when parsing)

lock::ReentrantLock
position::Int
endposition::Int
options::Parsers.Options

Column(type::Type, anymissing::Bool, userprovidedtype::Bool, willdrop::Bool, pool::Union{Float64, Tuple{Float64, Int}}, columnspecificpool::Bool) =
new(type, anymissing, userprovidedtype, willdrop, pool, columnspecificpool)
end

function Column(type::Type, options::Union{Parsers.Options, Nothing}=nothing)
T = nonmissingtypeunlessmissingtype(type)
col = Column(type === Missing ? HardMissing : T,
type >: Missing,
type !== NeedsTypeDetection,
false, NaN, false)
if options !== nothing
col.options = options
end
return col
end

creating a per-task column from top-level column

function Column(x::Column)
@assert isdefined(x, :lock)
y = Column(x.type, x.anymissing, x.userprovidedtype, x.willdrop, x.pool, x.columnspecificpool)
y.lock = x.lock # parent and child columns share the same lock
if isdefined(x, :options)
y.options = x.options
end
# specifically don’t copy/re-use x.column; that needs to be allocated fresh per parsing task
return y
end

“”"
isvaliddelim(delim)

Whether a character or string is valid for use as a delimiter.
“”"
isvaliddelim(delim) = false
isvaliddelim(delim::Char) = delim != ‘\r’ && delim != ‘\n’ && delim != ‘\0’
isvaliddelim(delim::AbstractString) = all(isvaliddelim, delim)

“”"
checkvaliddelim(delim)

Checks whether a character or string is valid for use as a delimiter. If
delim is nothing, it is assumed that the delimiter will be auto-selected.
Throws an error if delim is invalid.
“”"
function checkvaliddelim(delim)
delim !== nothing && !isvaliddelim(delim) &&
throw(ArgumentError("invalid delim argument = ‘$(escape_string(string(delim)))’, "*
“the following delimiters are invalid: ‘\r’, ‘\n’, ‘\0’”))
end

function checkinvalidcolumns(dict, argname, ncols, names)
for (k, _) in dict
if k isa Integer
(0 < k <= ncols) || throw(ArgumentError(“invalid column number provided in $argname keyword argument: $k. Column number must be 0 < i <= $ncols as detected in the data. To ignore invalid columns numbers in $argname, pass validate=false”))
else
Symbol(k) in names || throw(ArgumentError(“invalid column name provided in $argname keyword argument: $k. Valid column names detected in the data are: $names. To ignore invalid columns names in $argname, pass validate=false”))
end
end
return
end

@noinline nonconcretetypes(types) = throw(ArgumentError(“Non-concrete types passed in types keyword argument, please provide concrete types for columns: $types”))

struct Context
transpose::Bool
name::String
names::Vector{Symbol}
rowsguess::Int
cols::Int
buf::Vector{UInt8}
datapos::Int
len::Int
datarow::Int
options::Parsers.Options
columns::Vector{Column}
pool::Union{Float64, Tuple{Float64, Int}}
downcast::Bool
customtypes::Type
typemap::Dict{Type, Type}
stringtype::StringTypes
limit::Int
threaded::Bool
ntasks::Int
chunkpositions::Vector{Int}
strict::Bool
silencewarnings::Bool
maxwarnings::Int
debug::Bool
tempfile::Union{String, Nothing}
streaming::Bool
end

user-facing function if just the context is desired

function Context(source::ValidSources;
# file options
# header can be a row number, range of rows, or actual string vector
header::Union{Integer, Vector{Symbol}, Vector{String}, AbstractVector{<:Integer}}=1,
normalizenames::Bool=false,
# by default, data starts immediately after header or start of file
datarow::Integer=-1,
skipto::Integer=-1,
footerskip::Integer=0,
transpose::Bool=false,
comment::Union{String, Nothing}=nothing,
ignoreemptyrows::Bool=true,
ignoreemptylines=nothing,
select=nothing,
drop=nothing,
limit::Union{Integer, Nothing}=nothing,
buffer_in_memory::Bool=false,
threaded::Union{Bool, Nothing}=nothing,
ntasks::Union{Nothing, Integer}=nothing,
tasks::Union{Nothing, Integer}=nothing,
rows_to_check::Integer=DEFAULT_ROWS_TO_CHECK,
lines_to_check=nothing,
# parsing options
missingstrings=String,
missingstring=“”,
delim::Union{Nothing, Char, String}=nothing,
ignorerepeated::Bool=false,
quoted::Bool=true,
quotechar::Union{UInt8, Char}=‘"’,
openquotechar::Union{UInt8, Char, Nothing}=nothing,
closequotechar::Union{UInt8, Char, Nothing}=nothing,
escapechar::Union{UInt8, Char}=‘"’,
dateformat::Union{String, Dates.DateFormat, Nothing, AbstractDict}=nothing,
dateformats=nothing,
decimal::Union{UInt8, Char}=UInt8(‘.’),
truestrings::Union{Vector{String}, Nothing}=TRUE_STRINGS,
falsestrings::Union{Vector{String}, Nothing}=FALSE_STRINGS,
stripwhitespace::Bool=false,
# type options
type=nothing,
types=nothing,
typemap::Dict=Dict{Type, Type}(),
pool=DEFAULT_POOL,
downcast::Bool=false,
lazystrings::Bool=false,
stringtype::StringTypes=DEFAULT_STRINGTYPE,
strict::Bool=false,
silencewarnings::Bool=false,
maxwarnings::Int=DEFAULT_MAX_WARNINGS,
debug::Bool=false,
parsingdebug::Bool=false,
validate::Bool=true,
)
return @refargs Context(source, header, normalizenames, datarow, skipto, footerskip, transpose, comment, ignoreemptyrows, ignoreemptylines, select, drop, limit, buffer_in_memory, threaded, ntasks, tasks, rows_to_check, lines_to_check, missingstrings, missingstring, delim, ignorerepeated, quoted, quotechar, openquotechar, closequotechar, escapechar, dateformat, dateformats, decimal, truestrings, falsestrings, stripwhitespace, type, types, typemap, pool, downcast, lazystrings, stringtype, strict, silencewarnings, maxwarnings, debug, parsingdebug, validate, false)
end

@refargs function Context(source::ValidSources,
# file options
# header can be a row number, range of rows, or actual string vector
header::Union{Integer, Vector{Symbol}, Vector{String}, AbstractVector{<:Integer}},
normalizenames::Bool,
datarow::Integer,
skipto::Integer,
footerskip::Integer,
transpose::Bool,
comment::Union{String, Nothing},
ignoreemptyrows::Bool,
ignoreemptylines::Union{Nothing, Bool},
select,
drop,
limit::Union{Integer, Nothing},
buffer_in_memory::Bool,
threaded::Union{Nothing, Bool},
ntasks::Union{Nothing, Integer},
tasks::Union{Nothing, Integer},
rows_to_check::Integer,
lines_to_check::Union{Nothing, Integer},
# parsing options
missingstrings::Union{Nothing, String, Vector{String}},
missingstring::Union{Nothing, String, Vector{String}},
delim::Union{Nothing, UInt8, Char, String},
ignorerepeated::Bool,
quoted::Bool,
quotechar::Union{UInt8, Char},
openquotechar::Union{Nothing, UInt8, Char},
closequotechar::Union{Nothing, UInt8, Char},
escapechar::Union{UInt8, Char},
dateformat::Union{Nothing, String, Dates.DateFormat, Parsers.Format, AbstractVector, AbstractDict},
dateformats::Union{Nothing, String, Dates.DateFormat, Parsers.Format, AbstractVector, AbstractDict},
decimal::Union{UInt8, Char},
truestrings::Union{Nothing, Vector{String}},
falsestrings::Union{Nothing, Vector{String}},
stripwhitespace::Bool,
# type options
type::Union{Nothing, Type},
types::Union{Nothing, Type, AbstractVector, AbstractDict, Function},
typemap::Dict,
pool::Union{Bool, Real, AbstractVector, AbstractDict, Base.Callable, Tuple},
downcast::Bool,
lazystrings::Bool,
stringtype::StringTypes,
strict::Bool,
silencewarnings::Bool,
maxwarnings::Integer,
debug::Bool,
parsingdebug::Bool,
validate::Bool,
streaming::Bool)

# initial argument validation and adjustment
@inbounds begin
((source isa AbstractString || source isa AbstractPath) && !isfile(source)::Bool) && throw(ArgumentError("\"$source\" is not a valid file or doesn't exist"))
if types !== nothing
    if types isa AbstractVector
        any(x->!concrete_or_concreteunion(x), types) && nonconcretetypes(types)
    elseif types isa AbstractDict
        typs = values(types)
        any(x->!concrete_or_concreteunion(x), typs) && nonconcretetypes(typs)
    elseif types isa Type
        concrete_or_concreteunion(types) || nonconcretetypes(types)
    end
end
checkvaliddelim(delim)
ignorerepeated && delim === nothing && throw(ArgumentError("auto-delimiter detection not supported when `ignorerepeated=true`; please provide delimiter like `delim=','`"))
if lazystrings && !streaming
    @warn "`lazystrings` keyword argument is deprecated; use `stringtype=PosLenString` instead"
    stringtype = PosLenString
end
if tasks !== nothing
    @warn "`tasks` keyword argument is deprecated; use `ntasks` instead"
    ntasks = tasks
end
if ignoreemptylines !== nothing
    @warn "`ignoreemptylines` keyword argument is deprecated; use `ignoreemptyrows` instead"
    ignoreemptyrows = ignoreemptylines
end
if lines_to_check !== nothing
    @warn "`lines_to_check` keyword argument is deprecated; use `rows_to_check` instead"
    rows_to_check = lines_to_check
end
if !isempty(missingstrings)
    @warn "`missingstrings` keyword argument is deprecated; pass a `Vector{String}` to `missingstring` instead"
    missingstring = missingstrings
end
if dateformats !== nothing
    @warn "`dateformats` keyword argument is deprecated; pass column date formats to `dateformat` keyword argument instead"
    dateformat = dateformats
end
if datarow != -1
    @warn "`datarow` keyword argument is deprecated; use `skipto` instead"
    skipto = datarow
end
if type !== nothing
    @warn "`type` keyword argument is deprecated; a single type can be passed to `types` instead"
    types = type
end
if threaded !== nothing
    @warn "`threaded` keyword argument is deprecated; to avoid multithreaded parsing, pass `ntasks=1`"
    ntasks = threaded ? Threads.nthreads() : 1
end
if header isa Integer
    if header == 1 && skipto == 1
        header = -1
    elseif skipto != -1 && skipto < header
        throw(ArgumentError("skipto row ($skipto) must come after header row ($header)"))
    end
end
if skipto == -1
    if isa(header, Vector{Symbol}) || isa(header, Vector{String})
        skipto = 0
    elseif header isa Integer
        # by default, data starts on line after header
        skipto = header + 1
    elseif header isa AbstractVector{<:Integer}
        skipto = last(header) + 1
    end
end
debug && println("header is: $header, skipto computed as: $skipto")
# getsource will turn any input into a `AbstractVector{UInt8}`
buf, pos, len, tempfile = getsource(source, buffer_in_memory)
if len > MAX_INPUT_SIZE
    throw(ArgumentError("delimited source to parse too large; must be < $MAX_INPUT_SIZE bytes"))
end
# skip over initial BOM character, if present
pos = consumeBOM(buf, pos)

oq = something(openquotechar, quotechar) % UInt8
eq = escapechar % UInt8
cq = something(closequotechar, quotechar) % UInt8
trues = truestrings === nothing ? nothing : truestrings
falses = falsestrings === nothing ? nothing : falsestrings
sentinel = missingstring === nothing ? missingstring : (isempty(missingstring) || (missingstring isa Vector && length(missingstring) == 1 && missingstring[1] == "")) ? missing : missingstring isa String ? [missingstring] : missingstring

if delim === nothing
    if source isa AbstractString || source isa AbstractPath
        filename = string(source)
        del = endswith(filename, ".tsv") ? UInt8('\t') : endswith(filename, ".wsv") ? UInt8(' ') : UInt8('\n')
    else
        del = UInt8('\n')
    end
else
    del = (delim isa Char && isascii(delim)) ? delim % UInt8 :
        (sizeof(delim) == 1 && isascii(delim)) ? delim[1] % UInt8 : delim
end
cmt = comment === nothing ? nothing : (pointer(comment), sizeof(comment))

if footerskip > 0 && len > 0
    lastbyte = buf[end]
    endpos = (lastbyte == UInt8('\r') || lastbyte == UInt8('\n')) +
        (lastbyte == UInt8('\n') && buf[end - 1] == UInt8('\r'))
    revlen = skiptorow(ReversedBuf(buf), 1 + endpos, len, oq, eq, cq, cmt, ignoreemptyrows, 0, footerskip) - 2
    len -= revlen
    debug && println("adjusted for footerskip, len = $(len + revlen - 1) => $len")
end

df = dateformat isa AbstractVector || dateformat isa AbstractDict ? nothing : dateformat
wh1 = UInt8(' ')
wh2 = UInt8('\t')
if sentinel isa Vector
    for sent in sentinel
        if contains(sent, " ")
            wh1 = 0x00
        end
        if contains(sent, "\t")
            wh2 = 0x00
        end
    end
end
headerpos = datapos = pos
if !transpose
    # step 1: detect the byte position where the column names start (headerpos)
    # and where the first data row starts (datapos)
    headerpos, datapos = detectheaderdatapos(buf, pos, len, oq, eq, cq, cmt, ignoreemptyrows, header, skipto)
    debug && println("headerpos = $headerpos, datapos = $datapos")
end
# step 2: detect delimiter (or use given) and detect number of (estimated) rows and columns
# step 3: build Parsers.Options w/ parsing arguments
if del isa UInt8
    d, rowsguess = detectdelimandguessrows(buf, headerpos, datapos, len, oq, eq, cq, cmt, ignoreemptyrows, del)
    wh1 = d == UInt(' ') ? 0x00 : wh1
    wh2 = d == UInt8('\t') ? 0x00 : wh2
    options = Parsers.Options(sentinel, wh1, wh2, oq, cq, eq, d, decimal, trues, falses, df, ignorerepeated, ignoreemptyrows, comment, quoted, parsingdebug, stripwhitespace)
elseif del isa Char
    _, rowsguess = detectdelimandguessrows(buf, headerpos, datapos, len, oq, eq, cq, cmt, ignoreemptyrows)
    options = Parsers.Options(sentinel, wh1, wh2, oq, cq, eq, del, decimal, trues, falses, df, ignorerepeated, ignoreemptyrows, comment, quoted, parsingdebug, stripwhitespace)
    d = del
elseif del isa String
    _, rowsguess = detectdelimandguessrows(buf, headerpos, datapos, len, oq, eq, cq, cmt, ignoreemptyrows)
    options = Parsers.Options(sentinel, wh1, wh2, oq, cq, eq, del, decimal, trues, falses, df, ignorerepeated, ignoreemptyrows, comment, quoted, parsingdebug, stripwhitespace)
    d = del
else
    error("invalid delim type")
end
debug && println("estimated rows: $rowsguess")
debug && println("detected delimiter: \"$(escape_string(d isa UInt8 ? string(Char(d)) : d))\"")

if !transpose
    # step 4a: if we're ignoring repeated delimiters, then we ignore any
    # that start a row, so we need to check if we need to adjust our headerpos/datapos
    if ignorerepeated
        if headerpos > 0
            headerpos = Parsers.checkdelim!(buf, headerpos, len, options)
        end
        datapos = Parsers.checkdelim!(buf, datapos, len, options)
    end

    # step 4b: generate or parse column names
    names = detectcolumnnames(buf, headerpos, datapos, len, options, header, normalizenames)
    ncols = length(names)
else
    # transpose
    rowsguess, names, positions, endpositions = detecttranspose(buf, pos, len, options, header, skipto, normalizenames)
    ncols = length(names)
    datapos = isempty(positions) ? 0 : positions[1]
end
debug && println("column names detected: $names")
debug && println("byte position of data computed at: $datapos")

# generate initial columns
# deduce initial column types/flags for parsing based on whether any user-provided types were provided or not
customtypes = Tuple{}
if types isa AbstractVector
    length(types) == ncols || throw(ArgumentError("provided `types::AbstractVector` keyword argument doesn't match detected # of columns: `$(length(types)) != $ncols`"))
    columns = Vector{Column}(undef, ncols)
    for i = 1:ncols
        col = Column(types[i], options)
        columns[i] = col
        if nonstandardtype(col.type) !== Union{}
            customtypes = tupcat(customtypes, nonstandardtype(col.type))
        end
    end
elseif types isa AbstractDict
    T = streaming ? Union{stringtype, Missing} : NeedsTypeDetection
    columns = Vector{Column}(undef, ncols)
    for i = 1:ncols
        S = getordefault(types, names[i], i, T)
        col = Column(S, options)
        columns[i] = col
        if nonstandardtype(col.type) !== Union{}
            customtypes = tupcat(customtypes, nonstandardtype(col.type))
        end
    end
    validate && checkinvalidcolumns(types, "types", ncols, names)
elseif types isa Function
    defaultT = streaming ? Union{stringtype, Missing} : NeedsTypeDetection
    columns = Vector{Column}(undef, ncols)
    for i = 1:ncols
        T = something(types(i, names[i]), defaultT)
        col = Column(T, options)
        columns[i] = col
        if nonstandardtype(col.type) !== Union{}
            customtypes = tupcat(customtypes, nonstandardtype(col.type))
        end
    end
else
    T = types === nothing ? (streaming ? Union{stringtype, Missing} : NeedsTypeDetection) : types
    if nonstandardtype(T) !== Union{}
        customtypes = tupcat(customtypes, nonstandardtype(T))
    end
    columns = Vector{Column}(undef, ncols)
    for i = 1:ncols
        col = Column(T, options)
        columns[i] = col
    end
end
if transpose
    # set column positions
    for i = 1:ncols
        col = columns[i]
        col.position = positions[i]
        col.endposition = endpositions[i]
    end
end
# check for nonstandard types in typemap
for T in values(typemap)
    if nonstandardtype(T) !== Union{}
        customtypes = tupcat(customtypes, nonstandardtype(T))
    end
end

# generate column options if applicable
if dateformat isa AbstractDict
    for i = 1:ncols
        df = getordefault(dateformat, names[i], i, nothing)
        # devdoc: if we want to add any other column-specific parsing options, this is where we'd at the logic
        # e.g. per-column sentinel, decimal, trues, falses, openquotechar, closequotechar, escapechar, etc.
        if df !== nothing
            columns[i].options = Parsers.Options(sentinel, wh1, wh2, oq, cq, eq, d, decimal, trues, falses, df, ignorerepeated, ignoreemptyrows, comment, true, parsingdebug, stripwhitespace)
        end
    end
    validate && checkinvalidcolumns(dateformat, "dateformat", ncols, names)
end

# pool keyword
finalpool = 0.0
if !streaming
    if pool isa AbstractVector
        length(pool) == ncols || throw(ArgumentError("provided `pool::AbstractVector` keyword argument doesn't match detected # of columns: `$(length(pool)) != $ncols`"))
        for i = 1:ncols
            col = columns[i]
            col.pool = getpool(pool[i])
            col.columnspecificpool = true
        end
    elseif pool isa AbstractDict
        for i = 1:ncols
            col = columns[i]
            p = getordefault(pool, names[i], i, NaN)
            if !isnan(p)
                col.pool = getpool(p)
                col.columnspecificpool = true
            end
        end
        validate && checkinvalidcolumns(pool, "pool", ncols, names)
    elseif pool isa Base.Callable
        for i = 1:ncols
            col = columns[i]
            p = pool(i, names[i])
            if p !== nothing
                col.pool = getpool(p)
                col.columnspecificpool = true
            end
        end
    else
        finalpool = getpool(pool)
        for col in columns
            col.pool = finalpool
        end
    end
end

# figure out if we'll drop any columns while parsing
if select !== nothing && drop !== nothing
    throw(ArgumentError("`select` and `drop` keywords were both provided; only one or the other is allowed"))
elseif select !== nothing
    if select isa AbstractVector{Bool}
        for i = 1:ncols
            select[i] || willdrop!(columns, i)
        end
    elseif select isa AbstractVector{<:Integer}
        for i = 1:ncols
            i in select || willdrop!(columns, i)
        end
    elseif select isa AbstractVector{Symbol} || select isa AbstractVector{<:AbstractString}
        select = map(Symbol, select)
        for i = 1:ncols
            names[i] in select || willdrop!(columns, i)
        end
    elseif select isa Base.Callable
        for i = 1:ncols
            select(i, names[i])::Bool || willdrop!(columns, i)
        end
    else
        throw(ArgumentError("`select` keyword argument must be an `AbstractVector` of `Int`, `Symbol`, `String`, or `Bool`, or a selector function of the form `(i, name) -> keep::Bool`"))
    end
elseif drop !== nothing
    if drop isa AbstractVector{Bool}
        for i = 1:ncols
            drop[i] && willdrop!(columns, i)
        end
    elseif drop isa AbstractVector{<:Integer}
        for i = 1:ncols
            i in drop && willdrop!(columns, i)
        end
    elseif drop isa AbstractVector{Symbol} || drop isa AbstractVector{<:AbstractString}
        drop = map(Symbol, drop)
        for i = 1:ncols
            names[i] in drop && willdrop!(columns, i)
        end
    elseif drop isa Base.Callable
        for i = 1:ncols
            drop(i, names[i])::Bool && willdrop!(columns, i)
        end
    else
        throw(ArgumentError("`drop` keyword argument must be an `AbstractVector` of `Int`, `Symbol`, `String`, or `Bool`, or a selector function of the form `(i, name) -> keep::Bool`"))
    end
end
debug && println("computed types are: $types")

# determine if we can use threads while parsing
limit = something(limit, typemax(Int))
minrows = min(limit, rowsguess)
nthreads = Int(something(ntasks, Threads.nthreads()))
if ntasks === nothing && !streaming && nthreads > 1 && !transpose && minrows > (nthreads * 5) && (minrows * ncols) >= 5_000
    threaded = true
    ntasks = nthreads
elseif ntasks !== nothing && ntasks > 1
    threaded = true
    if transpose
        @warn "`ntasks > 1` not supported on transposed files"
        threaded = false
        ntasks = 1
    elseif minrows < (nthreads * 5)
        @warn "`ntasks > 1` but there were not enough estimated rows ($minrows) to justify multithreaded parsing"
        threaded = false
        ntasks = 1
    end
else
    threaded = false
    ntasks = 1
end
# attempt to chunk up a file for multithreaded parsing; there's chance we can't figure out how to accurately chunk
# due to quoted fields, so threaded might get set to false
if threaded
    # when limiting w/ multithreaded parsing, we try to guess about where in the file the limit row # will be
    # then adjust our final file len to the end of that row
    # we add some cushion so we hopefully get the limit row correctly w/o shooting past too far and needing to resize! down
    # but we also don't guarantee limit will be exact w/ multithreaded parsing
    origrowsguess = rowsguess
    if limit !== typemax(Int)
        limit = Int(limit)
        limitposguess = ceil(Int, (limit / (origrowsguess * 0.8)) * len)
        newlen = [0, limitposguess, min(limitposguess * 2, len)]
        findrowstarts!(buf, options, newlen, ncols, columns, stringtype, typemap, downcast, 5)
        len = newlen[2] - 1
        origrowsguess = limit
        debug && println("limiting, adjusting len to $len")
    end
    chunksize = div(len - datapos, ntasks)
    chunkpositions = Vector{Int}(undef, ntasks + 1)
    for i = 0:ntasks
        chunkpositions[i + 1] = i == 0 ? datapos : i == ntasks ? len : (datapos + chunksize * i)
    end
    debug && println("initial byte positions before adjusting for start of rows: $chunkpositions")
    avgbytesperrow, successfullychunked = findrowstarts!(buf, options, chunkpositions, ncols, columns, stringtype, typemap, downcast, rows_to_check)
    if successfullychunked
        origbytesperrow = ((len - datapos) / origrowsguess)
        weightedavgbytesperrow = ceil(Int, avgbytesperrow * ((ntasks - 1) / ntasks) + origbytesperrow * (1 / ntasks))
        rowsguess = ceil(Int, ((len - datapos) / weightedavgbytesperrow) * 1.01)
        debug && println("single-threaded estimated rows = $origrowsguess, multi-threaded estimated rows = $rowsguess")
        debug && println("multi-threaded column types sampled as: $columns")
    else
        debug && println("something went wrong chunking up a file for multithreaded parsing, falling back to single-threaded parsing")
        threaded = false
    end
else
    chunkpositions = EMPTY_INT_ARRAY
end
if !threaded && limit < rowsguess
    rowsguess = limit
end

end # @inbounds begin
return Context(
    transpose,
    getname(source),
    names,
    rowsguess,
    ncols,
    buf,
    datapos,
    len,
    skipto,
    options,
    columns,
    finalpool,
    downcast,
    customtypes,
    typemap,
    stringtype,
    limit,
    threaded,
    ntasks,
    chunkpositions,
    strict,
    silencewarnings,
    maxwarnings,
    debug,
    tempfile,
    streaming
)

end

pfitzseb · April 29, 2022, 11:02am

Please quote your code. It’s also not quite clear how/where you got the documentation you posted.

nilshg · April 29, 2022, 11:32am

Just to be clear you are saying you are running the script in your first post, and what you are getting back is the contents of the context.jl file from the CSV package?

That seems extremely odd and is almost definitely not a version issue, but impossible to diagnose without seeing what ospatsmas is.

As an aside, you don’t need both import Pkg and using Pkg. In this case they have the same effect, and generally you’d only ever want to use one or the other (see the documentation to understand what each of them does).
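
A small sketch of the general difference, using the Dates standard library purely as an illustration (it is not part of the original script):

```julia
import Dates                    # brings only the module name into scope
d1 = Dates.Date(2022, 4, 29)    # names must be qualified with the module

using Dates                     # additionally brings exported names into scope
d2 = Date(2022, 4, 29)          # the exported name now works unqualified

println(d1 == d2)
```

For Pkg the distinction rarely matters in practice, because its functions are normally called in qualified form, as in Pkg.add.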

It’s also not good for reproducibility to have the Pkg.add calls for DataFrames and CSV in your script, since this will potentially install a different version of the packages every time you run it (if new versions have been released or compat bounds change). Instead you should work in a project-specific environment, which you just activate when you run the script.
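
A minimal sketch of that workflow follows. The temporary directory is a stand-in for the real project folder (an assumption made so the sketch touches nothing on disk that matters), and the lines that need network access are left commented out.

```julia
using Pkg

# Stand-in for the folder that holds the script and the data file;
# in practice this would be the real project folder.
project_dir = mktempdir()

# One-time setup: activate the folder as its own environment and add the
# packages there. This records them in Project.toml and Manifest.toml.
Pkg.activate(project_dir)
# Pkg.add(["CSV", "DataFrames"])   # run once; needs network access

# In the script itself, activating the environment is then enough.
# Pkg.instantiate() installs the versions recorded in Manifest.toml.
# Pkg.instantiate()

println(Base.active_project())
```

With this setup the script no longer changes package versions as a side effect of being run.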

jjdegruijter · April 29, 2022, 11:56am

Just to be clear you are saying you are running the script in your first post, and what you are getting back is the contents of the context.jl file from the CSV package?

Yes, it is.

That seems extremely odd and is almost definitely not a version issue, but impossible to diagnose without seeing what ospatsmas is.

ospatsmas is a function that I have developed myself over many years, and I’m trying to extend it further.
It’s quite large. The problem that has now suddenly arisen is right at the beginning:

function ospatsmas()
    println("------------------------------------------------")
    println("-------------------- START FUNCTION OSPATSMAS ---")

    # SECTION 0. SYSTEMATIC SAMPLE FROM NOWLEY.TXT AS IN OSPALL, to enable working with a smaller data set

    global x, y, z_pred, s2, N
    Data = CSV.read("NowleyT.txt", DataFrame; delim=',', header=false) # ERROR?
    df = Data
    x = df[!, 1]
    y = df[!, 2]
    z_pred = df[!, 4]
    s2 = df[!, 5]
    N = length(x)
    println("Grid size : ", N)

nilshg · April 29, 2022, 12:01pm

It’s not really possible to reproduce this without access to the txt file you’re reading in, but it seems completely inconceivable to me that the lines of code you posted would lead to the contents of a package’s source code being printed in the REPL.

Are you not seeing an error message? If not, in what sense has CSV.read “stopped working”?

jjdegruijter · April 29, 2022, 12:35pm

It’s not really possible to reproduce this without access to the txt file you’re reading in, but it seems completely inconceivable to me that the lines of code you posted would lead to the contents of a package’s source code being printed in the REPL.

Are you not seeing an error message? If not, in what sense has CSV.read “stopped working”?

No output in the REPL, except for “START FUNCTION OSPATSMAS”, i.e. the very first bit of the function.

I didn’t see an error message. Only “context.jl” popped up, with all the text that I posted before.

nilshg · April 29, 2022, 1:18pm

What happens if you start a fresh Julia session and do:

using Pkg
Pkg.activate(; temp = true)
Pkg.add(["CSV", "DataFrames"])
using CSV, DataFrames
CSV.read("...some path.../NowleyT.txt", DataFrame; delim = ',', header = false)

(you need to replace ...some path... here with the correct location of your txt file)

jjdegruijter · April 29, 2022, 1:36pm

I then get an error message: “UndefVarError: CSV not defined”

nilshg · April 29, 2022, 1:59pm

In that case I’m not sure what to say… Are you sure you started a fresh Julia session and copy/pasted the exact code I posted above, replacing only the path to NowleyT.txt?

(NB I just made a small edit to the above as the delim kwarg had the wrong ticks around it - but that should have led to a different error from the one you’re reporting)
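
To check whether a session really is fresh and which environment it is using, something like the sketch below could be run before and after the `using` line. `session_report` is a hypothetical helper name; it relies only on Base (`VERSION`, `Base.active_project`, `pwd`, `isdefined`).

```julia
# Hypothetical helper: collect a few facts about the current Julia session.
function session_report()
    return (
        julia_version = VERSION,                 # version of the running Julia
        active_project = Base.active_project(),  # path of the active Project.toml (or nothing)
        working_directory = pwd(),               # where relative file paths are resolved
        csv_loaded = isdefined(Main, :CSV),      # true once `using CSV` has succeeded in Main
    )
end

report = session_report()
for name in keys(report)
    println(rpad(String(name), 18), " : ", report[name])
end
```

If `csv_loaded` is `false` right after `using CSV, DataFrames`, the `using` line was probably not executed in the session you are looking at.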

jjdegruijter · April 29, 2022, 2:26pm

After copy/pasting with the correct delim specification I get a different reaction.
The old “context.jl” again, with (maybe) a relevant part as below:

# initial argument validation and adjustment
    @inbounds begin
    ((source isa AbstractString || source isa AbstractPath) && !isfile(source)::Bool) && throw(ArgumentError("\"$source\" is not a valid file or doesn't exist"))
    if types !== nothing
        if types isa AbstractVector
            any(x->!concrete_or_concreteunion(x), types) && nonconcretetypes(types)
        elseif types isa AbstractDict
            typs = values(types)
            any(x->!concrete_or_concreteunion(x), typs) && nonconcretetypes(typs)
        elseif types isa Type
            concrete_or_concreteunion(types) || nonconcretetypes(types)
        end

I must admit that I have not inserted the complete path to “NowleyT.txt”, because that file is in the same folder as the function ospatsmas(), and I didn’t have to specify a path to a file in that folder before.
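
A relative file name is resolved against Julia's current working directory (`pwd()`), which is not necessarily the folder that contains the script. The excerpt above does contain an `isfile(source)` check that throws an `ArgumentError` for a missing file, so a path problem is one possible explanation, though the thread does not confirm it. A minimal sketch for checking the path before reading, where `resolve_datafile` is a hypothetical helper and not part of CSV.jl:

```julia
# Hypothetical helper: resolve `name` against `base` (default: the working
# directory) and fail early with a readable message if the file is missing.
function resolve_datafile(name::AbstractString; base::AbstractString = pwd())
    path = isabspath(name) ? String(name) : joinpath(base, name)
    isfile(path) || error("cannot find \"$name\" in \"$base\" (working directory is \"$(pwd())\")")
    return abspath(path)
end
```

Inside a script file, `resolve_datafile("NowleyT.txt"; base = @__DIR__)` would look for the file next to the script itself, independent of the working directory; the returned absolute path can then be passed to `CSV.read`.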

nilshg · April 29, 2022, 2:48pm

How are you using Julia? Do you have a startup.jl file? What happens if you remove all arguments from CSV.read, i.e. just call

CSV.read()

as the last line of my test script above?

jjdegruijter · April 29, 2022, 4:39pm

Same response: CSV not defined.
I have no startup.jl file.

nilshg · April 29, 2022, 5:53pm

Either something is seriously broken with your Julia install or you are not actually executing the code you think you’re executing. I can’t see how the following code:

using Pkg
Pkg.activate(; temp = true)
Pkg.add(["CSV", "DataFrames"])
using CSV, DataFrames
CSV.read()

when copied into a fresh Julia 1.6.5 session can produce UndefVarError: CSV not defined.

Can you please run the above code snippet in a fresh Julia session and copy the entire terminal output from the way you start Julia in a new terminal up to the error, like this:

[nils@nils-2560p ~]$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.7.1 (2021-12-22)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using Pkg

julia> Pkg.activate(; temp = true)
  Activating new project at `/tmp/jl_RXyJ2C`

julia> Pkg.add(["CSV", "DataFrames"])
    Updating registry at `~/.julia/registries/General.toml`
(...)
    Updating `/tmp/jl_RXyJ2C/Project.toml`
  [336ed68f] + CSV v0.10.4
  [a93c6f00] + DataFrames v1.3.3
    Updating `/tmp/jl_RXyJ2C/Manifest.toml`
(...)
Precompiling project...
  14 dependencies successfully precompiled in 38 seconds (17 already precompiled)

julia> using CSV, DataFrames

julia> CSV.read()
ERROR: MethodError: no method matching read()
Closest candidates are:
  read(::Any) at ~/.julia/packages/CSV/jFiCn/src/CSV.jl:87
  read(::Any, ::Any; copycols, kwargs...) at ~/.julia/packages/CSV/jFiCn/src/CSV.jl:87
Stacktrace:
 [1] top-level scope

(note I’ve removed the list of packages which get added to the Manifest.toml file for brevity)

jjdegruijter · April 29, 2022, 6:15pm

This is what I got:
julia> Pkg.activate(; temp = true)
Activating new environment at /var/folders/s1/mfq8vx4d63539vgg51kjcms00000gn/T/jl_vCeF68/Project.toml

julia> Pkg.add(["CSV", "DataFrames"])
Updating registry at ~/.julia/registries/General
Resolving package versions…
Updating /private/var/folders/s1/mfq8vx4d63539vgg51kjcms00000gn/T/jl_vCeF68/Project.toml
[336ed68f] + CSV v0.10.4
[a93c6f00] + DataFrames v1.3.3
Updating /private/var/folders/s1/mfq8vx4d63539vgg51kjcms00000gn/T/jl_vCeF68/Manifest.toml
[336ed68f] + CSV v0.10.4
[944b1d66] + CodecZlib v0.7.0
[34da2185] + Compat v3.43.0
[a8cc5b0e] + Crayons v4.1.1
[9a962f9c] + DataAPI v1.10.0
[a93c6f00] + DataFrames v1.3.3
[864edb3b] + DataStructures v0.18.11
[e2d170a0] + DataValueInterfaces v1.0.0
[48062228] + FilePathsBase v0.9.18
[59287772] + Formatting v0.4.2
[842dd82b] + InlineStrings v1.1.2
[41ab1584] + InvertedIndices v1.1.0
[82899510] + IteratorInterfaceExtensions v1.0.0
[e1d29d7a] + Missings v1.0.2
[bac558e1] + OrderedCollections v1.4.1
[69de0a69] + Parsers v2.3.1
[2dfb63ee] + PooledArrays v1.4.1
[08abe8d2] + PrettyTables v1.3.1
[189a3867] + Reexport v1.2.2
[91c51154] + SentinelArrays v1.3.12
[a2af1166] + SortingAlgorithms v1.0.1
[3783bdb8] + TableTraits v1.0.1
[bd369af6] + Tables v1.7.0
[3bb67fe8] + TranscodingStreams v0.9.6
[ea10d353] + WeakRefStrings v1.4.2
[0dad84c5] + ArgTools
[56f22d72] + Artifacts
[2a0f44e3] + Base64
[ade2ca70] + Dates
[8bb1440f] + DelimitedFiles
[8ba89e20] + Distributed
[f43a241f] + Downloads
[9fa8497b] + Future
[b77e0a4c] + InteractiveUtils
[b27032c2] + LibCURL
[76f85450] + LibGit2
[8f399da3] + Libdl
[37e2e46d] + LinearAlgebra
[56ddb016] + Logging
[d6f4376e] + Markdown
[a63ad114] + Mmap
[ca575930] + NetworkOptions
[44cfe95a] + Pkg
[de0858da] + Printf
[3fa0cd96] + REPL
[9a3f8284] + Random
[ea8e919c] + SHA
[9e88b42a] + Serialization
[1a1011a3] + SharedArrays
[6462fe0b] + Sockets
[2f01184e] + SparseArrays
[10745b16] + Statistics
[fa267f1f] + TOML
[a4e569a6] + Tar
[8dfed614] + Test
[cf7118a7] + UUIDs
[4ec0a83e] + Unicode
[deac9b47] + LibCURL_jll
[29816b5a] + LibSSH2_jll
[c8ffd9c3] + MbedTLS_jll
[14a3606d] + MozillaCACerts_jll
[83775a58] + Zlib_jll
[8e850ede] + nghttp2_jll
[3f19e933] + p7zip_jll

julia> using CSV, DataFrames

julia> CSV.read()
ERROR: MethodError: no method matching read()
Closest candidates are:
read(::Any) at /Users/jjdegruijter/.julia/packages/CSV/jFiCn/src/CSV.jl:87
read(::Any, ::Any; copycols, kwargs...) at /Users/jjdegruijter/.julia/packages/CSV/jFiCn/src/CSV.jl:87
Stacktrace:
[1] top-level scope
@ REPL[6]:1

nilshg · April 29, 2022, 6:45pm

Okay, so the code snippet doesn’t actually produce the error you said it did. It’s really important to be precise in explaining what you’re doing and what the results are, and ideally to run things in a new Julia session to rule out that some previously loaded package or previously defined variable is still lingering around.

Now repeat the same process again and replace the last line with

CSV.read("...some path.../NowleyT.txt", DataFrame; delim = ',', header = false)

again, but please do enter the actual full path to your file.
