CSV.read not recognizing "select" keyword

CopyOfA · June 1, 2022, 10:25pm

I am reading in a space-delimited file using the CSV library in Julia.

edgeList = CSV.read(
    joinpath(dataDirectory, "out.file"),
    types=[Int, Int],
    header=["node1", "node2"],
    skipto=3,
    select=[1,2]
)

This yields the following error:

MethodError: no method matching CSV.File(::String; types=DataType[Int64, Int64], header=["node1", "node2"], skipto=3, select=[1, 2])
Closest candidates are:
  CSV.File(::Any; header, normalizenames, datarow, skipto, footerskip, limit, transpose, comment, use_mmap, ignoreemptylines, missingstrings, missingstring, delim, ignorerepeated, quotechar, openquotechar, closequotechar, escapechar, dateformat, decimal, truestrings, falsestrings, type, types, typemap, categorical, pool, strict, silencewarnings, threaded, debug, parsingdebug, allowmissing) at /Users/n.jordanjameson/.julia/packages/CSV/4GOjG/src/CSV.jl:221 got unsupported keyword argument "select"

I am using Julia v1.6.2. Here is the output of versioninfo():

Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i7-5650U CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, broadwell)

The version of CSV is 0.10.4. The documentation for this version of CSV is here: Reading · CSV.jl, and it has a select / drop entry.

nalimilan · June 2, 2022, 7:27am

AFAICT you’re not using CSV.jl 0.10.4 as you think: the path .julia/packages/CSV/4GOjG/ doesn’t match the ID used by that version, and there’s no line 221 in CSV.jl in that release: CSV.jl/CSV.jl at v0.10.4 · JuliaData/CSV.jl · GitHub

What does ]st CSV print?
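
A minimal sketch for checking which copy of a package a session has actually loaded, as opposed to what the status printout lists. It assumes Julia 1.6 or later, where TOML is a standard library; `loaded_version` is a hypothetical helper name, not part of CSV.jl or Pkg.

```julia
using TOML  # standard library since Julia 1.6

# Hypothetical helper: report the source file a loaded module came from and
# the version recorded in that package's own Project.toml, if there is one.
function loaded_version(mod::Module)
    src = pathof(mod)                  # e.g. .../packages/CSV/<slug>/src/CSV.jl
    src === nothing && return nothing  # module was not loaded from a package file
    project_file = joinpath(dirname(dirname(src)), "Project.toml")
    version = isfile(project_file) ?
        get(TOML.parsefile(project_file), "version", nothing) : nothing
    return (path = src, version = version)
end

# Usage, after `using CSV`:
#   loaded_version(CSV)
```

If the reported path or version differs from what ]st CSV shows, the session loaded the package from somewhere other than the environment being inspected.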

CopyOfA · June 2, 2022, 1:58pm

Here is the output of ]st CSV:

      Status `~/.julia/environments/graph_env/Project.toml`
  [336ed68f] CSV v0.10.4

CopyOfA · June 2, 2022, 2:08pm

It looks like I’m encountering a new problem now, after restarting Julia. I am using a virtual environment here.

using Pkg

Pkg.activate("~/.julia/environments/graph_env/")

Pkg.status()

      Status `~/.julia/environments/graph_env/Project.toml`
  [336ed68f] CSV v0.10.4
  [a93c6f00] DataFrames v1.3.4
  [31c24e10] Distributions v0.25.62
  [38e38edf] GLM v1.8.0
  [28b8d3ca] GR v0.64.3
  [a2cc645c] GraphPlot v0.5.2
  [86223c79] Graphs v1.7.0
  [7073ff75] IJulia v1.23.3
  [093fc24a] LightGraphs v1.3.5
  [91a5bcdd] Plots v1.29.0
  [92933f4c] ProgressMeter v1.7.2
  [fc66bc1b] SNAPDatasets v0.2.0
  [ea10d353] WeakRefStrings v1.4.2
  [ade2ca70] Dates

The error comes from using CSV:

using CSV

┌ Info: Precompiling CSV [336ed68f-0bac-5ca0-87d4-7b16caf5d00b]
└ @ Base loading.jl:1342
ERROR: LoadError: LoadError: UndefVarError: PosLen not defined
Stacktrace:
  [1] top-level scope
    @ ~/.julia/packages/WeakRefStrings/31nkb/src/poslenstrings.jl:6
  [2] include(mod::Module, _path::String)
    @ Base ./Base.jl:386
  [3] include(x::String)
    @ WeakRefStrings ~/.julia/packages/WeakRefStrings/31nkb/src/WeakRefStrings.jl:1
  [4] top-level scope
    @ ~/.julia/packages/WeakRefStrings/31nkb/src/WeakRefStrings.jl:547
  [5] include
    @ ./Base.jl:386 [inlined]
  [6] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::String)
    @ Base ./loading.jl:1235
  [7] top-level scope
    @ none:1
  [8] eval
    @ ./boot.jl:360 [inlined]
  [9] eval(x::Expr)
    @ Base.MainInclude ./client.jl:446
 [10] top-level scope
    @ none:1
in expression starting at /Users/n.jordanjameson/.julia/packages/WeakRefStrings/31nkb/src/poslenstrings.jl:6
in expression starting at /Users/n.jordanjameson/.julia/packages/WeakRefStrings/31nkb/src/WeakRefStrings.jl:1
ERROR: LoadError: Failed to precompile WeakRefStrings [ea10d353-3f73-51f8-a26c-33c1cb351aa5] to /Users/n.jordanjameson/.julia/compiled/v1.6/WeakRefStrings/jl_4mHO8D.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IOContext{Base.PipeEndpoint}, internal_stdout::IOContext{Base.PipeEndpoint}, ignore_loaded_modules::Bool)
    @ Base ./loading.jl:1385
  [3] compilecache(pkg::Base.PkgId, path::String)
    @ Base ./loading.jl:1329
  [4] _require(pkg::Base.PkgId)
    @ Base ./loading.jl:1043
  [5] require(uuidkey::Base.PkgId)
    @ Base ./loading.jl:936
  [6] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:923
  [7] include
    @ ./Base.jl:386 [inlined]
  [8] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::Nothing)
    @ Base ./loading.jl:1235
  [9] top-level scope
    @ none:1
 [10] eval
    @ ./boot.jl:360 [inlined]
 [11] eval(x::Expr)
    @ Base.MainInclude ./client.jl:446
 [12] top-level scope
    @ none:1
in expression starting at /Users/n.jordanjameson/.julia/packages/CSV/jFiCn/src/CSV.jl:1
Failed to precompile CSV [336ed68f-0bac-5ca0-87d4-7b16caf5d00b] to /Users/n.jordanjameson/.julia/compiled/v1.6/CSV/jl_SF4MfC.

Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IJulia.IJuliaStdio{Base.PipeEndpoint}, internal_stdout::IJulia.IJuliaStdio{Base.PipeEndpoint}, ignore_loaded_modules::Bool)
   @ Base ./loading.jl:1385
 [3] compilecache(pkg::Base.PkgId, path::String)
   @ Base ./loading.jl:1329
 [4] _require(pkg::Base.PkgId)
   @ Base ./loading.jl:1043
 [5] require(uuidkey::Base.PkgId)
   @ Base ./loading.jl:936
 [6] require(into::Module, mod::Symbol)
   @ Base ./loading.jl:923
 [7] eval
   @ ./boot.jl:360 [inlined]
 [8] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1116

nalimilan · June 2, 2022, 4:53pm

Hmm, again it seems that the printed versions of packages are not the ones that are actually used. We’ve seen this in the past (in particular related to CSV), but it’s not clear to me why. Are you running Julia in Atom maybe?

CopyOfA · June 2, 2022, 5:10pm

What’s odd here is that when I use IJulia; notebook(), I get this error, but if I use Julia from the terminal, I do not. And this happens repeatedly. I’ve exited all terminal windows, restarted terminal, restarted julia, then executed the commands:

using IJulia
notebook()

After the Jupyter notebook is open, I execute the code:

using Pkg
Pkg.activate("~/.julia/environments/graph_env")

using CSV

and get an error. However, if I only open a terminal, start Julia, and execute the same commands as above, there is no problem. Is there something with IJulia that makes it reach back into the base environment for packages?

nalimilan · June 2, 2022, 9:14pm

Yes I think this happens from time to time with IJulia. Maybe others will be able to tell you more.

(BTW, please mention when you cross-post to avoid having people replying twice without seeing previous messages. This one is also at Julia CSV.read not recognizing "select" keyword - Stack Overflow)

jd-foster · June 2, 2022, 10:03pm

IJulia doesn’t automatically use the environment that is active in the REPL you launch it from. The notebook uses the environment of whichever kernel is selected in the Jupyter session.

I have a pull request that updates the docs as:

If an existing Project.toml file is not found then, by default, an IJulia notebook will try to run a Julia kernel
with its active project set from the global or default environment (usually of the form ~/.julia/environments/v1.x).
If the IJulia package is not installed in that environment, then the Julia kernel selected by default will not be able to
connect, and a Connection failed error will be displayed. In this case, users should install an additional
Julia kernel that uses their chosen Julia environment.
For example, if the desired environment is currently activated in the REPL then one possibility is to execute

IJulia.installkernel("Julia MyProjectEnv", "--project=$(Base.active_project())")

and subsequently select the kernel starting with Julia MyProjectEnv from Kernel > Change Kernel in the menu of the Jupyter notebook.

CopyOfA · June 2, 2022, 10:17pm

But in this case I am specifically activating the environment with Pkg.activate("~/.julia/environments/graph_env"), right? You can see in the printout of Pkg.status() that I have IJulia installed in this environment as well.

jd-foster · June 3, 2022, 3:42am

Good observation. Maybe you can remove the duplicate packages from the environment you launch in. What does Pkg.status() print in Jupyter before you activate graph_env?

CopyOfA · June 3, 2022, 10:46pm

Just using the Julia 1.6.2 kernel:

using Pkg

Pkg.status()

      Status `~/.julia/environments/v1.6/Project.toml`
  [336ed68f] CSV v0.5.23
  [324d7699] CategoricalArrays v0.7.7
  [8be319e6] Chain v0.4.10
  [a81c6b42] Compose v0.8.2
  [a93c6f00] DataFrames v0.18.4
  [c91e804a] Gadfly v1.2.1
  [cd3eb016] HTTP v0.8.19
  [7073ff75] IJulia v1.23.3
  [43edad99] InstantiateFromURL v0.6.0
  [682c06a0] JSON v0.21.1
  [22d8b318] OAuth v0.7.1
  [91a5bcdd] Plots v1.29.0
  [ce6b1742] RDatasets v0.7.7
  [92393bbf] Twitter v0.8.1
  [f8ef4a19] VirtualEnv v1.0.0

And then I activate a different environment:

Pkg.activate("~/.julia/environments/graph_env/")

  Activating environment at `~/.julia/environments/graph_env/Project.toml`

Pkg.status()

      Status `~/.julia/environments/graph_env/Project.toml`
  [336ed68f] CSV v0.10.4
  [a93c6f00] DataFrames v1.3.4
  [31c24e10] Distributions v0.25.62
  [38e38edf] GLM v1.8.0
  [28b8d3ca] GR v0.64.3
  [a2cc645c] GraphPlot v0.5.2
  [86223c79] Graphs v1.7.0
  [7073ff75] IJulia v1.23.3
  [093fc24a] LightGraphs v1.3.5
  [91a5bcdd] Plots v1.29.0
  [92933f4c] ProgressMeter v1.7.2
  [fc66bc1b] SNAPDatasets v0.2.0
  [ea10d353] WeakRefStrings v1.4.2
  [ade2ca70] Dates

You’ll notice that the hashes for the libraries that are in both the base environment and “graph_env” are the same. I’m not sure why this is the case.

jd-foster · June 6, 2022, 1:40am

OK, after some digging, this looks like a known issue:
https://github.com/JuliaData/CSV.jl/issues/912

It comes down to the situation described here:
https://github.com/JuliaLang/julia/issues/35663

I have a few recommendations to get around this. Basically, you should make sure you activate your working environment as the very first thing you do in the script, before any using or import package statements. Also, update your packages in your base environment (v1.6) to the latest possible versions.
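
As a rough sketch of the "activate first" ordering (the environment path is the one from this thread and is only an assumption for your setup), a notebook's first cell could look something like this. It also lists what is already loaded, since a module that is loaded before activation stays at its loaded version for the rest of the session.

```julia
using Pkg

# Modules loaded before activation are not reloaded later, so list them first.
already_loaded = sort!([id.name for id in keys(Base.loaded_modules)])
println("Loaded before activation: ", join(already_loaded, ", "))

# Assumed path from the thread; substitute your own environment.
env_path = expanduser("~/.julia/environments/graph_env")
Pkg.activate(env_path)

# Confirm which project the session now resolves packages against.
println("Active project: ", Base.active_project())

# Only now load packages, e.g. `using CSV`.
```

If a dependency of CSV already appears in the "loaded before activation" list, restarting the kernel and activating earlier is probably the simplest way to rule out a version mismatch.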

Finally, if you are using the graph_env environment, make sure the IJulia kernel in your notebook uses it as the default:

IJulia.installkernel("Julia GraphEnv", "--project=GraphEnvProjectPath")

where GraphEnvProjectPath is the path to your project folder, and select this kernel for your notebook.
Alternatively, launch your Jupyter notebook session in the REPL from that project environment folder (Using IJulia · IJulia):
julia --project=~/.julia/environments/graph_env
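
One hedged note on the path: a literal `~` inside a kernel argument may not be expanded, depending on the shell and on how the kernel is launched, so building an absolute path in Julia first is a cautious option. A small sketch, again assuming the environment path from this thread:

```julia
# Assumed environment path from the thread; adjust to your own.
env_path = expanduser("~/.julia/environments/graph_env")
project_flag = "--project=$(env_path)"
println(project_flag)

# With IJulia installed in the environment you launch from, the flag would be
# passed on in the same way as in the installkernel call above:
#   using IJulia
#   IJulia.installkernel("Julia GraphEnv", project_flag)
```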

Also, are you using VirtualEnv.jl somewhere? I’m not familiar with this package but it might mess with some of this native environment management.
