DataFrame select, un-nesting a column containing tuples into three columns

Hello,
I’ve been following the JuliaCon 2021 tutorial on DataFrames
https://github.com/bkamins/JuliaCon2021-DataFrames-Tutorial

I’m using Julia 1.6.2 in VSCodium, after exporting Tutorial.ipynb to Tutorial.jl.

Only one instruction didn’t work: the one that un-nests the :bootstrap column, which contains tuples, into three separate columns. It should turn the first table below into the second:

Row  statistic    estimate   boot lo   boot hi   parametric lo  parametric hi  bootstrap
     String       Float64    Float64   Float64   Float64        Float64        Tuple…
1    (Intercept)  3.74896    2.68271   2.84741   2.75695        2.75695        (3.74896, 2.68271, 2.84741)
2    lnnlinc      -0.666932  0.276178  0.242326  0.258558       0.258558       (-0.666932, 0.276178, 0.242326)
...

Row  statistic    estimate   boot lo   boot hi   parametric lo  parametric hi  estimate 2  boot lo 2  boot hi 2
     String       Float64    Float64   Float64   Float64        Float64        Float64     Float64    Float64
1    (Intercept)  3.74896    2.68271   2.84741   2.75695        2.75695        3.74896     2.68271    2.84741
2    lnnlinc      -0.666932  0.276178  0.242326  0.258558       0.258558       -0.666932   0.276178   0.242326
...
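
For reference, here is a minimal sketch of the kind of call involved, assuming DataFrames.jl 1.x is the version actually loaded. The two-row `df` is a cut-down, hypothetical stand-in for the tutorial's table, and the selector is the one that appears in the error message; the exact line in the tutorial may differ.

```julia
using DataFrames

# Hypothetical stand-in for the tutorial's table: one column holds 3-tuples.
df = DataFrame(statistic = ["(Intercept)", "lnnlinc"],
               bootstrap = [(3.74896, 2.68271, 2.84741),
                            (-0.666932, 0.276178, 0.242326)])

# `source => vector_of_names` spreads each tuple over the named target columns;
# `:` keeps all existing columns in front of the new ones.
df2 = select(df, :, :bootstrap => ["estimate 2", "boot lo 2", "boot hi 2"])

println(names(df2))
```

On a DataFrames.jl release that predates this selector form, the same call is rejected with an `Unrecognized column selector` error.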

The error I’m getting is:

ERROR: LoadError: ArgumentError: Unrecognized column selector: :bootstrap => ["estimate 2", "boot lo 2", "boot hi 2"]
Stacktrace:
  [1] normalize_selection(idx::DataFrames.Index, sel::Pair{Symbol, Vector{String}}, renamecols::Bool)
    @ DataFrames C:\Users\Tests\.julia\packages\DataFrames\3mEXm\src\abstractdataframe\selection.jl:175
  [2] (::DataFrames.var"#395#396"{Bool, DataFrame})(c::Pair{Symbol, Vector{String}})
    @ DataFrames .\none:0
  [3] iterate
    @ .\generator.jl:47 [inlined]
  [4] collect_to!(dest::Vector{Vector{Int64}}, itr::Base.Generator{Vector{Any}, DataFrames.var"#395#396"{Bool, DataFrame}}, offs::Int64, st::Int64)
    @ Base .\array.jl:724
  [5] collect_to_with_first!(dest::Vector{Vector{Int64}}, v1::Vector{Int64}, itr::Base.Generator{Vector{Any}, DataFrames.var"#395#396"{Bool, DataFrame}}, st::Int64)
    @ Base .\array.jl:702
  [6] collect(itr::Base.Generator{Vector{Any}, DataFrames.var"#395#396"{Bool, DataFrame}})
    @ Base .\array.jl:683
  [7] manipulate(::DataFrame, ::Any, ::Vararg{Any, N} where N; copycols::Bool, keeprows::Bool, renamecols::Bool)
    @ DataFrames C:\Users\Tests\.julia\packages\DataFrames\3mEXm\src\abstractdataframe\selection.jl:1209
  [8] select(::DataFrame, ::Any, ::Vararg{Any, N} where N; copycols::Bool, renamecols::Bool)
    @ DataFrames C:\Users\Tests\.julia\packages\DataFrames\3mEXm\src\abstractdataframe\selection.jl:847
  [9] select(::DataFrame, ::Any, ::Any, ::Vararg{Any, N} where N)
    @ DataFrames C:\Users\Tests\.julia\packages\DataFrames\3mEXm\src\abstractdataframe\selection.jl:847
 [10] top-level scope
    @ c:\Users\Tests\Downloads\Julia\DataFrames\JuliaCon2021-DataFrames-Tutorial\JuliaCon2021-DataFrames-Tutorial-main\Tutorial.jl:190
in expression starting at c:\Users\Tests\Downloads\Julia\DataFrames\JuliaCon2021-DataFrames-Tutorial\JuliaCon2021-DataFrames-Tutorial-main\Tutorial.jl:190

caused by: MethodError: no method matching getindex(::DataFrames.Index, ::Pair{Symbol, Vector{String}})
Closest candidates are:
  getindex(::DataFrames.AbstractIndex, ::Between) at C:\Users\Tests\.julia\packages\DataFrames\3mEXm\src\other\index.jl:219
  getindex(::DataFrames.AbstractIndex, ::AbstractRange{Bool}) at C:\Users\Tests\.julia\packages\DataFrames\3mEXm\src\other\index.jl:233
  getindex(::DataFrames.AbstractIndex, ::AbstractRange{Int64}) at C:\Users\Tests\.julia\packages\DataFrames\3mEXm\src\other\index.jl:201
  ...
Stacktrace:
  [1] normalize_selection(idx::DataFrames.Index, sel::Pair{Symbol, Vector{String}}, renamecols::Bool)
    @ DataFrames C:\Users\Tests\.julia\packages\DataFrames\3mEXm\src\abstractdataframe\selection.jl:172
  [2] (::DataFrames.var"#395#396"{Bool, DataFrame})(c::Pair{Symbol, Vector{String}})
    @ DataFrames .\none:0
  [3] iterate
    @ .\generator.jl:47 [inlined]
  [4] collect_to!(dest::Vector{Vector{Int64}}, itr::Base.Generator{Vector{Any}, DataFrames.var"#395#396"{Bool, DataFrame}}, offs::Int64, st::Int64)
    @ Base .\array.jl:724
  [5] collect_to_with_first!(dest::Vector{Vector{Int64}}, v1::Vector{Int64}, itr::Base.Generator{Vector{Any}, DataFrames.var"#395#396"{Bool, DataFrame}}, st::Int64)
    @ Base .\array.jl:702
  [6] collect(itr::Base.Generator{Vector{Any}, DataFrames.var"#395#396"{Bool, DataFrame}})
    @ Base .\array.jl:683
  [7] manipulate(::DataFrame, ::Any, ::Vararg{Any, N} where N; copycols::Bool, keeprows::Bool, renamecols::Bool)
    @ DataFrames C:\Users\Tests\.julia\packages\DataFrames\3mEXm\src\abstractdataframe\selection.jl:1209
  [8] select(::DataFrame, ::Any, ::Vararg{Any, N} where N; copycols::Bool, renamecols::Bool)
    @ DataFrames C:\Users\Tests\.julia\packages\DataFrames\3mEXm\src\abstractdataframe\selection.jl:847
  [9] select(::DataFrame, ::Any, ::Any, ::Vararg{Any, N} where N)
    @ DataFrames C:\Users\Tests\.julia\packages\DataFrames\3mEXm\src\abstractdataframe\selection.jl:847
 [10] top-level scope
    @ c:\Users\Tests\Downloads\Julia\DataFrames\JuliaCon2021-DataFrames-Tutorial\JuliaCon2021-DataFrames-Tutorial-main\Tutorial.jl:190

I don’t know yet how to read and untangle error messages like this one. What should I do to make it work?

Thank you.

The problem is that you have not used the Project.toml and Manifest.toml files from the tutorial. As instructed in README.md, you need to start Julia in the project folder with the julia --project command to activate the correct project environment. Then everything will work.
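
If Julia is already running, roughly the same thing can be done from within the session. This is a sketch that assumes the current working directory is the tutorial folder (the one containing Project.toml and Manifest.toml); the `Pkg.instantiate()` step is an addition for the case where the pinned packages have not been downloaded yet.

```julia
using Pkg

# Assumption: the working directory is the tutorial folder,
# i.e. the one that contains Project.toml and Manifest.toml.
Pkg.activate(".")    # use this folder's environment, much like `julia --project`
Pkg.instantiate()    # install the package versions recorded in Manifest.toml
Pkg.status()         # list the packages of the now-active environment
```

After this, `using DataFrames` loads the version recorded in the tutorial's Manifest.toml rather than the one from the global environment.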

I don’t know yet how to correctly read and disentangle such error messages.

Now, how can I tell this from the error message? I see the line:

    @ DataFrames C:\Users\Tests\.julia\packages\DataFrames\3mEXm\src\abstractdataframe\selection.jl:172 

which shows me that the version of the package you are using is stored in the 3mEXm subfolder. I can check that this subfolder name is associated with version 0.22.7 of DataFrames.jl, while the tutorial is written for DataFrames.jl 1.2.0. This also means that your global environment has an old version of DataFrames.jl installed, which I recommend updating to the latest release with the up command in the package manager.
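
The path-reading step can be sketched as a small helper. `package_slug` is a hypothetical name made up for this illustration, not part of Julia or Pkg; it only extracts the two folder names that follow `packages` in a stack-trace path. Mapping the folder name to a version number still has to be done separately.

```julia
# Hypothetical helper: find the package name and version folder in a path
# of the form ...\packages\<PackageName>\<folder>\src\...
function package_slug(path::AbstractString)
    parts = split(path, r"[\\/]")            # accept both \ and / separators
    i = findfirst(==("packages"), parts)
    (i === nothing || i + 2 > length(parts)) && return nothing
    return (package = String(parts[i + 1]), slug = String(parts[i + 2]))
end

trace_path = raw"C:\Users\Tests\.julia\packages\DataFrames\3mEXm\src\abstractdataframe\selection.jl"
println(package_slug(trace_path))
```

For the path from the stack trace this yields the package `DataFrames` and the folder `3mEXm`.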

You can confirm this by running st DataFrames in the package manager: it will show that you have DataFrames.jl 0.22.7 installed instead of the correct version. As noted above, though, I can tell this without running anything, just by looking at the folder where the version of DataFrames.jl you are using is stored.
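
The same check can be scripted. This is a sketch using `Pkg.dependencies()`, which describes the packages in the active environment; `installed_info` is a hypothetical helper name introduced here.

```julia
using Pkg

# Hypothetical helper: return the Pkg entry for `name` in the active
# environment, or `nothing` if the package is not part of it.
function installed_info(name::AbstractString)
    for info in values(Pkg.dependencies())
        info.name == name && return info
    end
    return nothing
end

info = installed_info("DataFrames")
if info === nothing
    println("DataFrames is not in the active environment")
else
    println("version: ", info.version)   # the version number
    println("source:  ", info.source)    # the folder the code is loaded from
end
```

Running it once in the global environment and once after activating the tutorial project makes the version difference visible.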


Indeed. Thank you for all the explanations and your tutorial on such an awesome DataFrame library.