Problem using DataFrames master

#1

Taking a cue from the following posts, I upgraded to the latest DataFrames master branch, hoping to see some speed improvements when reading CSV data, but I got an error. The excerpt of the code where the issue occurs is given below; it works fine with the current stable release. This is not a showstopper for me: after reverting to the stable version, I get the desired result.


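```julia
using MLDataUtils  # MLDataPattern,

# Balance the classes by undersampling; X_train, y_train, X_test and y_test
# are defined earlier (not shown). X comes back as a view (SubDataFrame).
@views X, y = undersample((X_train, y_train), shuffle=false);
# X_resamp, y_resamp = oversample((X_train, y_train), fraction=0.2);

# Convert the resampled and test data to plain arrays
@views X = convert(Array{Float64}, X);
@views y = convert(Array{Int64}, y);
@views X_tst = convert(Array{Float64}, X_test);
@views y_tst = convert(Array{Int64}, y_test);

# Full (non-resampled) training data as arrays
@views X_train0 = convert(Array{Float64}, X_train);
@views y_train0 = convert(Array{Int64}, y_train);

println(size(X), size(y))
println(size(X_test), size(y_test))
```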

```
┌ Info: Recompiling stale cache file C:\Users\chatura\.julia\compiled\v1.0\MLDataUtils\CQWB9.ji for MLDataUtils [cc2ba9b6-d476-5e6d-8eaf-a92d5412d41d]
└ @ Base loading.jl:1190
MethodError: getindex(::DataFrames.DataFrameColumns{SubDataFrame{Array{Int64,1}},AbstractArray{T,1} where T}, ::Int64) is ambiguous. Candidates:
  getindex(itr::DataFrames.DataFrameColumns{#s383,AbstractArray{T,1} where T} where #s383<:SubDataFrame, j) in DataFrames at C:\Users\chatura\.julia\packages\DataFrames\VGojy\src\deprecated.jl:1428
  getindex(itr::DataFrames.DataFrameColumns{#s46,AbstractArray{T,1} where T} where #s46<:AbstractDataFrame, j::Int64) in DataFrames at C:\Users\chatura\.julia\packages\DataFrames\VGojy\src\abstractdataframe\iteration.jl:135
Possible fix, define
  getindex(::DataFrames.DataFrameColumns{#s383,AbstractArray{T,1} where T} where #s383<:SubDataFrame, ::Int64)

Stacktrace:
 [1] iterate(::DataFrames.DataFrameColumns{SubDataFrame{Array{Int64,1}},AbstractArray{T,1} where T}, ::Tuple{Base.OneTo{Int64}}) at .\abstractarray.jl:838
 [2] iterate at .\abstractarray.jl:836 [inlined]
 [3] zip_iterate at .\iterators.jl:304 [inlined]
 [4] iterate at .\iterators.jl:320 [inlined]
 [5] convert at C:\Users\chatura\.julia\packages\DataFrames\VGojy\src\abstractdataframe\abstractdataframe.jl:802 [inlined]
 [6] convert(::Type{Array{Float64,N} where N}, ::SubDataFrame{Array{Int64,1}}) at C:\Users\chatura\.julia\packages\DataFrames\VGojy\src\abstractdataframe\abstractdataframe.jl:796
 [7] top-level scope at In[7]:3
```


#2

I think you should file an issue. This looks like a missing method after the recent getindex restructuring. Can you put together a copy-and-pasteable MWE?
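A copy-and-pasteable MWE might look like the sketch below. It assumes the error comes from calling `convert(Array{Float64}, ...)` on a `SubDataFrame`, as frame [6] of the stacktrace suggests, and it drops MLDataUtils entirely; the columns `a` and `b` are made-up toy data, and `view(df, rows, :)` is used only to produce a `SubDataFrame` like the one `undersample` returned. The `try`/`catch` lets the script finish either way and prints the result type on a working version or the error type on an affected one.

```julia
using DataFrames

# Hypothetical toy data standing in for the real training set
df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

# A row view is a SubDataFrame, the argument type seen in the stacktrace
sdf = view(df, [1, 3], :)

# Capture either the converted matrix or the thrown error
result = try
    convert(Array{Float64}, sdf)
catch err
    err
end

println(typeof(result))
```

On recent DataFrames releases the documented way to build a matrix is `Matrix{Float64}(df)`, so there the `convert` call may report a missing method for unrelated reasons; the sketch is only meaningful against the master and stable versions discussed in this thread.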


#3

Sure, I will do that.


#4

I think that’s fixed by https://github.com/JuliaData/DataFrames.jl/pull/1601/files#r236036652.


#5

I don’t think so: I updated DataFrames master today and still get the same error. Let me know if I should file an issue.


#6

That PR hasn’t been merged yet. You can check out the branch if you want and test it out. However, it should be merged soon. Sorry about this! Welcome to life on master.