ScikitLearn.jl "PyCall.jlwrap" error

Hello guys,

Newbie here… I am testing ScikitLearn.jl and want to run a simple logistic regression for fun. Following the model API documentation of ScikitLearn.jl, I got the following error after running this code:

using ScikitLearn

# All of my data here.
# X = Features = Array{Union{Missing, Float64},2}
# y = Label = Array{Union{Missing, Int64},2}

# Import the Python estimator through ScikitLearn.jl
@sk_import linear_model: LogisticRegression
model = LogisticRegression(fit_intercept=true)

# Fit data
fit!(model, Features, Label)

I got the TypeError below:

PyError ($(Expr(:escape, :(ccall(#= C:\Users\posthumusjj\.juliapro\packages\PyCall\0jMpb\src\pyfncall.jl:44 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'TypeError'>
TypeError("float() argument must be a string or a number, not 'PyCall.jlwrap'",)
  File "C:\Users\posthumusjj\.juliapro\packages\Conda\hsaaN\deps\usr\lib\site-packages\sklearn\linear_model\", line 1285, in fit
    accept_large_sparse=solver != 'liblinear')
  File "C:\Users\posthumusjj\.juliapro\packages\Conda\hsaaN\deps\usr\lib\site-packages\sklearn\utils\", line 756, in check_X_y
  File "C:\Users\posthumusjj\.juliapro\packages\Conda\hsaaN\deps\usr\lib\site-packages\sklearn\utils\", line 527, in check_array
    array = np.asarray(array, dtype=dtype, order=order)
  File "C:\Users\posthumusjj\.juliapro\packages\Conda\hsaaN\deps\usr\lib\site-packages\numpy\core\", line 501, in asarray
    return array(a, dtype, copy=False, order=order)
in top-level scope at base\none
in fit! at ScikitLearn\HK6Vs\src\Skcore.jl:100
in #fit!#26 at ScikitLearn\HK6Vs\src\Skcore.jl:100
in  at PyCall\0jMpb\src\pyfncall.jl:89
in #call#89 at PyCall\0jMpb\src\pyfncall.jl:89
in _pycall! at PyCall\0jMpb\src\pyfncall.jl:11
in _pycall! at PyCall\0jMpb\src\pyfncall.jl:22
in __pycall! at PyCall\0jMpb\src\pyfncall.jl:44
in macro expansion at PyCall\0jMpb\src\exception.jl:84 
in pyerr_check at PyCall\0jMpb\src\exception.jl:64 
in pyerr_check at PyCall\0jMpb\src\exception.jl:60 
PyError ($(Expr(:escape, :(ccall(#= C:\Users\posthumusjj\.juliapro\packages\PyCall\0jMpb\src\pyfncall.jl:44 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'sklearn.exceptions.NotFittedError'>

PyObject LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,

How do I pass the actual values instead of the wrapped object?

Thanks in advance

ScikitLearn maintainer here. I don’t think I’ve ever tested missing values. I’d have to think about how best to handle them. If you take out the rows with missings, does it work?

Oh, so if I understand correctly, the missing values do not pass through, and the resulting PyObject causes the TypeError?

Our research group is now closed for the holidays and I don’t have access to the data until January. I will run it again then, without the missings.


Yeah. ScikitLearn.jl just passes the data to Python via PyCall here, and Python does not have a missing type, so there’s nothing PyCall can do besides throwing an error, I believe. You should filter out the rows with missing values, if that’s possible. Or maybe try turning them into NaNs?
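
A minimal sketch of the row-filtering suggestion, in plain Julia with no extra packages. The toy X and y below are made-up stand-ins for Features and Label, and y is assumed to be a vector (use vec(Label) if it is an n×1 matrix). The idea is to keep only the rows without any missing and then convert to concrete element types, so that PyCall can hand ordinary arrays to NumPy:

```julia
# Hypothetical toy data standing in for the poster's Features and Label.
X = Union{Missing, Float64}[1.0 2.0; missing 3.0; 4.0 5.0; 6.0 7.0; 8.0 missing]
y = Union{Missing, Int64}[0, 1, 1, missing, 0]

# A row is kept only if neither its features nor its label contain `missing`.
keep = [!any(ismissing, X[i, :]) && !ismissing(y[i]) for i in 1:size(X, 1)]

# Convert to concrete element types; this would throw if a `missing` survived.
X_clean = Matrix{Float64}(X[keep, :])
y_clean = Vector{Int}(y[keep])
```

With arrays like these, fit!(model, X_clean, y_clean) should receive plain Float64 and Int data; this has not been run against the original dataset.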

For Array{Union{Missing, T}} where T is supported by Numpy, it may make sense to convert it to a But I don’t know if it can be done in a copy-free manner like for Array{T}.
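
For the NaN idea mentioned earlier, one conversion that certainly works (though it copies, so it does not answer the copy-free question, and it is not necessarily the conversion meant above) is to broadcast coalesce over the array. The toy matrix is a made-up example. This only makes sense for floating-point T, since integers have no NaN. Also note that, as far as I know, scikit-learn's input validation rejects NaN by default for estimators such as LogisticRegression, so an imputation step would still be needed on the Python side:

```julia
# Hypothetical toy matrix with missing entries.
X = Union{Missing, Float64}[1.0 2.0; missing 3.0; 4.0 missing]

# coalesce(x, NaN) returns x unless x is `missing`, in which case it returns NaN.
# Broadcasting allocates a new Matrix{Float64}; the original X is left unchanged.
X_nan = coalesce.(X, NaN)
```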

I didn’t even think of removing the missing values since the software I currently use automatically handles it. I’ll look for Julia’s version of Pandas dropna().
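
If the data lives in a DataFrame, the closest analogue to Pandas dropna() that I know of is dropmissing from DataFrames.jl. A minimal sketch, assuming DataFrames.jl is installed; the column names x1, x2 and label are made up for illustration:

```julia
using DataFrames

# Hypothetical toy table; rows 2 and 3 each contain a `missing`.
df = DataFrame(x1 = [1.0, missing, 4.0, 6.0],
               x2 = [2.0, 3.0, 5.0, 7.0],
               label = [0, 1, missing, 1])

# Drop every row that has a `missing` in any column (returns a new DataFrame).
clean = dropmissing(df)

# Convert to plain arrays with concrete element types before calling fit!.
X = Matrix{Float64}(clean[:, [:x1, :x2]])
y = Vector{Int}(clean.label)
```

dropmissing!(df) does the same in place, and passing a column selector as the second argument restricts the check to those columns.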