ScikitLearn.jl "PyCall.jlwrap" error

Hello guys,

Newbie here… I am testing ScikitLearn.jl and want to run a simple logistic regression for fun. Following the model API documentation of ScikitLearn.jl, I got the following error after running this code:

# All of my data here.
# Features (X) is an Array{Union{Missing, Float64},2}
# Label (y) is an Array{Union{Missing, Int64},2}

# ScikitLearn
using ScikitLearn
@sk_import linear_model: LogisticRegression
model = LogisticRegression(fit_intercept=true)

# Fit the data
fit!(model, Features, Label)

I got the TypeError below:

PyError ($(Expr(:escape, :(ccall(#= C:\Users\posthumusjj\.juliapro\packages\PyCall\0jMpb\src\pyfncall.jl:44 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'TypeError'>
TypeError("float() argument must be a string or a number, not 'PyCall.jlwrap'",)
  File "C:\Users\posthumusjj\.juliapro\packages\Conda\hsaaN\deps\usr\lib\site-packages\sklearn\linear_model\logistic.py", line 1285, in fit
    accept_large_sparse=solver != 'liblinear')
  File "C:\Users\posthumusjj\.juliapro\packages\Conda\hsaaN\deps\usr\lib\site-packages\sklearn\utils\validation.py", line 756, in check_X_y
    estimator=estimator)
  File "C:\Users\posthumusjj\.juliapro\packages\Conda\hsaaN\deps\usr\lib\site-packages\sklearn\utils\validation.py", line 527, in check_array
    array = np.asarray(array, dtype=dtype, order=order)
  File "C:\Users\posthumusjj\.juliapro\packages\Conda\hsaaN\deps\usr\lib\site-packages\numpy\core\numeric.py", line 501, in asarray
    return array(a, dtype, copy=False, order=order)
in top-level scope at base\none
in fit! at ScikitLearn\HK6Vs\src\Skcore.jl:100
in #fit!#26 at ScikitLearn\HK6Vs\src\Skcore.jl:100
in  at PyCall\0jMpb\src\pyfncall.jl:89
in #call#89 at PyCall\0jMpb\src\pyfncall.jl:89
in _pycall! at PyCall\0jMpb\src\pyfncall.jl:11
in _pycall! at PyCall\0jMpb\src\pyfncall.jl:22
in __pycall! at PyCall\0jMpb\src\pyfncall.jl:44
in macro expansion at PyCall\0jMpb\src\exception.jl:84 
in pyerr_check at PyCall\0jMpb\src\exception.jl:64 
in pyerr_check at PyCall\0jMpb\src\exception.jl:60 
PyError ($(Expr(:escape, :(ccall(#= C:\Users\posthumusjj\.juliapro\packages\PyCall\0jMpb\src\pyfncall.jl:44 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'sklearn.exceptions.NotFittedError'>

PyObject LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,

How do I call values instead of the object?

Thanks in advance

ScikitLearn maintainer here. I don’t think I’ve ever tested missing values. I’d have to think about how best to handle them. If you take out the rows with missings, does it work?

Oh, so if I understand correctly, the missing values do not pass through to Python, and the resulting PyObject wrapper is what causes the TypeError?

Our research group is now closed for the holidays and I don’t have access to the data until January. I will run it again then, without the missing values.

Thanks!

Yeah. ScikitLearn.jl just passes the data to Python via PyCall here, and Python does not have a missing type, so I believe there’s nothing PyCall can do besides raise an error. You should filter out the rows with missing values, if that’s possible. Or maybe try turning them into NaNs?
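
For anyone finding this later, here is a minimal sketch of both suggestions using only Base Julia. The small Features and Label arrays are made-up stand-ins with the same element types as in the question, and the names complete, X, y and X_nan are just illustrative. Note that scikit-learn estimators generally reject NaN input as well, so the NaN route would probably still need an imputation step before fitting.

```julia
# Hypothetical stand-ins for the real data (same element types as in the question).
Features = Union{Missing, Float64}[1.0 2.0; missing 3.0; 4.0 5.0; 6.0 missing; 7.0 8.0]
Label = reshape(Union{Missing, Int64}[0, 1, missing, 1, 1], :, 1)

# Option 1: keep only the rows that are complete in both Features and Label.
complete = vec(.!any(ismissing, Features; dims=2)) .& vec(.!ismissing.(Label))

# Converting to concrete element types gives PyCall plain numeric arrays
# that it can hand to NumPy.
X = Matrix{Float64}(Features[complete, :])
y = Vector{Int}(vec(Label[complete, :]))

# Option 2: replace missing features with NaN (only possible for float data).
X_nan = Matrix{Float64}(coalesce.(Features, NaN))

# With the model from the question, the fit would then be:
# fit!(model, X, y)
```

Passing y as a plain vector rather than an n×1 matrix also avoids scikit-learn's warning about column-vector labels.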

For Array{Union{Missing, T}} where T is supported by NumPy, it may make sense to convert it to a numpy.ma.MaskedArray. But I don’t know if that can be done in a copy-free manner, as it is for Array{T}.

I didn’t even think of removing the missing values, since the software I currently use handles them automatically. I’ll look for Julia’s version of pandas’ dropna().
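
If the data lives in a DataFrame, the closest equivalent of dropna() that I know of is dropmissing from DataFrames.jl. A rough sketch, assuming DataFrames.jl is installed; the table and its column names (x1, x2, label) are invented for illustration:

```julia
using DataFrames

# Hypothetical table; the column names are made up for this example.
df = DataFrame(x1 = [1.0, missing, 4.0, 6.0],
               x2 = [2.0, 3.0, 5.0, 7.0],
               label = [0, 1, missing, 1])

# Drop every row that contains a missing value in any column.
clean = dropmissing(df)

# Convert to plain arrays before passing them to fit!.
X = Matrix{Float64}(clean[:, [:x1, :x2]])
y = Vector{Int}(clean.label)
```

dropmissing also accepts a column selector as a second argument if only some columns should be checked.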