How to install feather using PyCall?

pycall

#1

I am trying to create some benchmarks for writing tabular data to disk, so I wanted to install feather in Julia. But the code below gives errors:

using PyCall
using Conda, DataFrames, FileIO
#Conda.add("pandas") # need to run if runs into error
@pyimport pandas as pd
df = DataFrame(x = [1]);

FileIO.save("df_fileio.csv", df)
frm = pd.read_csv("df_fileio.csv")  # will be used for testing Pandas
frm[:to_feather]("p.feather")

with error

ERROR: PyError (ccall(@pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, arg, C_NULL)) <type 'exceptions.ImportError'>
ImportError('the feather-format library is not installed\nyou can install via conda\nconda install feather-format -c conda-forge\nor via pip\npip install -U feather-format\n',)
  File "C:\Users\dzj\.julia\v0.6\Conda\deps\usr\lib\site-packages\pandas\core\frame.py", line 1625, in to_feather
    to_feather(self, fname)
  File "C:\Users\dzj\.julia\v0.6\Conda\deps\usr\lib\site-packages\pandas\io\feather_format.py", line 51, in to_feather
    feather = _try_import()
  File "C:\Users\dzj\.julia\v0.6\Conda\deps\usr\lib\site-packages\pandas\io\feather_format.py", line 18, in _try_import
    raise ImportError("the feather-format library is not installed\n"

Stacktrace:
 [1] pyerr_check at C:\Users\dzj\.julia\v0.6\PyCall\src\exception.jl:56 [inlined]
 [2] pyerr_check at C:\Users\dzj\.julia\v0.6\PyCall\src\exception.jl:61 [inlined]
 [3] macro expansion at C:\Users\dzj\.julia\v0.6\PyCall\src\exception.jl:81 [inlined]
 [4] #_pycall#67(::Array{Any,1}, ::Function, ::PyCall.PyObject, ::String, ::Vararg{String,N} where N) at C:\Users\dzj\.julia\v0.6\PyCall\src\PyCall.jl:653
 [5] _pycall(::PyCall.PyObject, ::String, ::Vararg{String,N} where N) at C:\Users\dzj\.julia\v0.6\PyCall\src\PyCall.jl:641
 [6] #pycall#71(::Array{Any,1}, ::Function, ::PyCall.PyObject, ::Type{PyCall.PyAny}, ::String, ::Vararg{String,N} where N) at C:\Users\dzj\.julia\v0.6\PyCall\src\PyCall.jl:675
 [7] pycall(::PyCall.PyObject, ::Type{PyCall.PyAny}, ::String, ::Vararg{String,N} where N) at C:\Users\dzj\.julia\v0.6\PyCall\src\PyCall.jl:675
 [8] #call#72(::Array{Any,1}, ::PyCall.PyObject, ::String, ::Vararg{String,N} where N) at C:\Users\dzj\.julia\v0.6\PyCall\src\PyCall.jl:678
 [9] (::PyCall.PyObject)(::String, ::Vararg{String,N} where N) at C:\Users\dzj\.julia\v0.6\PyCall\src\PyCall.jl:678
 [10] eval(::Module, ::Any) at .\boot.jl:235

This is how I tried to install the feather format.

using Conda
Conda.add("feather") # error 
Conda.add("feather-format") # error 
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - feather

Current channels:

  - https://repo.continuum.io/pkgs/main/win-64
  - https://repo.continuum.io/pkgs/main/noarch
  - https://repo.continuum.io/pkgs/free/win-64
  - https://repo.continuum.io/pkgs/free/noarch
  - https://repo.continuum.io/pkgs/r/win-64
  - https://repo.continuum.io/pkgs/r/noarch
  - https://repo.continuum.io/pkgs/pro/win-64
  - https://repo.continuum.io/pkgs/pro/noarch
  - https://repo.continuum.io/pkgs/msys2/win-64
  - https://repo.continuum.io/pkgs/msys2/noarch


ERROR: failed process: Process(setenv(`'C:\Users\dzj\.julia\v0.6\Conda\deps\usr\Scripts\conda.exe' install -y feather`,String["USERDOMAIN_ROAMINGPROFILE=DESKTOP-SC091K3", "HOMEPATH=\\Users\\dzj", "VSCODE_NLS_CONFIG={\"locale\":\"en-us\",\"availableLanguages\":{}}", "ProgramData=C:\\ProgramData", "LD_LIBRARY_PATH=D:\\c_lib", "ProgramW6432=C:\\Program Files", "PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC", "SESSIONNAME=Console", "APPDATA=C:\\Users\\dzj\\AppData\\Roaming", "PUBLIC=C:\\Users\\Public",
"USERDOMAIN=DESKTOP-SC091K3", "OS=Windows_NT", "PROCESSOR_REVISION=9e09", "TMP=C:\\Users\\dzj\\AppData\\Local\\Temp", "ALLUSERSPROFILE=C:\\ProgramData", "GOPATH=C:\\Users\\dzj\\Documents\\go", "Path=C:\\Users\\dzj\\.julia\\v0.6\\Conda\\deps\\usr\\Library\\bin;C:\\Users\\dzj\\AppData\\Local\\Julia-0.6.2\\bin;C:\\Program Files\\Microsoft MPI\\Bin\\;C:\\ProgramData\\Oracle\\Java\\javapath;C:\\Program Files\\Docker\\Docker\\Resources\\bin;C:\\Program Files (x86)\\Intel\\iCLS Client\\;C:\\Program Files\\Intel\\iCLS Client\\;C:\\Windows\\system32;C:\\Windows;C:\\Windows\\System32\\Wbem;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common;C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\DAL;C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\DAL;C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\IPT;C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\IPT;C:\\Program Files\\Intel\\WiFi\\bin\\;C:\\Program Files\\Common Files\\Intel\\WirelessCommon\\;C:\\Program Files\\Git\\cmd;C:\\Program Files\\PuTTY\\;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\MSBuild\\15.0\\Bin;C:\\Program Files\\nodejs\\;C:\\Go\\bin;C:\\Users\\dzj\\Documents\\go\\bin;C:\\Program Files\\cURL\\bin;d:\\c_lib\\;C:\\Users\\dzj\\Anaconda3\\Scripts;C:\\Program 
Files\\dotnet\\;C:\\Users\\dzj\\.cargo\\bin;C:\\Users\\dzj\\AppData\\Roaming\\Dashlane\\4.8.8.36676\\bin\\Firefox_Extension\\{442718d9-475e-452a-b3e1-fb1ee16b8e9f}\\components;C:\\Users\\dzj\\AppData\\Roaming\\Dashlane\\4.8.8.36676\\ucrt;C:\\Users\\dzj\\AppData\\Local\\atom\\bin;C:\\Users\\dzj\\AppData\\Roaming\\Dashlane\\4.8.9.37516\\bin\\Firefox_Extension\\{442718d9-475e-452a-b3e1-fb1ee16b8e9f}\\components;C:\\Users\\dzj\\AppData\\Roaming\\Dashlane\\4.8.9.37516\\ucrt;C:\\Users\\dzj\\AppData\\Local\\GitHubDesktop\\bin;C:\\Program Files\\Microsoft VS Code\\bin;C:\\Users\\dzj\\AppData\\Local\\Julia-0.6.1\\bin\\;C:\\Users\\dzj\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Program Files\\mingw-w64\\x86_64-7.2.0-posix-seh-rt_v5-rev1\\mingw64\\bin;d:\\c_lib\\;C:\\Users\\dzj\\Anaconda3\\Scripts;C:\\Program Files\\Microsoft VS Code Insiders\\bin;C:\\Users\\dzj\\AppData\\Roaming\\Dashlane\\5.6.0.15520\\bin\\Firefox_Extension\\{442718d9-475e-452a-b3e1-fb1ee16b8e9f}\\components;C:\\Users\\dzj\\AppData\\Roaming\\Dashlane\\5.6.0.15520\\ucrt", "COMPUTERNAME=DESKTOP-SC091K3", "MSMPI_BIN=C:\\Program Files\\Microsoft MPI\\Bin\\", "USERNAME=dzj", "CommonProgramFiles(x86)=C:\\Program Files (x86)\\Common Files",
"CommonProgramFiles=C:\\Program Files\\Common Files", "CONDARC=C:\\Users\\dzj\\.julia\\v0.6\\Conda\\deps\\usr\\condarc-julia.yml", "USERPROFILE=C:\\Users\\dzj", "PSModulePath=C:\\Program Files\\WindowsPowerShell\\Modules;C:\\WINDOWS\\system32\\WindowsPowerShell\\v1.0\\Modules", "PROCESSOR_LEVEL=6", "TEMP=C:\\Users\\dzj\\AppData\\Local\\Temp", "SystemDrive=C:", "HOMEDRIVE=C:", "LOCALAPPDATA=C:\\Users\\dzj\\AppData\\Local", "PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 158 Stepping 9, GenuineIntel", "NUMBER_OF_PROCESSORS=8", "asl.log=Destination=file", "VSCODE_PID=16852", "VSCODE_IPC_HOOK=\\\\.\\pipe\\7d0542cfc95d7d5d0a5cb915f372339a-1.20.0-main-sock", "ComSpec=C:\\WINDOWS\\system32\\cmd.exe", "LANG=en_US.UTF-8", "VS140COMNTOOLS=C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\Common7\\Tools\\", "DASHLANE_DLL_DIR=C:\\Users\\dzj\\AppData\\Roaming\\Dashlane\\5.6.0.15520\\bin\\Firefox_Extension\\{442718d9-475e-452a-b3e1-fb1ee16b8e9f}\\components;C:\\Users\\dzj\\AppData\\Roaming\\Dashlane\\5.6.0.15520\\ucrt", "SystemRoot=C:\\WINDOWS", "OneDrive=C:\\Users\\dzj\\OneDrive", "CONDA_DEFAULT_ENV=C:\\Users\\dzj\\.julia\\v0.6\\Conda\\deps\\usr", "VBOX_MSI_INSTALL_PATH=C:\\Program Files\\Oracle\\VirtualBox\\", "TERM_PROGRAM_VERSION=1.20.0", "ProgramFiles(x86)=C:\\Program Files (x86)", "JULIA_NUM_THREADS=4", "VSCODE_CWD=C:\\Program Files\\Microsoft VS Code", "TERM_PROGRAM=vscode", "LOGONSERVER=\\\\DESKTOP-SC091K3", "CONDA_PREFIX=C:\\Users\\dzj\\.julia\\v0.6\\Conda\\deps\\usr", "windir=C:\\WINDOWS", "FPS_BROWSER_USER_PROFILE_STRING=Default", "CommonProgramW6432=C:\\Program Files\\Common Files", "ProgramFiles=C:\\Program Files", "FPS_BROWSER_APP_PROFILE_STRING=Internet Explorer", "VSCODE_NODE_CACHED_DATA_DIR_16852=C:\\Users\\dzj\\AppData\\Roaming\\Code\\CachedData\\c63189deaa8e620f650cc28792b8f5f3363f2c5b", "PROCESSOR_ARCHITECTURE=AMD64", "OPENBLAS_MAIN_FREE=1", "GOROOT=C:\\Go\\"]), ProcessExited(1)) [1]
Stacktrace:
 [1] pipeline_error(::Base.Process) at .\process.jl:682
 [2] run(::Cmd) at .\process.jl:651
 [3] add(::String, ::String) at C:\Users\dzj\.julia\v0.6\Conda\src\Conda.jl:203 (repeats 2 times)
 [4] eval(::Module, ::Any) at .\boot.jl:235

#2

I haven’t used PyCall directly much, so I can’t help there.

But there is a Feather.jl package available already, if that helps at all. Might be easier than trying to set it up via PyCall yourself.


#3

Part of the point is to show the speed difference between PyCall and Feather.jl.


#4

Just so you are aware, I’m currently in the process of rewriting Feather with a new Arrow.jl back-end (the Arrow.jl package will handle all Arrow-formatted data) and am very nearly done. I haven’t done any extensive benchmarking yet.


#5

This is a conda-forge package, so you need to use the conda-forge channel: Conda.add("feather-format", "conda-forge").


#6

Now I get the error below. I am trying to read Conda.jl’s readme, but I find it a bit hard to understand, probably because I have no background in Python.

Conda.add_channel("conda-forge")
Conda.add("feather-format","conda-forge")
ERROR: ArgumentError: Path to conda environment is not valid.
Stacktrace:
 [1] prefix at C:\Users\dzj\.julia\v0.6\Conda\src\Conda.jl:59 [inlined]
 [2] _install_conda(::String, ::Bool) at C:\Users\dzj\.julia\v0.6\Conda\src\Conda.jl:195
 [3] _install_conda(::String) at C:\Users\dzj\.julia\v0.6\Conda\src\Conda.jl:163
 [4] runconda at C:\Users\dzj\.julia\v0.6\Conda\src\Conda.jl:129 [inlined]
 [5] add(::String, ::String) at C:\Users\dzj\.julia\v0.6\Conda\src\Conda.jl:203
 [6] eval(::Module, ::Any) at .\boot.jl:235

I checked my Conda.ROOTENV and it points to a valid folder.


#7

Really looking forward to it. As you can see, Feather.jl is about 2x slower than R’s feather implementation, and my testing of feather for Python is pending resolution of this thread.


#8

Hm, I’m not sure I understand why the old Feather would have been quite that slow. This is probably a stupid question, but are you absolutely sure you didn’t include any compile time? There certainly is a possibility that (for strings at least) serializing and deserializing from Feather is just fundamentally more expensive than in R because of Julia’s in-memory format. Historically, I think we have had problems with serialization: it has always been slow compared to other languages for just about everything, and I’m not sure why.

Anyway, to help you resolve this issue, I suggest that you do the following:

  1. pip3 install --upgrade feather-format
  2. add @pyimport feather to your script
  3. Instead of using pandas to_feather, use feather.write_dataframe and feather.read_dataframe.
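
The three steps above could look roughly like the sketch below. It uses the Julia 0.6-era `@pyimport` syntax from the first post and assumes that feather-format is already installed for the Python that PyCall is using, and that `df_fileio.csv` exists as in the first post. The variable name `frm2` is only illustrative.

```julia
using PyCall
@pyimport pandas as pd
@pyimport feather   # the Python feather-format package (assumed installed)

# Read the CSV written earlier into a pandas DataFrame (kept as a Python object).
frm = pd.read_csv("df_fileio.csv")

# Call feather directly instead of going through pandas' to_feather method.
feather.write_dataframe(frm, "p.feather")

# Read the file back to check the round trip.
frm2 = feather.read_dataframe("p.feather")
```

Newer PyCall releases use `feather = pyimport("feather")` in place of `@pyimport feather`; the two feather calls stay the same.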

#9

I used @benchmark and took the mean. As a side note, I think it’s fair to include compile time, as that is truly reflective of how an end user experiences the language. But I’ve tested this on 100 million rows of data and the conclusion is the same.
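
One rough way to see how much of a measurement is compile time is to time the first and second call separately. This is only a sketch: `work` is a hypothetical stand-in for the real write call, not anything from Feather.jl.

```julia
# Hypothetical stand-in workload; replace it with the call being benchmarked.
work(v) = sum(abs2, v)

v = rand(10^6)

# The first call includes JIT compilation of `work` for this argument type.
t_first = @elapsed work(v)

# The second call runs the already compiled code.
t_second = @elapsed work(v)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

If the two numbers are close, compilation is not what dominates the measurement.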

That is an issue for a language that is meant to be “fast”. I wonder if it can be solved and soon?


#10

This I do not agree with. Suppose you have a problem of “size” $n$. You have two options (roughly speaking): you can interpret your code and get time complexity $c f(n)$ where $c \sim 10$ and $f(n)$ is some monotonically increasing function (usually at least linear, but it is frequently polynomial), or you could compile and get $f(n)+\epsilon$ where $\epsilon \sim 100~\mbox{ms}$. For the vast majority of applications it should be obvious that you want to compile. Not only does compiled code scale in a more reasonable way, but if you re-run the same code more than once you no longer have the $\epsilon$. When I see people complain that something is “sluggish” because they had to wait for something to compile, it makes me want to scream (not saying that was the case in this thread, just a general comment). The end-user “experience” is efficient compiled code, not a completely inappropriate scaling behavior which is an artifact of a language that is excellent for its original purpose (scripting), in which compile time sometimes matters, but totally inappropriate for everything else. Also, my understanding is that the Julia devs have not been working on compilation efficiency much at all, because it’s something that can always be done later without breaking changes, so things will only improve. I think this is an excellent approach. (Sorry for my unnecessarily elaborate rant on this issue, but I know we are going to have problems with people unfavorably comparing Julia with R and Python because of compile times, as I’ve already heard a fair bit of this, and this has to be one of the ultimate face-palms in programming).
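
To put a rough number on that: taking the ballpark figures above at face value (they are order-of-magnitude guesses, not measurements), compiling wins as soon as the compiled run time $f(n)$ exceeds a break-even value:

```latex
c\,f(n) > f(n) + \epsilon
\;\iff\;
f(n) > \frac{\epsilon}{c - 1}
\approx \frac{100~\mathrm{ms}}{10 - 1}
\approx 11~\mathrm{ms}
\qquad (c > 1)
```

So under these assumptions, anything that takes more than about 11 ms once compiled already comes out ahead on its first run.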

Anyway, it’s all a moot point here because as you’ve said, your data had 10^8 rows :laughing:.

This I do agree with. I don’t know what the core issue is if there even is one. Perhaps I’m just misinterpreting things. It’s been a while since I’ve done any real benchmarking on this myself.


#11

My mistake, it should be just

Conda.add_channel("conda-forge")
Conda.add("feather-format")

#12

This is actually annoying. Thanks for the help, but I think it’s saying that PyCall uses Python 2.7 while feather-format expects python=3.6?

Solving environment: failed

UnsatisfiableError: The following specifications were found to be in conflict:
  - feather-format -> python=3.6
  - python=2.7
Use "conda info <package>" to see the dependencies for each package.

#13

You can reconfigure PyCall to use Python 3
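
A minimal sketch of one way to do that with the Conda.jl-managed Python, assuming Conda.jl reads the `CONDA_JL_VERSION` environment variable when it is built (an existing Miniconda installation under Conda's `deps` directory may need to be removed first):

```julia
# Julia 0.6 syntax, matching the thread; on Julia 1.x run `using Pkg` first.

# Assumption: Conda.jl picks Miniconda 2 or 3 from this variable at build time.
ENV["CONDA_JL_VERSION"] = "3"
Pkg.build("Conda")

# An empty PYTHON setting asks PyCall to use Conda.jl's private Python.
ENV["PYTHON"] = ""
Pkg.build("PyCall")

# Restart Julia afterwards so PyCall loads the new interpreter.
```

Setting `ENV["PYTHON"]` to the full path of an existing Python 3 executable before `Pkg.build("PyCall")` is the alternative if you would rather not use Conda's Python.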


#14

By the way, has any thought been given to having PyCall use Python3 by default? Today I never find any reason to use 2.


#15

On Mac and Windows, this is mostly about changing the Conda.jl default: https://github.com/JuliaPy/Conda.jl/pull/108

On Linux, PyCall defaults to whatever the python command points to, i.e. it follows your distro.


#16

Yeah, it was stupid of me to even ask that as it’s pretty obvious.

Are there any distros where python is Python 3.6? Fedora maybe? It’ll probably take another 100 years or so before this becomes standard.

Let’s hope that Julia $1.0 \to 2.0$ doesn’t share the same fate as Python 3: seemingly endless purgatory.


#17

Arch Linux does this. According to PEP 394, however, the recommendation is that /usr/bin/python should point to python2 for the foreseeable future.

Arguably, then, once we switch over Conda.jl to default to Python3, then for consistency PyCall should default to python3 if it exists.