Select multiple columns from pandas dataframe

Hi,
I imported pandas and read a CSV file, and I can print the data. I want to select the first 3 columns. In Python this works as df.iloc[:,:3], but it didn’t work in Julia. I also tried df[["col1","col2","col3"]].
Any suggestions? I don’t want to add other libraries.

Thanks

According to the usage section here https://github.com/JuliaPy/Pandas.jl
df.iloc should be iloc(df)

Thanks, I am using pandas which is already installed in conda. The one you mentioned is a new library.

Sorry, what package are you using? Just PyCall?

Yes, just PyCall.

Maybe it will work if you index with 1:3 instead of :3, which is not a range in Julia; :3 just evaluates to the integer 3.


Can you give a full MWE?

using PyCall

pd = pyimport("pandas")
np = pyimport("numpy")

function f()
	df = pd.read_csv("f.csv")
	print(df.iloc[:,:3])
end

Can you also post the error message?

MethodError: no method matching getindex(::PyObject, ::Colon, ::Int64)
Closest candidates are:
  getindex(::PyObject, !Matched::Integer, ::Integer) at deprecated.jl:70
  getindex(::PyObject, !Matched::Integer...) at deprecated.jl:70
  getindex(::PyObject, !Matched::T) where T<:Union{AbstractString, Symbol} at C:\Users\naghanem\.julia\packages\PyCall\zqDXB\src\PyCall.jl:326
  ...

Stacktrace:
 [1] fun() at .\In[7]:12
 [2] top-level scope at In[8]:1
 [3] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1091

I don’t know enough about either pandas or pycall to help you with this, unfortunately. Hopefully someone with more domain knowledge can help. Note that you still have :3 instead of 0:3, which may be causing an issue but I think is unlikely to be the main problem.


Thank you. I can print the whole df or one column, but I don’t know how to slice data from multiple columns!

julia> df = pd.DataFrame(rand(8,5));

julia> get(df[:iloc], (0:py"len"(df)-1,0:2))
PyObject           0         1         2
0  0.957077  0.598744  0.226023
1  0.177671  0.993589  0.037180
2  0.719229  0.271832  0.527947
3  0.538052  0.648124  0.251723
4  0.943631  0.770860  0.725607
5  0.610345  0.917904  0.775536
6  0.527918  0.751472  0.198443
7  0.158496  0.349875  0.057426
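
For reference, the two index ranges in the call above spell out what Python's `:` and `:3` mean for an 8×5 frame. Here is a small plain-Python sketch (no pandas needed; the sizes 8 and 5 are simply taken from rand(8,5) above) of how a slice resolves to explicit positions:

```python
# Sizes assumed from the example above: 8 rows, 5 columns.
n_rows, n_cols = 8, 5

# slice.indices(n) resolves a slice to (start, stop, step) for a length-n axis.
row_positions = list(range(*slice(None).indices(n_rows)))     # Python ":"
col_positions = list(range(*slice(None, 3).indices(n_cols)))  # Python ":3"

print(row_positions)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(col_positions)  # [0, 1, 2]
```

Julia ranges include both endpoints, so 0:2 covers the same positions as Python's :3, and 0:len-1 covers every row.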

Thanks a lot, @jling, works amazingly!
Can you please just add a short explanation?
Why do I need to use get?
I have different Pythonic libraries I want to use!

Basically, it’s because Python overloads indexing in various ways, so you need to go a bit lower-level.

Under the hood, iloc isn’t a simple function call: df.iloc is an indexer object tied to the DataFrame, and it is reached through a PyObject. You can’t just fetch df.iloc and then index it with Julia’s [...] syntax, because at that point the indexing is handled by Julia and the pandas magic won’t work. So you need to use get(...) manually to tell PyCall that those two arguments should be applied as the index to df.iloc in Python, not in Julia.

see also: https://github.com/JuliaPy/PyCall.jl#pyobject
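
To illustrate the Python side of this, here is a minimal plain-Python sketch. `ILocLike` is a made-up stand-in for an indexer object, not pandas code. It shows that `obj[:, :3]` is just a call to `__getitem__` with a single tuple of slices as the key, which is why the Julia call passes one tuple to get (the PyCall README describes get(o, key) as equivalent to o[key] in Python).

```python
class ILocLike:
    """Hypothetical stand-in for an indexer such as df.iloc (not pandas)."""

    def __init__(self, rows):
        self.rows = rows

    def __getitem__(self, key):
        # Python packs "a, b" inside [...] into one tuple key.
        row_key, col_key = key
        return [row[col_key] for row in self.rows[row_key]]


rows = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
iloc = ILocLike(rows)

first_three = iloc[:, :3]
# The same lookup written out explicitly: one tuple argument.
same = iloc.__getitem__((slice(None), slice(None, 3)))

print(first_three)  # [[1, 2, 3], [6, 7, 8]]
print(same == first_three)  # True
```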


Thank you @jling!