DataFrame population: can't figure out why one approach works and the other does not

anon69491625 · March 1, 2022, 6:43pm

Hi all
Julia noob here. I’m trying to get to grips with changing some functionality in my code based on some excellent guidance from wonderful members of Discourse. I have the code “working”, but the approach I think SHOULD work does not, while another one DOES. I want to know what I am missing. Here is the situation.
Thank you for any help.

I have the following dataframe ( from a csv)

Type      Symbol
STK       XLU
STK       XLV
STK       XOP
STK       XRT
CASH      EUR
CASH      GBP
CASH      AUD
CASH      JPY
CASH      CHF
CASH      CAD
IND       VIX
IND       RVX
IND       VXN
IND       VXD
FUT       ES
FUT       CL

I want just the STK rows so I filter

symbol_list = symbol_data[symbol_data.Type .== "STK", :]

and get

Type      Symbol
 STK       XLU       
 STK       XLV     
 STK       XOP       
 STK       XRT

so far so good. NOW comes the tricky bit

I want to build a Dataframe like this

Symbols price ticksize
XLU           0.0       0.0
XLV           0.0       0.0
XOP           0.0       0.0
XRT           0.0       0.0

someone was kind enough to send me this code

df = DataFrame(symbols = symbol_list, price = zeros(length(symbol_list)), tickSize = zeros(length(symbol_list)))

which gives me this error

ERROR: MethodError: no method matching length(::DataFrame)
Closest candidates are:
  length(::Union{Base.KeySet, Base.ValueIterator}) at ~/julia-1.7.0/share/julia/base/abstractdict.jl:58
  length(::Union{LinearAlgebra.Adjoint{T, S}, LinearAlgebra.Transpose{T, S}} where {T, S}) at ~/julia-1.7.0/share/julia/stdlib/v1.7/LinearAlgebra/src/adjtrans.jl:171
  length(::Union{DataStructures.OrderedRobinDict, DataStructures.RobinDict}) at ~/.julia/packages/DataStructures/vSp4s/src/ordered_robin_dict.jl:86
  ...

as does

df = DataFrame(symbols = symbol_list.Symbol, price = zeros(length(symbol_list)), tickSize = zeros(length(symbol_list)))
ERROR: MethodError: no method matching length(::DataFrame)
Closest candidates are:
  length(::Union{Base.KeySet, Base.ValueIterator}) at ~/julia-1.7.0/share/julia/base/abstractdict.jl:58
  length(::Union{LinearAlgebra.Adjoint{T, S}, LinearAlgebra.Transpose{T, S}} where {T, S}) at ~/julia-1.7.0/share/julia/stdlib/v1.7/LinearAlgebra/src/adjtrans.jl:171
  length(::Union{DataStructures.OrderedRobinDict, DataStructures.RobinDict}) at ~/.julia/packages/DataStructures/vSp4s/src/ordered_robin_dict.jl:86
  ...

YET

symbol_list2 =  symbol_list.Symbol
df = DataFrame(symbols = symbol_list2, price = zeros(length(symbol_list2)), tickSize = zeros(length(symbol_list2)))

works fine???

symbols   price   ticksize
XLU       0.0     0.0
XLV       0.0     0.0
XOP       0.0     0.0
XRT       0.0     0.0

what’s the difference between

typeof(symbol_list2)
Vector{Union{Missing, String7}} (alias for Array{Union{Missing, String7}, 1})

and

typeof(symbol_list.Symbol)
Vector{Union{Missing, String7}} (alias for Array{Union{Missing, String7}, 1})

nilshg · March 1, 2022, 7:23pm

Isn’t the issue here the difference between symbol_list and symbol_list2?

The errors just tell you the length is not defined for DataFrames (you would want nrow in that case).

symbol_list in your code is a DataFrame, while symbol_list.Symbol is a vector (for which length is defined). You’re not showing how symbol_list2 is defined, but it appears to be a vector as well (maybe you did symbol_list2 = symbol_list.Symbol somewhere?)
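
A minimal sketch of the distinction described above, assuming a small hand-built stand-in for the CSV data (the four rows here are illustrative, not the original file). It shows that row filtering returns a DataFrame, that a single column is a vector, and that symbol_list2 is simply another name for that same column:

```julia
using DataFrames

# Hypothetical stand-in for the CSV data in the first post
symbol_data = DataFrame(Type = ["STK", "STK", "CASH", "IND"],
                        Symbol = ["XLU", "XLV", "EUR", "VIX"])

# Row filtering with `df[mask, :]` returns a DataFrame
symbol_list = symbol_data[symbol_data.Type .== "STK", :]
is_df = symbol_list isa DataFrame

# `df.col` returns the column itself, which is a vector
symbol_list2 = symbol_list.Symbol
is_vec = symbol_list2 isa AbstractVector
same_object = symbol_list2 === symbol_list.Symbol

# Both of these count the rows; `length(symbol_list)` would instead
# throw the MethodError shown in the first post
n_from_df = nrow(symbol_list)
n_from_col = length(symbol_list.Symbol)
```

So the two typeof results in the first post are identical because both expressions refer to the same column vector; the failing calls differ only in passing the whole DataFrame to length.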

anon69491625 · March 1, 2022, 7:38pm

hi @nilshg
thanks for the code and I am listening to you two

If you look at the code snippets you’ll see that I did include how symbol_list2 was formed

symbol_list2 =  symbol_list.Symbol

my fault for multiline, sorry.

so I would expect

df = DataFrame(symbols = symbol_list.Symbol, price = zeros(length(symbol_list)), tickSize = zeros(length(symbol_list)))

BUT as I cut and pasted it I THINK I see the issue. I didn’t change the zeros values to get the length of symbol_list.Symbol but left them at symbol_list. Ho hum. I’ll try later tonight to go back to the drawing board
sorry to have bothered you all.

pdeffebach · March 1, 2022, 7:39pm

Nils gave you the answer above. All you need to do is write nrow(symbol_list) instead of length(symbol_list).

anon69491625 · March 1, 2022, 7:43pm

which is cleaner? remember I am learning here.

df = DataFrame(symbols = symbol_list.Symbol, price = zeros(length(symbol_list.Symbol)), tickSize = zeros(length(symbol_list.Symbol)))

OR

df = DataFrame(symbols = symbol_list.Symbol, price = zeros(nrow(symbol_list)), tickSize = zeros(nrow(symbol_list)))

and why please?

I want to make sure I set a firm foundation.

junder873 · March 1, 2022, 8:40pm

I would probably say the second is “cleaner”, mainly because if you change the name of the dataframe column from “Symbol” to something else it is less of an issue.

However, if your goal is to form a new DataFrame based on part of another, I would actually not do either. DataFramesMeta makes these operations a lot easier. I would do something more like this:

using DataFramesMeta

# the r in @rsubset stands for a row-wise operation
df_symbols = @rsubset(symbol_data, :Type == "STK")
@rselect!(
    df_symbols,
    :Symbol,        # keep the symbol column
    :price = 0,     # change this and the next one to 0.0 if you actually need float values
    :tickSize = 0
)
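
For comparison, here is a sketch that stays with plain DataFrames.jl and avoids both length and nrow. It relies on the DataFrame constructor recycling scalar values to match the length of the vector columns. The small symbol_data table is a hypothetical stand-in for the CSV from the first post:

```julia
using DataFrames

# Hypothetical stand-in for the CSV data in the first post
symbol_data = DataFrame(Type = ["STK", "STK", "CASH", "IND"],
                        Symbol = ["XLU", "XLV", "EUR", "VIX"])

# Keep only the STK rows
stk = subset(symbol_data, :Type => ByRow(==("STK")))

# The scalars 0.0 are repeated for every row of the vector column
df = DataFrame(symbols = stk.Symbol, price = 0.0, tickSize = 0.0)
```

Because no row count is written out by hand, there is nothing to keep in sync if the filter or the column name changes later.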

anon69491625 · March 1, 2022, 9:04pm

thank you for pointing out DataFramesMeta I wasn’t aware of that package. I’ll look into it and thanks for the guidance on the coding issue.

anon69491625 · March 1, 2022, 9:58pm

thanks @nilshg as always. I’m getting there slowly but surely

anon69491625 · March 1, 2022, 11:32pm

Hi there
just in case you thought I was ignoring @nilshg, I wouldn’t. I respect his opinion and he’s done nothing but be supportive of my attempts to get my Julia adventure off on the right track. What happened here was that I realized my mistake and had sent my reply BEFORE I read his solution. Hope this clarifies the situation. I have nothing but respect for @nilshg
