DataFrame population: can't figure out why one approach works and the other does not

Hi all,
Julia noob here. I’m trying to get to grips with changing some functionality in my code based on some excellent guidance from wonderful members of Discourse. I have the code “working”, but one approach SHOULD work and doesn't, while the other DOES work. I want to know what I am missing. Here is the situation.
Thank you for any help.

I have the following DataFrame (from a CSV)

Type      Symbol
 STK       XLU       
 STK       XLV     
 STK       XOP       
 STK       XRT       
 CASH      EUR       
 CASH      GBP       
 CASH      AUD       
 CASH      JPY       
 CASH      CHF     
 CASH      CAD       
 IND       VIX      
 IND       RVX       
 IND       VXN       
 IND       VXD      
 FUT       ES      
 FUT       CL  

I want just the STK rows so I filter

symbol_list = symbol_data[symbol_data.Type .== "STK", :]

and get

Type      Symbol
 STK       XLU       
 STK       XLV     
 STK       XOP       
 STK       XRT       

so far so good. NOW comes the tricky bit

I want to build a Dataframe like this

Symbols price ticksize
XLU           0.0       0.0
XLV           0.0       0.0
XOP           0.0       0.0
XRT           0.0       0.0

someone was kind enough to send me this code

df = DataFrame(symbols = symbol_list, price = zeros(length(symbol_list)), tickSize = zeros(length(symbol_list)))

which gives me this error

ERROR: MethodError: no method matching length(::DataFrame)
Closest candidates are:
  length(::Union{Base.KeySet, Base.ValueIterator}) at ~/julia-1.7.0/share/julia/base/abstractdict.jl:58
  length(::Union{LinearAlgebra.Adjoint{T, S}, LinearAlgebra.Transpose{T, S}} where {T, S}) at ~/julia-1.7.0/share/julia/stdlib/v1.7/LinearAlgebra/src/adjtrans.jl:171
  length(::Union{DataStructures.OrderedRobinDict, DataStructures.RobinDict}) at ~/.julia/packages/DataStructures/vSp4s/src/ordered_robin_dict.jl:86
  ...

as does

df = DataFrame(symbols = symbol_list.Symbol, price = zeros(length(symbol_list)), tickSize = zeros(length(symbol_list)))
ERROR: MethodError: no method matching length(::DataFrame)
Closest candidates are:
  length(::Union{Base.KeySet, Base.ValueIterator}) at ~/julia-1.7.0/share/julia/base/abstractdict.jl:58
  length(::Union{LinearAlgebra.Adjoint{T, S}, LinearAlgebra.Transpose{T, S}} where {T, S}) at ~/julia-1.7.0/share/julia/stdlib/v1.7/LinearAlgebra/src/adjtrans.jl:171
  length(::Union{DataStructures.OrderedRobinDict, DataStructures.RobinDict}) at ~/.julia/packages/DataStructures/vSp4s/src/ordered_robin_dict.jl:86
  ...

YET

symbol_list2 =  symbol_list.Symbol
df = DataFrame(symbols = symbol_list2, price = zeros(length(symbol_list2)), tickSize = zeros(length(symbol_list2)))

works fine???

symbols  price  ticksize
XLU           0.0       0.0
XLV           0.0       0.0
XOP           0.0       0.0
XRT           0.0       0.0

what’s the difference between

typeof(symbol_list2)
Vector{Union{Missing, String7}} (alias for Array{Union{Missing, String7}, 1})

and

typeof(symbol_list.Symbol)
Vector{Union{Missing, String7}} (alias for Array{Union{Missing, String7}, 1})

Isn’t the issue here the difference between symbol_list and symbol_list2?

The errors just tell you that length is not defined for DataFrames (you would want nrow in that case).

symbol_list in your code is a DataFrame, while symbol_list.Symbol is a vector (for which length is defined). You’re not showing how symbol_list2 is defined, but it appears to be a vector as well (maybe you did symbol_list2 = symbol_list.Symbol somewhere?)
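
A minimal sketch of that distinction, assuming DataFrames.jl is installed. The small symbol_data below is a hand-built stand-in for the CSV, not the original file:

```julia
using DataFrames

# Hand-built stand-in for the CSV contents (illustrative values only).
symbol_data = DataFrame(
    Type   = ["STK", "STK", "CASH", "IND"],
    Symbol = ["XLU", "XLV", "EUR", "VIX"],
)

# Selecting rows with `:` for the columns keeps the result a DataFrame.
symbol_list = symbol_data[symbol_data.Type .== "STK", :]

is_frame  = symbol_list isa DataFrame              # the filtered result is a DataFrame
is_vector = symbol_list.Symbol isa AbstractVector  # a single column is a vector

n_rows = nrow(symbol_list)            # row count of the DataFrame
n_syms = length(symbol_list.Symbol)   # element count of the column vector
# length(symbol_list) would raise the MethodError quoted in the question.
```

Both counts agree, which is why either one can size the zeros vectors, as long as length is called on the column and not on the DataFrame itself.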

hi @nilshg
thanks for the code and I am listening to you two :slight_smile:

If you look at the code snippets you’ll see that I did include how symbol_list2 was formed

symbol_list2 =  symbol_list.Symbol

my fault for multiline, sorry.

so I would expect this to work

df = DataFrame(symbols = symbol_list.Symbol, price = zeros(length(symbol_list)), tickSize = zeros(length(symbol_list)))

BUT as I cut and pasted it I THINK I see the issue. I didn’t change the zeros calls to take the length of symbol_list.Symbol but left them at symbol_list. Ho hum. I’ll try later tonight to go back to the drawing board :slight_smile:
sorry to have bothered you all.

Nils gave you the answer above. All you need to do is write nrow(symbol_list) instead of length(symbol_list).

which is cleaner? remember I am learning here.

df = DataFrame(symbols = symbol_list.Symbol, price = zeros(length(symbol_list.Symbol)), tickSize = zeros(length(symbol_list.Symbol)))

OR

df = DataFrame(symbols = symbol_list.Symbol, price = zeros(nrow(symbol_list)), tickSize = zeros(nrow(symbol_list)))

and why please?

I want to make sure I set a firm foundation.

I would probably say the second is “cleaner”, mainly because if you change the name of the dataframe column from “Symbol” to something else it is less of an issue.
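
As a side note, here is a sketch of a third spelling that needs neither length nor nrow. It relies on the DataFrame constructor repeating scalar values when at least one column is a vector. The symbol_list below is a made-up stand-in for the filtered STK rows:

```julia
using DataFrames

# Made-up stand-in for the filtered STK rows.
symbol_list = DataFrame(
    Type   = fill("STK", 4),
    Symbol = ["XLU", "XLV", "XOP", "XRT"],
)

# The scalars 0.0 are repeated to match the length of the vector column.
df = DataFrame(symbols = symbol_list.Symbol, price = 0.0, tickSize = 0.0)
```

Whether this reads better than zeros(nrow(symbol_list)) is a matter of taste. It does remove the chance of sizing the columns from the wrong object.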

However, if your goal is to form a new DataFrame based on part of another, I would actually not do either. DataFramesMeta makes these operations a lot easier. I would do something more like this:

using DataFramesMeta

df_symbols = @rsubset(symbol_data, :Type == "STK") # r in rsubset stands for a by-row operation
@rselect!(
    df_symbols,
    :Symbol,
    :price = 0, # change this and the next one to 0.0 if you actually need float values
    :tickSize = 0
)
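
For comparison, a sketch of the same filter-then-add-columns step using only DataFrames.jl, with no macros. The symbol_data here is an illustrative stand-in for the CSV, and the column name symbols is taken from the desired output in the question:

```julia
using DataFrames

# Illustrative stand-in for the CSV data.
symbol_data = DataFrame(
    Type   = ["STK", "STK", "CASH", "FUT"],
    Symbol = ["XLU", "XLV", "EUR", "ES"],
)

# Keep only the STK rows; ByRow applies the comparison to each element.
stk = subset(symbol_data, :Type => ByRow(==("STK")))

# Keep the Symbol column under a new name, then add constant Float64 columns.
df = select(stk, :Symbol => :symbols)
df[!, :price] .= 0.0
df[!, :tickSize] .= 0.0
```

If the Type column can contain missing values, as the Union{Missing, String7} element type in the question suggests it might, subset would likely need skipmissing=true to drop those rows instead of throwing an error.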

thank you for pointing out DataFramesMeta I wasn’t aware of that package. I’ll look into it and thanks for the guidance on the coding issue.

thanks @nilshg as always. I’m getting there slowly but surely :slight_smile:

Hi there
just in case you thought I was ignoring @nilshg, I wouldn’t. I respect his opinion and he’s done nothing but be supportive of my attempts to get my Julia adventure off on the right track. What happened here was that I realized my mistake and had sent my reply BEFORE I read his solution. Hope this clarifies the situation. I have nothing but respect for @nilshg
