Newbie: Accessing a DataFrame by row and column names


I am a complete newbie to Julia (a convert) and also to data science. I am following a tutorial on probability and would like to create a so-called probability matrix with DataFrames. The probability matrix in the example problem has a fixed number of rows and columns, and all rows and columns are named. There is an additional total row and total column in the margins. For example, there are 3x2 numeric rows and columns to hold the probability data, one additional row and one additional column, labeled total, for the sums of the corresponding rows and columns, and one row and one column to hold the names of the rows and columns. It is similar to a contingency matrix in machine learning.

I understand that Julia and DataFrames are column-oriented, and I have read the DataFrames manual. My wishes are:

  1. I would like to access a specific cell of the DataFrame via df[:rowname, :columnname].
  2. How can I construct such a DataFrame with minimal effort (possibly with a constructor) and keep the names and everything else in the same data structure, the DataFrame…

If I am not mistaken, I have not seen any row-naming functionality in the API.

Any help much appreciated.

It sounds to me like what you actually want is something from or rather than a DataFrame.


Thank you for the reply. After posting, I read another post similar to this one. The advice there was to use AxisArrays and NamedArrays. Yes, a DataFrame, which mimics a database table, is not suitable for this task.

I preferred NamedArrays since its documentation is much clearer, and performance is not a concern for such a small matrix. The following creates exactly what I want:

using NamedArrays
pt = NamedArray(zeros(3,4), ([:good, :bad, :ctotal], [:S, :T, :U, :rtotal]), ("Quality", "Manufacturer"))
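A sketch of filling such a table and reading cells by name, assuming NamedArrays is installed. The probabilities in P are made-up placeholders, not the tutorial's data. The margins are computed on the plain matrix first (sum(P; dims=2) for row totals, sum(P; dims=1) for column totals), so the only NamedArrays features used are the constructor from the call above and lookup by name.

```julia
using NamedArrays

# Hypothetical joint probabilities, made up for illustration:
# rows are quality (good, bad), columns are manufacturers (S, T, U).
P = [0.30 0.15 0.10;
     0.20 0.15 0.10]

rowtot = sum(P; dims=2)            # 2x1: total of each quality row
coltot = sum(P; dims=1)            # 1x3: total of each manufacturer column
full = [P rowtot; coltot sum(P)]   # 3x4: data, both margins and the grand total

pt = NamedArray(full,
                ([:good, :bad, :ctotal], [:S, :T, :U, :rtotal]),
                ("Quality", "Manufacturer"))

@show pt[:good, :S]         # a single cell, by row and column name
@show pt[:bad, :rtotal]     # marginal probability of a bad item
@show pt[:ctotal, :rtotal]  # grand total, approximately 1.0
```

Doing the arithmetic on an ordinary matrix and attaching the names afterwards keeps the sums in plain Julia, while lookups such as pt[:good, :S] give the df[:rowname, :columnname] style of access asked for in the question.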

Again, thank you for the response.

One more question: I have not been able to figure out how to construct a one-dimensional array (a vector) with a name for each element. I have tried various derivations of the following with no luck, and I would appreciate any insight. I really do not understand the error messages. The convention that works for a matrix does not work for vectors.

cpg = NamedArray([0.6, 0.25, 0.15], [(:s=>1,:t=>2,:u=>3)])

ERROR: MethodError: no method matching defaultnamesdict(::Pair{Symbol,Int64})
Closest candidates are:
  defaultnamesdict(::Array{T,1} where T) at /home/xxx/.julia/packages/NamedArrays/d4lJL/src/constructors.jl:14
  defaultnamesdict(::Integer) at /home/xxx/.julia/packages/NamedArrays/d4lJL/src/constructors.jl:18
  defaultnamesdict(::Tuple) at /home/xxx/.julia/packages/NamedArrays/d4lJL/src/constructors.jl:19
 [1] map(::typeof(NamedArrays.defaultnamesdict), ::Tuple{Pair{Symbol,Int64},Pair{Symbol,Int64},Pair{Symbol,Int64}}) at ./tuple.jl:141
 [2] defaultnamesdict(::Tuple{Pair{Symbol,Int64},Pair{Symbol,Int64},Pair{Symbol,Int64}}) at /home/toktay/.julia/packages/NamedArrays/d4lJL/src/constructors.jl:19
 [3] map(::typeof(NamedArrays.defaultnamesdict), ::Tuple{Tuple{Pair{Symbol,Int64},Pair{Symbol,Int64},Pair{Symbol,Int64}}}) at ./tuple.jl:139
 [4] defaultnamesdict(::Tuple{Tuple{Pair{Symbol,Int64},Pair{Symbol,Int64},Pair{Symbol,Int64}}}) at /home/toktay/.julia/packages/NamedArrays/d4lJL/src/constructors.jl:19
 [5] NamedArray(::Array{Float64,1}, ::Array{Tuple{Pair{Symbol,Int64},Pair{Symbol,Int64},Pair{Symbol,Int64}},1}, ::Array{Symbol,1}) at /home/toktay/.julia/packages/NamedArrays/d4lJL/src/constructors.jl:63
 [6] NamedArray(::Array{Float64,1}, ::Array{Tuple{Pair{Symbol,Int64},Pair{Symbol,Int64},Pair{Symbol,Int64}},1}) at /home/toktay/.julia/packages/NamedArrays/d4lJL/src/constructors.jl:57
 [7] top-level scope at none:0

I’m not too familiar with the package, but if by 1, 2, 3 you meant rows 1, 2, 3, then you can try

cpg = NamedArray([0.6, 0.25, 0.15], ([:s,:t,:u],))

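Notice that the second argument is a one-element tuple: the trailing comma is what makes ([:s,:t,:u],) a tuple rather than just a parenthesized vector. The constructor expects one vector of names per dimension, so a vector (one dimension) needs a tuple with a single entry, just as the 3x4 matrix needed a tuple with two.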
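The stack trace hints at why the first attempt failed: defaultnamesdict, which builds the name lookup, has methods for a vector, an integer or a tuple, but was handed a Pair such as :s=>1. Passing the names themselves, one vector per dimension, avoids that. The sketch below (assuming NamedArrays is installed; names(cpg, 1) is its accessor for the names along dimension 1) shows the difference between a parenthesized vector and a one-element tuple, then uses the resulting named vector.

```julia
using NamedArrays

v = ([:s, :t, :u])     # parentheses only group: v is the Vector{Symbol} itself
t = ([:s, :t, :u],)    # trailing comma: t is a one-element tuple holding that vector

@show v isa Vector{Symbol}         # true
@show t isa Tuple{Vector{Symbol}}  # true

# One vector of names for the single dimension of a vector.
cpg = NamedArray([0.6, 0.25, 0.15], ([:s, :t, :u],))

@show cpg[:t]         # look up an element by its name
@show names(cpg, 1)   # the names along dimension 1
@show sum(cpg)        # the probabilities add up to approximately 1.0
```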


Thank you very much.