Converting a Matrix to a DataFrame

question

#1

I am struggling to convert a Matrix where I have the first row as header and possible NA values in a DataFrame.

Actually I did find at least two ways, and the resulting df seems ok, but in reality the individual columns are either Vector or SubArray, instead of DataArray, and this then cause me problems when I try to operate on them (e.g. to remove the NA with dropna() )

So, this is my code:

using DataFrames, DataFramesMeta

# Original data in Matrix format..
m = ["A"       "B"        "C"
     1         10         100;
     nothing   nothing    nothing;
     3         30         300;
     4         40         400]
# Converting to Dataframe..
h = [Symbol(c) for c in m[1,:]]
vals = m[2:end, :]
vals2 = [vals[:,c] for c in 1:size(vals)[2]]
df1 = DataFrame(vals2, h)
# alternative that produce subarrays:  df =  DataFrame(Any[@view m[2:end, i] for i::Int64 in 1:size(m, 2)], Symbol.(m[1, :]))
# Cleaning to get NA values..
for row in eachrow(df1)
  for name in names(df1)
    if row[name] == nothing
        row[name] = NA
    end
  end
end

Now, the problem is the type of dataframe columns, as they are Vectors:

typeof(df1[:A])
Array{Any,1}

And some functions don’t work with them:

dropna(df1[:A])
4-element Array{Any,1}:
 1  
  NA
 3  
 4

For comparison, when I create a df by scratch:

df2 = DataFrame(
    A = [1, 2, 3, 4],
    B = [10, 20, 30, 40]
)
df2[2,:A] = NA 
df2[2,:B] = NA

The type is a DataArray and dropna() works without problems:

typeof(df2[:A])
DataArrays.DataArray{Int64,1}
dropna(df2[:A])
3-element Array{Int64,1}:
 1
 3
 4

#2

I solved with:

matnms2df(mat, headerstrs) = convert(DataFrame,
     Dict(zip(headerstrs,[mat[:,i] for i in 1:size(mat,2)])))
df4 = matnms2df(vals,h)
for row in eachrow(df4)
  for name in names(df4)
    if row[name] == nothing
        row[name] = NA
    end
  end
end
typeof(df4[:A])
DataArrays.DataArray{Any,1}

(from https://stackoverflow.com/a/27284301/1586860)

Still I don’t understand why if you construct the DataFrame in a different way than using a Dict the type of the columns change.