I am struggling to convert a Matrix, whose first row is a header and which may contain NA values, into a DataFrame.
I did find at least two ways, and the resulting df looks fine, but the individual columns are either Vector or SubArray instead of DataArray, and this causes problems when I try to operate on them (e.g. to remove the NA values with dropna()).
This is my code:
using DataFrames, DataFramesMeta
# Original data in Matrix format..
m = ["A" "B" "C"
1 10 100;
nothing nothing nothing;
3 30 300;
4 40 400]
# Converting to Dataframe..
h = [Symbol(c) for c in m[1,:]]
vals = m[2:end, :]
vals2 = [vals[:,c] for c in 1:size(vals)[2]]
df1 = DataFrame(vals2, h)
# alternative that produces SubArrays: df = DataFrame(Any[@view m[2:end, i] for i::Int64 in 1:size(m, 2)], Symbol.(m[1, :]))
# Cleaning to get NA values..
for row in eachrow(df1)
    for name in names(df1)
        if row[name] == nothing
            row[name] = NA
        end
    end
end
Now, the problem is the type of the DataFrame columns, which are plain Vectors with element type Any:
typeof(df1[:A])
Array{Any,1}
And some functions don’t work with them:
dropna(df1[:A])
4-element Array{Any,1}:
1
NA
3
4
For comparison, when I create a df from scratch:
df2 = DataFrame(
    A = [1, 2, 3, 4],
    B = [10, 20, 30, 40]
)
df2[2,:A] = NA
df2[2,:B] = NA
The type is a DataArray and dropna() works without problems:
typeof(df2[:A])
DataArrays.DataArray{Int64,1}
dropna(df2[:A])
3-element Array{Int64,1}:
1
3
4
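To make the goal concrete, here is a minimal sketch of the result I am after: each column is built as a DataArray up front instead of patching NA into an Array{Any} afterwards. It assumes a pre-0.11 DataFrames release that re-exports DataArrays, and that DataArray accepts a values array plus a Bool mask; to_dataarray is a hypothetical helper name, and the Int element type is an assumption that only fits this sample data. I have not checked it against every version.

```julia
using DataFrames  # assumption: pre-0.11 DataFrames, which re-exports DataArray, NA, dropna

m = ["A" "B" "C";
     1 10 100;
     nothing nothing nothing;
     3 30 300;
     4 40 400]

h = Symbol.(m[1, :])   # column names taken from the header row
vals = m[2:end, :]     # data rows, still an Array{Any,2}

# Hypothetical helper: convert one Any column into a DataArray{Int},
# marking every entry that was `nothing` as NA through the mask.
function to_dataarray(col)
    mask = Bool[x === nothing for x in col]
    data = Int[x === nothing ? 0 : x for x in col]  # 0 is only a placeholder hidden by the mask
    return DataArray(data, mask)
end

cols = Any[to_dataarray(vals[:, c]) for c in 1:size(vals, 2)]
df3 = DataFrame(cols, h)
```

If this behaves as I expect, typeof(df3[:A]) should be a DataArray and dropna(df3[:A]) should skip the NA row, just as with df2.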