I am struggling to convert a Matrix, whose first row is a header and which may contain NA values, into a DataFrame.
I did find at least two ways, and the resulting df looks fine, but the individual columns are actually either Vector or SubArray instead of DataArray, which causes problems when I try to operate on them (e.g. removing the NAs with dropna()).
This is my code:
using DataFrames, DataFramesMeta
# Original data in Matrix format..
m = ["A" "B" "C"
1 10 100;
nothing nothing nothing;
3 30 300;
4 40 400]
# Converting to DataFrame..
h = [Symbol(c) for c in m[1,:]]
vals = m[2:end, :]
vals2 = [vals[:,c] for c in 1:size(vals, 2)]
df1 = DataFrame(vals2, h)
# alternative that produces SubArrays: df = DataFrame(Any[@view m[2:end, i] for i::Int64 in 1:size(m, 2)], Symbol.(m[1, :]))
# Cleaning to get NA values..
for row in eachrow(df1)
    for name in names(df1)
        if row[name] == nothing
            row[name] = NA
        end
    end
end
Now the problem is the type of the DataFrame columns, which are plain Vectors:
typeof(df1[:A])
Array{Any,1}
And some functions don't work on them. For example, dropna() leaves the NA in place:
dropna(df1[:A])
4-element Array{Any,1}:
1
NA
3
4
For comparison, when I create a df from scratch:
df2 = DataFrame(
    A = [1, 2, 3, 4],
    B = [10, 20, 30, 40]
)
df2[2,:A] = NA
df2[2,:B] = NA
The type is a DataArray and dropna() works without problems:
typeof(df2[:A])
DataArrays.DataArray{Int64,1}
dropna(df2[:A])
3-element Array{Int64,1}:
1
3
4