Issue with DataFrames, operations on DataFrames now return Nullable Arrays?


#1

I would like to change entries in a DataFrame. Before, I was using:

df2[df2[:country] .== "United States", :country] = "USA"

which basically looks for entries equal to “United States” in the column country and replaces them with “USA”. Strangely, however, the type of df2[:country] .== "United States" seems to have changed: it is now NullableArrays.NullableArray{Bool,1}. I am pretty sure it was a DataArray the last time I ran my code, but I am not 100% sure. Running the line above gives me

MethodError: no method matching setindex!(::DataFrames.DataFrame, ::String, ::NullableArrays.NullableArray{Bool,1}, ::Symbol)
include_string(::String, ::String) at loading.jl:515
include_string(::String, ::String, ::Int64) at eval.jl:30
include_string(::Module, ::String, ::String, ::Int64, ::Vararg{Int64,N} where N) at eval.jl:34
(::Atom.##49#52{String,Int64,String})() at eval.jl:50
withpath(::Atom.##49#52{String,Int64,String}, ::String) at utils.jl:30
withpath(::Function, ::String) at eval.jl:38
macro expansion at eval.jl:49 [inlined]
(::Atom.##48#51{Dict{String,Any}})() at task.jl:80

This confuses me a bit. My code worked before, and df2 is of type DataFrame as it should be, so what changed and how do I need to adapt my code?


#2

How did you read in the data?


#3

Before, I used the following, which now throws an error connected to PyCall for some reason:

df1 = readxlsheet(DataFrame,"JSTdatasetR1.xlsx", "Data")

Now I use the following, which is also supposed to give me a DataFrame:

df1 = CSV.read(file; delim=";", types=Dict(21=>Float64))

Is there an issue with that?

EDIT: The problem with readxlsheet was that, for some reason, I now have to give the full path. Any idea why that is? Just giving the filename worked before. Moreover, the command I used to manipulate the DataFrame works with readxlsheet but not with CSV.read. Is there an explanation for that?


#4

If you have no NA’s in your data you can set nullable=false in CSV.read and it will return plain arrays.


#5

CSV, ODBC and other packages built on top of DataStreams currently use Nullable to handle missing values. This differs from the NA/DataArray approach that DataFrames uses by default. The result is still a DataFrame, though, just with NullableArray columns.
All this is unfortunate, but it is being actively worked on. By Julia 0.7 all the packages should use a new approach that will be easier to work with - see this announcement.

In the meantime you can use DataTables, which works more naturally with the Nullable data type, or convert things manually.
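As a rough illustration of what converting manually could look like, here is a minimal sketch. It assumes Julia 0.6, where Nullable, isnull and get are part of Base (they were removed in later versions). A plain Vector of Nullable values stands in for the NullableArray column that CSV.read returns, and the names col and mask are made up for this example.

```julia
# Julia 0.6 only: Nullable, isnull and get are in Base on that version.
# `col` is a hypothetical stand-in for a column such as df2[:country].
col = [Nullable("United States"), Nullable{String}(), Nullable("France")]

# Build an ordinary Vector{Bool}: a null entry counts as "no match".
mask = [!isnull(x) && get(x) == "United States" for x in col]

# Replace the matching entries one by one.
for i in eachindex(col)
    if mask[i]
        col[i] = Nullable("USA")
    end
end
```

A Vector{Bool} like mask is the kind of row index the setindex! call in the first post seems to expect, so df2[mask, :country] = "USA" may work where the NullableArray{Bool,1} mask did not. That last step is untested here against any specific DataFrames version.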


#6

You probably had to include the full path because your working directory was different.
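A quick way to check this is sketched below, using only functions from Base. The filename is the one from the post above; the variable names are made up for illustration.

```julia
# Filename taken from the earlier post; adjust to your own file.
filename = "JSTdatasetR1.xlsx"

# Relative paths are resolved against the current working directory.
println("Working directory: ", pwd())
println("Found via relative path: ", isfile(filename))

# abspath turns the relative name into a full path based on pwd().
full_path = abspath(filename)
println("Resolved path: ", full_path)
```

If isfile reports false, either pass the full path of the file or change into its folder with cd before reading it.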