Issue with DataFrames, operations on DataFrames now return Nullable Arrays?

I would like to change entries in a DataFrame. Before, I was using:

df2[df2[:country] .== "United States", :country] = "USA"

which basically looks for entries equal to “United States” in the column country and replaces them with “USA”. Strangely, the type of df2[:country] .== "United States" seems to have changed: it is now NullableArrays.NullableArray{Bool,1}. I am fairly sure it was a DataArray the last time I ran my code, but I am not 100% sure. Running the line above gives me

MethodError: no method matching setindex!(::DataFrames.DataFrame, ::String, ::NullableArrays.NullableArray{Bool,1}, ::Symbol)
include_string(::String, ::String) at loading.jl:515
include_string(::String, ::String, ::Int64) at eval.jl:30
include_string(::Module, ::String, ::String, ::Int64, ::Vararg{Int64,N} where N) at eval.jl:34
(::Atom.##49#52{String,Int64,String})() at eval.jl:50
withpath(::Atom.##49#52{String,Int64,String}, ::String) at utils.jl:30
withpath(::Function, ::String) at eval.jl:38
macro expansion at eval.jl:49 [inlined]
(::Atom.##48#51{Dict{String,Any}})() at task.jl:80

I am a bit confused by this. My code worked before, and df2 is of type DataFrame as it should be, so what changed, and how do I need to adapt my code?

How did you read in the data?

Before, I used the following, which now throws an error related to PyCall for some reason:

df1 = readxlsheet(DataFrame,"JSTdatasetR1.xlsx", "Data")

EDIT: The problem was that, for some reason, I now have to give the full path. Any idea why that is? Just giving the file name worked before. Moreover, the command I used to modify the DataFrame works with readxlsheet but not with CSV.read. Is there an explanation for that?

Now I use the following, which is also supposed to give me a DataFrame:

df1 = CSV.read(file; delim=";", types=Dict(21=>Float64))

Is there an issue with that?

If your data contains no NAs, you can set nullable=false in CSV.read and it will return plain arrays.

CSV, ODBC and other packages built on top of DataStreams currently use Nullable to handle missing values. This differs from the NA/DataArray approach that DataFrames uses by default, which is why the comparison returns a NullableArray{Bool,1} that setindex! does not accept. The result is still a DataFrame, though.
All this is unfortunate, but it is being actively worked on. By Julia 0.7, all the packages should use a new approach that will be easier to work with; see this announcement.

In the meantime, you can use DataTables, which works more naturally with the Nullable data type, or convert things manually.


You probably had to include the full path because your working directory was different.
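
To check whether the working directory is the cause, something along these lines may help. It is only a sketch: the file name is the one from the question, and datadir is a placeholder for whichever folder actually holds the file.

```julia
# File name taken from the question above.
fname = "JSTdatasetR1.xlsx"

# A relative path is resolved against the current working directory.
println("Working directory: ", pwd())
println("Resolved path:     ", abspath(fname))
println("File found there:  ", isfile(fname))

# Building the path from a known folder avoids depending on pwd().
# `datadir` is an assumed location; replace it with the real folder.
datadir = homedir()
fullpath = joinpath(datadir, fname)
println("Explicit path:     ", fullpath)
```

If isfile returns false, the session was probably started from a different folder than before. Either cd into the data folder first or pass the full path, as you are doing now.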
