Issue with DataFrames, operations on DataFrames now return Nullable Arrays?

IljaK91 · July 19, 2017, 8:16am

I would like to change entries in a DataFrame. Before, I was using:

df2[df2[:country] .== "United States", :country] = "USA"

which basically looks for entries equal to “United States” in the column country and replaces them with “USA”. However, strangely, the type of df2[:country] .== "United States" has changed: it is now NullableArrays.NullableArray{Bool,1}. I am pretty sure that the last time I used my code it was a DataArray, but I am not 100% sure. Running the line above gives me

MethodError: no method matching setindex!(::DataFrames.DataFrame, ::String, ::NullableArrays.NullableArray{Bool,1}, ::Symbol)
include_string(::String, ::String) at loading.jl:515
include_string(::String, ::String, ::Int64) at eval.jl:30
include_string(::Module, ::String, ::String, ::Int64, ::Vararg{Int64,N} where N) at eval.jl:34
(::Atom.##49#52{String,Int64,String})() at eval.jl:50
withpath(::Atom.##49#52{String,Int64,String}, ::String) at utils.jl:30
withpath(::Function, ::String) at eval.jl:38
macro expansion at eval.jl:49 [inlined]
(::Atom.##48#51{Dict{String,Any}})() at task.jl:80

I am a bit confused by this. My code worked before, and df2 is of type DataFrame as it should be, so what changed and how do I need to adapt my code?

bramtayl · July 19, 2017, 8:21am

How did you read in the data?

IljaK91 · July 19, 2017, 8:23am

Before, I used the line below, which now throws an error connected to PyCall for some reason:

EDIT: The problem was that, for some reason, I now have to give the full path. Any idea why that is? Just giving the filename worked before. Moreover, the command I used to manipulate the DataFrame works with readxlsheet but not with CSV.read. Is there an explanation for that?

df1 = readxlsheet(DataFrame,"JSTdatasetR1.xlsx", "Data")

Now I use the following, which is also supposed to give me a DataFrame:

df1 = CSV.read(file; delim=";", types=Dict(21=>Float64))

Is there an issue with that?

jonathanBieler · July 19, 2017, 8:36am

If you have no NA’s in your data you can set nullable=false in CSV.read and it will return plain arrays.

ValdarT · July 19, 2017, 10:35am

CSV, ODBC and other packages built on top of DataStreams currently use Nullable to handle missing values. This is different from the NA/DataArray approach that DataFrames uses by default. But it is still a DataFrame.
All this is unfortunate but it is being actively worked on. By Julia 0.7 all the packages should use a new approach that will be easier to work with - see this announcement.

In the meantime, you can use DataTables, which works more naturally with the Nullable data type, or convert things manually.
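
For the masking problem in the original question, the manual conversion could look roughly like the sketch below. It assumes Julia 0.6 with the NullableArrays package installed; the names nullable_mask and mask are made up for illustration, and the NullableArray is built by hand as a stand-in for the result of df2[:country] .== "United States". Treating a null comparison as "no match" is an assumption too, so adjust it if nulls should be handled differently.

```julia
# Julia 0.6-era sketch; assumes the NullableArrays package is installed.
using NullableArrays

# Hypothetical stand-in for `df2[:country] .== "United States"`.
# The second argument marks which entries are null (here: the third one).
nullable_mask = NullableArray([true, false, true], [false, false, true])

# Collapse each Nullable{Bool} into a plain Bool, treating null as "no match".
mask = Bool[get(x, false) for x in nullable_mask]

# `mask` is now a Vector{Bool}, which is an AbstractVector{Bool} and so should
# match the row-index type that the failing setindex! call could not find:
# df2[mask, :country] = "USA"
```

The final assignment is left commented out because whether it succeeds depends on the installed DataFrames version; the sketch only shows how to get from a NullableArray{Bool,1} to a plain Bool vector.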

bramtayl · July 19, 2017, 11:07am

You probably had to include the full path because your working directory was different.
