Convert SQLite data to DataFrame without nullable type


#1

Hi all-

I am working with an SQLite database, but don’t really want to use the NullableArrays type. Is there a way to read the database as a regular DataFrame, or to easily convert to a regular DataFrame?

Thanks.


#2

I would like to bring up some related issues, which ultimately motivated my original question. Unbeknownst to me, df[:column] .== x incorrectly returned an array of all false. For example, Nullable(1) == 1 is false, which, as a user, is not what I would expect, and is not consistent with the rest of the language. The solution is verbose and not consistent with the rest of the language:

f(x) = x == 1
map(f, x; lift=true)

Moreover, it’s not clear to me how to extend this approach when f() takes more arguments.

I also found problems with redefining functions. Consider the following:

using SQLite
db = SQLite.DB("chinook.db")
df = SQLite.query(db, "SELECT * from artists")
myfunction(x) = x == 1
map(myfunction, df[:ArtistId]; lift=true)

# Redefine myfunction and get an incorrect result
myfunction(x) = x == 2
map(myfunction, df[:ArtistId]; lift=true)
# However, this works properly
myfunction(2)

The database can be found here. map() appears to be using the original definition of myfunction.

As a user, I understand the need for dealing with missing values. However, the current situation seems less than ideal. Multiple dispatch should be able to handle compact Julia syntax, such as df[:col1] + 1 and the like. Otherwise, it feels like I am recreating base Julia. The problems with redefining functions and testing equality are disconcerting. I need to trust that the code is working as reasonably expected, especially for simple things. I’m not sure what is the proper venue for this, but I hope the feedback is useful somehow.


#3

Indeed, data frames are not yet ready to use with NullableArrays. See this thread and this announcement.

I’m not sure what the best way to work with SQLite.jl is at the moment, but if you don’t have any missing values, just convert the columns to standard Arrays after importing the data.
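For columns known to contain no nulls, that conversion can be sketched roughly as below. This is only a sketch: `unwrap_column` is a hypothetical helper, and unwrapping via `get` assumes the NullableArrays API of the time (it would throw on an actual null).

```julia
# Minimal sketch: unwrap a wrapped column into a plain Vector.
# `unwrap` is the scalar-unwrapping function of the wrapper type
# (for Nullable that was `get`); it throws if a value is actually missing.
unwrap_column(col, unwrap) = [unwrap(x) for x in col]

# Applied to every column of a NullableArrays-backed DataFrame:
# for name in names(df)
#     df[name] = unwrap_column(df[name], get)
# end
```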


#4

Thanks for the information. It’s good to know that the transition will be completed soon. It was not clear to me whether the solution to the “lifting” problem will ultimately include the ability to use typical Julia syntax (e.g. df[:col1] + 1, etc.). Since you are involved in this effort, can you clarify?

I’ll go ahead and convert to standard arrays as you suggested. Thanks again.


#5

It will likely be a little more complex than that: lifting will be enabled for broadcasting operations, so you would write e.g. df[:col1] .+ 1. But the recommended way of working with data frames will be via high-level APIs which will lift automatically (see the announcement for details).


#6

Probably not the best solution, but this works for me:

# Converts each NullableArray column of df to a plain Array in place.
# Assumes no missing values: get() on a null would throw.
function convertNullableDF(df::DataFrame)::DataFrame
    for col in 1:ncol(df)
        # Build a 2-element plain Array from the first (unwrapped) value,
        # then use oftype() to convert the whole column to that Array type.
        df[col] = oftype(repeat([get(df[1, col])], inner=2), df[col])
    end
    return df
end

#7

You can use IterableTables.jl for that in the following way:

using IterableTables, SQLite, DataFrames

db = SQLite.DB(joinpath(Pkg.dir("SQLite"),"test","Chinook_Sqlite.sqlite"))

df = DataFrame(SQLite.Source(db, "SELECT name FROM sqlite_master WHERE type='table';"))

This will create a DataFrame that uses DataArrays for its columns, the standard storage for DataFrames.

If you want the columns to be just regular Arrays, you can use a Query.jl query:

using Query

df = @from i in SQLite.Source(db, "SELECT name FROM sqlite_master WHERE type='table';") begin
    @select {name=get(i.name)}
    @collect DataFrame
end

I have various plans for features in both IterableTables and Query that would make it easier to convert all columns from a source table into columns that can’t hold missing values, but I’m not sure when exactly I’ll get to implementing them.


#8

Thanks for the recommendations. @davidanthoff I look forward to the new features.