Create a PostgreSQL Table from a DataFrame or CSV with Schema Inference

I am interested in finding a good method for creating a new table in a PostgreSQL database from a CSV or DataFrame without explicitly specifying the column types, much the way Pandas' DataFrame.to_sql() works when you load data into a table that does not yet exist.

I’ve read through a few related posts (e.g. this one and this one) addressing similar questions, but all of the solutions I’ve seen so far require that the table already exist.

I’m pretty much exclusively using the LibPQ.jl package for talking to the database at this point, but I’m also happy to use any other packages that might offer this functionality (or to write something myself if there aren’t ready-made solutions).

Any pointers on where to look/possible workarounds/etc. are appreciated!


Welcome to the Julia Discourse!

Here is a work-around that uses an in-memory SQLite database to generate the schema:

using DataFrames, SQLite, LibPQ
# An example test dataframe:
df = DataFrame(A = 1:3, B = [2.0, -1.1, 2.8], C = ["p","q","r"])
# Get the table schema:
table_sch = Tables.schema(df)
# Create an in-memory SQLite database:
db = SQLite.DB()
# Just create the schema in the SQLite database:
SQLite.createtable!(db, "mytable", table_sch)
# Now get back the generated SQL CREATE statement:
res = DataFrame(DBInterface.execute(db, "SELECT sql FROM sqlite_master;"))
# alternative to just get a specific name:
# res = DataFrame(DBInterface.execute(db, "SELECT * FROM sqlite_master WHERE name='mytable';"))

str = res[1, :sql] # or change the row index `1` to another row (e.g. if the database holds multiple tables)
# Now use the string to create a table in LibPQ:
conn = LibPQ.Connection("dbname=postgres") # replace postgres with your db name.
result = LibPQ.execute(conn, str)
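
Before executing str against PostgreSQL, it is worth printing it first: SQLite.jl emits its own type names (INT, REAL, TEXT), which PostgreSQL happens to accept, but the mapping is coarse (PostgreSQL's REAL is single precision, for instance), so you may want to tweak the string by hand:

println(str)
# For the example df this prints something like the following
# (the exact form depends on your SQLite.jl version):
#   CREATE TABLE IF NOT EXISTS "mytable" ("A" INT NOT NULL,"B" REAL NOT NULL,"C" TEXT NOT NULL)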

and from here, you can use the load_table! function from this post to load the table into the database:

load_table!(conn, df, "mytable")
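
In case that post isn't handy, here is a minimal sketch of what such a load_table! helper could look like (this is not the exact function from the linked post; it uses LibPQ.load! with a parameterized INSERT and assumes column names that need no quoting):

using LibPQ, Tables

# Hedged sketch: stream the rows of any Tables.jl-compatible source
# (such as a DataFrame) into an existing PostgreSQL table via a
# parameterized INSERT, wrapped in a single transaction.
function load_table!(conn, table, tablename)
    sch = Tables.schema(table)
    colnames = join(string.(sch.names), ", ")
    placeholders = join(("\$$i" for i in eachindex(sch.names)), ", ")
    execute(conn, "BEGIN;")
    LibPQ.load!(table, conn,
        "INSERT INTO $tablename ($colnames) VALUES ($placeholders);")
    execute(conn, "COMMIT;")
    return nothing
end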

Thank you! I’ll give that a try.