To evaluate DataFrames.jl for possible use in a project, I compared it to Pandas, with side-by-side examples and timings:
Overall, DataFrames.jl performs very well in my experiments, great work!
One piece of functionality I could not find out of the box is writing the content of a DataFrame to a database (e.g. PostgreSQL), analogous to Pandas' df.to_sql().
A simple implementation of the database upload would be (taken mostly from the LibPQ.jl documentation):

```julia
using DataFrames
using LibPQ
using IterTools

function insert_by_copy!(con::LibPQ.Connection, tablename::AbstractString, df::DataFrame)
    # Lazily turn each row into one CSV line; missing values become empty fields
    row_strings = imap(eachrow(df)) do row
        join((ismissing(x) ? "" : x for x in row), ",") * "\n"
    end
    # Stream the lines to the server with COPY instead of individual INSERTs
    copyin = LibPQ.CopyIn("COPY $tablename FROM STDIN (FORMAT CSV);", row_strings)
    execute(con, copyin)
end
```
Note that this does not cover all cases: notably, the column order must be the same in the DataFrame and the table, and string values must not contain a comma (and there are probably more edge cases I am not aware of yet).
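Two of these edge cases, commas in strings and column order, could probably be handled along the following lines. This is only a rough sketch: `csv_field`, `csv_line`, `quote_ident` and `copy_statement` are hypothetical helper names, not part of LibPQ.jl or DataFrames.jl. It assumes PostgreSQL's default CSV conventions, where an unquoted empty field is read as NULL and a quoted field may contain commas, doubled quotes and line breaks.

```julia
# Hypothetical helpers, not part of LibPQ.jl or DataFrames.jl.

# Format one value as a CSV field for COPY ... (FORMAT CSV).
# `missing` becomes an unquoted empty field, which COPY reads as NULL by default.
function csv_field(x)
    ismissing(x) && return ""
    s = string(x)
    # Quote fields containing a quote, comma or line break.
    # Empty strings are quoted too, so they stay distinct from NULL.
    if isempty(s) || occursin(r"[\",\r\n]", s)
        return "\"" * replace(s, "\"" => "\"\"") * "\""
    end
    return s
end

# Join the values of one row into a single CSV line.
csv_line(row) = join((csv_field(x) for x in row), ",") * "\n"

# Quote an SQL identifier, doubling any embedded double quotes.
quote_ident(name) = "\"" * replace(string(name), "\"" => "\"\"") * "\""

# Build a COPY statement with an explicit column list, so the column order
# of the data no longer has to match the column order of the table.
# The table name is inserted as given (assumption: the caller passes a valid name).
function copy_statement(tablename, columns)
    collist = join((quote_ident(c) for c in columns), ", ")
    return "COPY $tablename ($collist) FROM STDIN (FORMAT CSV);"
end
```

In `insert_by_copy!`, the row formatting would then become `csv_line(row)` and the statement `copy_statement(tablename, names(df))`. Note that quoted column names are case-sensitive in PostgreSQL, so this assumes the DataFrame column names match the table's column names exactly.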
Because it uses the COPY command, performance is much better than with SQL INSERTs, so this simple function outperforms Pandas' df.to_sql() (but you can do the same trick with Pandas, too).
Is such functionality already available somewhere?
If not, where would be the best place to add it: DataFrames.jl, LibPQ.jl, or a separate package?
Maybe the CSV.jl package could be used for improving the upload functionality and making it more general?