Hi,
To evaluate DataFrames.jl for possible use in a project, I compared it to Pandas, with side-by-side examples and timings.
Overall, DataFrames.jl performs very well in my experiments, great work!
One feature I could not find out of the box is writing the contents of a DataFrame to a database (e.g. PostgreSQL), analogous to Pandas df.to_sql().
A simple implementation of the database upload would be (taken mostly from the LibPQ.jl documentation):
    using DataFrames
    using LibPQ
    using IterTools

    function insert_by_copy!(con::LibPQ.Connection, tablename::AbstractString, df::DataFrame)
        # Lazily render each row as one CSV line; missing values become empty fields
        row_strings = imap(eachrow(df)) do row
            join((ismissing(x) ? "" : x for x in row), ",") * "\n"
        end
        copyin = LibPQ.CopyIn("COPY $tablename FROM STDIN (FORMAT CSV);", row_strings)
        execute(con, copyin)
    end
Note that this does not cover all cases: notably, the column order must be the same in the DataFrame and the database table, and strings must not contain “,” (and there are probably more edge cases I am not aware of yet).
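A minimal sketch of how these two edge cases might be handled, assuming PostgreSQL's default CSV conventions for COPY (an unquoted empty field is read as NULL, embedded double quotes are doubled). The helper names csv_field, csv_row, quote_ident and copy_statement are hypothetical and not part of LibPQ.jl or DataFrames.jl:

```julia
# Hypothetical helpers; not part of LibPQ.jl or DataFrames.jl.

# Render one value as a CSV field for COPY ... (FORMAT CSV).
function csv_field(x)
    ismissing(x) && return ""   # unquoted empty field: read as NULL by default
    s = string(x)
    # Quote empty strings (to distinguish them from NULL) and any field
    # containing a delimiter, a double quote, or a line break.
    needs_quotes = isempty(s) || any(c -> c in ('"', ',', '\n', '\r'), s)
    needs_quotes || return s
    return "\"" * replace(s, "\"" => "\"\"") * "\""
end

# Render one row (any iterable of values) as a single CSV line.
csv_row(values) = join((csv_field(v) for v in values), ",") * "\n"

# Quote a column name as an SQL identifier (this makes it case-sensitive).
quote_ident(name) = "\"" * replace(string(name), "\"" => "\"\"") * "\""

# Build a COPY statement with an explicit column list, so that the
# DataFrame's column order no longer has to match the table's.
function copy_statement(tablename, columns)
    cols = join((quote_ident(c) for c in columns), ", ")
    return "COPY $tablename ($cols) FROM STDIN (FORMAT CSV);"
end
```

Inside insert_by_copy!, the row expression could then become csv_row(row) and the statement copy_statement(tablename, names(df)). The table name is still interpolated unquoted here, so it should only come from trusted input.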
Because it uses the COPY command, performance is much better than with SQL INSERT statements, so this simple function outperforms Pandas df.to_sql() (although you can use the same trick with Pandas, too).
Is such a functionality already available somewhere?
If not, where would be the best place to add it: DataFrames.jl, LibPQ.jl, or a separate package?
Maybe the CSV.jl package could be used for improving the upload functionality and making it more general?
Best Regards
Benjamin