How to quickly bulk insert into postgres

This is sort of a shot in the dark; I don’t have any particular reason to believe there’s a better solution to this, but I thought I’d give it a try.

I’m trying to insert data into tables on a postgres database. I’m using the excellent LibPQ.jl, doing

Data.stream!(df, LibPQ.Statement, cnxn, str)

where str is an insert statement and df is the DataFrame I want to upload.
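For illustration, str is something like the following (made-up table and column names, with LibPQ’s numbered parameter placeholders):

str = "INSERT INTO mytable (a, b) VALUES (\$1, \$2);"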

Everything works perfectly fine except that it is horrifyingly slow: it’s taking something like 20 minutes to upload just 5e4 rows. I know it’s not a connection-speed issue, since I get plenty of bandwidth on queries. It also seems highly unlikely that this is a Julia or LibPQ performance issue (other than that I might be using a sub-optimal function for the job).

Does anyone know of some approach that’s not incredibly slow? I feel like this can’t possibly be the way people populate postgres databases…

I haven’t benchmarked my application, but I have done it directly for each row…

using LibPQ, UnPack  # @unpack is from UnPack.jl (Parameters.jl also exports it)

@unpack isPrivate, databaseId, nameWithOwner, createdAt,
        isArchived, isFork, isMirror = node
isPrivate && return
# license, created_at, and as_of are defined in the surrounding scope;
# note that interpolating values into the SQL string like this sends one
# statement per row and is vulnerable to SQL injection
execute(conn,
        """insert into universe.github_repos values(
           '$databaseId', '$nameWithOwner', '$createdAt',
            $isArchived, $isFork, $isMirror,
           '$license', '$created_at', '$as_of'
           )
           on conflict (id) do update set
           slug = excluded.slug,
           is_archived = excluded.is_archived,
           is_fork = excluded.is_fork,
           is_mirror = excluded.is_mirror,
           spdx = excluded.spdx,
           created_at = excluded.created_at,
           as_of = excluded.as_of
           """)

Have you compared it to doing it row by row? (DataFrames are row-iterable, so you can use a for row in eachrow(df) loop; a sketch follows.)
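A minimal sketch of that pattern, assuming df and an insert statement like the one above (connection string, table, and column names are made up); preparing the statement once and wrapping the loop in a transaction avoids re-parsing the SQL and committing on every row:

using LibPQ, DataFrames

conn = LibPQ.Connection("dbname=mydb")  # hypothetical connection string
stmt = LibPQ.prepare(conn, "INSERT INTO mytable (a, b) VALUES (\$1, \$2);")

execute(conn, "BEGIN;")
for row in eachrow(df)
    execute(stmt, [row.a, row.b])  # one parameterized insert per row
end
execute(conn, "COMMIT;")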

DataStreams is deprecated; you should use the native LibPQ methods, such as LibPQ.load!:

LibPQ.load!(df, conn, insert_str)
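An end-to-end sketch, following the pattern in LibPQ’s documentation (connection string, table, and column names are made up; the \$1, \$2 placeholders must line up with the columns of df):

using LibPQ, DataFrames

conn = LibPQ.Connection("dbname=mydb")
df = DataFrame(a = ["foo", "baz"], b = [1, 2])

execute(conn, "BEGIN;")  # a wrapping transaction avoids a commit per row
LibPQ.load!(df, conn, "INSERT INTO mytable (a, b) VALUES (\$1, \$2);")
execute(conn, "COMMIT;")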

Or

LibPQ.CopyIn, which streams the rows through Postgres’s COPY protocol and is usually the fastest option for bulk loads.
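A minimal sketch adapted from the pattern in LibPQ’s documentation (made-up connection string and table name; the CSV serialization here is naive and would need escaping for values containing commas or quotes):

using LibPQ, DataFrames

conn = LibPQ.Connection("dbname=mydb")  # hypothetical connection string

# serialize each DataFrame row as one CSV line, writing missings as empty fields
row_strings = map(eachrow(df)) do row
    join((ismissing(x) ? "" : string(x) for x in row), ",") * "\n"
end

copyin = LibPQ.CopyIn("COPY mytable FROM STDIN (FORMAT CSV);", row_strings)
execute(conn, copyin)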