There is also JDBC.jl, which I wasn’t aware of until just now. Anyone have experience with how well that works?
An illustration:
When running the same query using ODBC and PostgreSQL.jl I get the following results in a DataFrame:
ODBC (notice the Nullable type wrappers, which I do not know how to use and which require some extra effort to make the values usable):
1 Nullable{WeakRefString{UInt16}}("WOS:000088012400005") Nullable{WeakRefString{UInt16}}("Clinical and immunologic evaluation of Cedrus deodara pollen: a new allergen from India")
PostgreSQL
1 WOS:000394076200008 Impacts of bacteria and corrosion on removal of natural organic matter and disinfection byproducts in different drinking water distribution systems
You can do ODBC.query(dsn, sql; weakrefstrings=false) in order to get normal Strings back instead of WeakRefStrings (which can provide excellent performance gains when dealing with lots of string values).
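A minimal sketch of that call (the DSN name, credentials, and query below are placeholders, and the exact API may differ between ODBC.jl versions):

```julia
using ODBC, DataFrames

# Connect through a configured ODBC data source (placeholder name/credentials)
dsn = ODBC.DSN("my-dsn", "username", "password")

# weakrefstrings=false materializes plain String columns
# instead of WeakRefStrings
df = ODBC.query(dsn, "SELECT ut, title FROM wos_records LIMIT 10";
                weakrefstrings=false)

ODBC.disconnect!(dsn)
```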
That doesn’t really point to any performance problems (your original post indicated “slowness”). Do you have any benchmarks where ODBC was significantly slower?
Pretty well, but I would say that, wouldn’t I? I’ve had reports of people using it to connect to many databases, including esoteric ones such as Netezza and Teradata. I haven’t heard too many complaints.
Remember that I am new to Julia.
I have run some elementary tests, with each query collecting 10,000 records.
Using ODBC without weakrefstrings=false, repeated 6 times:
1.055437 seconds (257.92 k allocations: 17.382 MiB, 0.46% gc time)
0.618304 seconds (257.92 k allocations: 16.987 MiB, 0.54% gc time)
1.148345 seconds (257.95 k allocations: 19.973 MiB, 0.22% gc time)
0.278638 seconds (257.96 k allocations: 20.303 MiB, 0.71% gc time)
0.134913 seconds (257.95 k allocations: 19.521 MiB, 1.45% gc time)
1.581923 seconds (257.94 k allocations: 19.743 MiB, 0.12% gc time)
Using ODBC with weakrefstrings=false, repeated 6 times:
1.132961 seconds (258.27 k allocations: 15.864 MiB)
1.710556 seconds (257.91 k allocations: 15.839 MiB, 0.22% gc time)
2.770195 seconds (257.92 k allocations: 15.844 MiB, 0.14% gc time)
1.210713 seconds (257.92 k allocations: 15.858 MiB)
1.303134 seconds (257.90 k allocations: 15.812 MiB, 0.25% gc time)
0.167738 seconds (257.90 k allocations: 15.838 MiB, 2.40% gc time)
And then I still had
Nullable{String}("WOS:000359414700517") Nullable{String}("Comparing Bone Structure and Bone Metabolism Between Long-Term Proton Pump Inhibitor Users and Non-Users")
df[:ut][1] == "WOS:000359414700517"
false
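For what it’s worth, with the Nullable-era API (Julia 0.5/0.6, before Nullable was removed from Base) the comparison has to unwrap the value explicitly; a sketch:

```julia
# Julia 0.5/0.6-era Nullable handling
x = Nullable("WOS:000359414700517")

x == "WOS:000359414700517"                     # false: Nullable vs String
!isnull(x) && get(x) == "WOS:000359414700517"  # true: unwrap first with get()
```

So the `false` above is the wrapper getting in the way, not a data problem.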
While with the PostgreSQL.jl interface:
1.402789 seconds (40.05 k allocations: 2.051 MiB)
1.436988 seconds (40.05 k allocations: 1.969 MiB)
0.116383 seconds (40.05 k allocations: 1.980 MiB)
0.642422 seconds (40.05 k allocations: 1.973 MiB)
0.768851 seconds (40.05 k allocations: 1.978 MiB)
0.657475 seconds (40.05 k allocations: 1.968 MiB)
df[:ut][1] == "WOS:000359414700517"
true
Nullable types are indeed uncomfortable to work with. ODBC, DataFrames, and other related packages will soon be updated to use another approach (without Nullables) that is a lot easier to use.
This may be something that we (at Dynactionize) need to look into, since we need to access various databases such as Postgres, MySQL/MariaDB, etc. efficiently from Julia.
We’ve had to implement a variety of our own solutions, because when we started, nothing was available that met our needs, and we haven’t had time to reevaluate the current status of ODBC.jl (I am aware that @quinnj has made great strides in these areas since we first looked at it before JuliaCon 2015!) or PostgreSQL.jl.
I am also of the opinion that better database support will be critical for greater adoption of Julia as a general-purpose programming language (which is something I strongly believe in!).
I feel like this goes back to what I was saying… no one seems to really be defining what is missing, just that ODBC.jl doesn’t do it (which I disagree with). I remember working with @quinnj maybe 2-3 years ago on ODBC issues, and even back then I was using the package with Hive, Teradata Aster, and Teradata with no problems.
So it feels like if people are using really exotic features, they need to enumerate which features they are looking for, and open an issue and/or write a PR for them.
I didn’t say that ODBC.jl couldn’t handle it now; rather, back in June 2015, it wasn’t nearly ready yet.
Efficient support for strings (esp. being able to use UTF-16 strings, and handling things like UTF-8 variants correctly) and decimal numbers was (and still is!) very important for us. Back at the time, I did communicate those issues to @quinnj, and as I said earlier, I know he’s done quite a lot to improve ODBC.jl and JSON.jl since I first met him at JuliaCon 2015.
Since then, we haven’t (yet) had time to evaluate whether ODBC.jl could now support all our requirements, or whether it would work better for us than our “home-rolled” solutions. (Personally, I’d like to move to ODBC.jl and PostgreSQL.jl, to cut down on maintaining our own code and spend that time improving those packages for everybody’s benefit.)
ODBC.jl has been ported from Nullable to the new framework, so it should be more convenient to use and interoperable with DataFrames and related packages.
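A hypothetical sketch of what that looks like after the port (the DSN, table, and column names are placeholders): SQL NULLs now come back as `missing`, so no unwrapping is needed:

```julia
using ODBC, DataFrames

dsn = ODBC.DSN("my-dsn")  # placeholder DSN
df = ODBC.query(dsn, "SELECT ut FROM wos_records LIMIT 1")

# Columns now hold plain values (with `missing` for SQL NULL),
# so ordinary comparisons work directly:
df[:ut][1] == "WOS:000359414700517"

ODBC.disconnect!(dsn)
```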
I was able to get PyCall to work with psycopg2 only by using PyCall’s “with” statement syntax, which automatically manages transactions.
using PyCall
pg = pyimport("psycopg2")
# Note: `$` must be escaped in Julia string literals (password is redacted)
conn = pg.connect("host=127.0.0.1 port=5432 dbname=xxxx user=zzzz password=#\$%^^%\$\$")
data = ()
@pywith conn begin
    @pywith conn.cursor() as curs begin
        curs.execute("SELECT * from hist.bac_stk_1min;")
        count = curs.rowcount
        data = curs.fetchall()
    end
end
println(data)
conn.close()
Why are you not using LibPQ.jl (GitHub - invenia/LibPQ.jl: A Julia wrapper for libpq)?
As far as I have seen, it works pretty well and is better integrated into the Julia ecosystem (e.g. Tables.jl interface).
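For comparison, a minimal LibPQ.jl sketch of the earlier psycopg2 query (connection parameters are placeholders); because LibPQ results implement the Tables.jl interface, they convert straight to a DataFrame:

```julia
using LibPQ, DataFrames

# Placeholder connection string (no Python layer needed)
conn = LibPQ.Connection("host=127.0.0.1 port=5432 dbname=xxxx user=zzzz")

result = execute(conn, "SELECT * FROM hist.bac_stk_1min LIMIT 10")
df = DataFrame(result)   # LibPQ results are Tables.jl sources

close(conn)
```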
I am new to Julia and did not find the LibPQ.jl package. Thanks for pointing it out and I will switch.
OK - I thought maybe there was a specific reason to use Python for the connection.
In general, this thread might be misleading, because a lot has changed in the last 3 years.