Lack of stable PostgreSQL-communication a major showstopper

I am beginning to enjoy Julia and actually succeeded in doing some useful stuff as I learn the language.

I was hoping to replace what I am presently doing in Python with Julia. But I do a lot of database work and needs to be able to work with a complex database. While I have can presently communicate with PostgreSQL using a fork of PostgreSQL.jl (https://github.com/swt30/PostgreSQL.jl.git - thanks for that) but the communication is limited to such an extend that it becomes a showstopper. If I cannot get something working in the next few weeks, I will have to develop my new code in Python againā€¦ :frowning:

Unfortunately my knowledge of the libpq-api (and Julia) is too limited to really contribute to the code. Somebody posted a SQLAlchemy wrapper, but that is a no-go to me. I donā€™t even use it in Python.

ODBC is also very limited and slow.

It would be wonderful if we can have a libpq-interface in Julia that can capitalize on the full functionality of PostgreSQL. Second price would be to have something like psycopg2 in Julia.

1 Like

Unfortunately, the quoted statement is nearly 1:1 correlated with people talking about how great it would be for better Postgres support. Until someone has those skills, then the situation isnā€™t going to improve.

By using PyCall.jl, you can use a Python Postgres client from within Julia with only fairly small overhead (if any).

1 Like

Whatā€™s there in libpq that you cannot do in ODBC or JDBC and that you absolutely need? Surely in terms of SQL queries that you can express it should not make a difference.

As a Java guy Iā€™ve been querying all sorts of databases via JDBC for over 20 years and never had the feeling that I was missing something except for limitations in the vendor dialect of SQL of that database.

1 Like

SQL is a nightmare.

Among other things, the problem with SQL is that it is hundreds of different things masquerading as one thing.

By the way, why is SQLAlchemy no-go? A huge amount of work has gone into the creation and maintenance of SQLAlchemy (I feel sorry for its maintainers), so reproducing it is very difficult. Is the issue that you need more advanced PostgreSQL? If thatā€™s the case, let me just point out that you canā€™t even do things like embed Julia code into Postgre queries, so it seems to me like youā€™d have need for some python interaction anyway.

By the way, I definitely agree with the premise that we need better SQL support. Itā€™s difficult because nobody wants to work on it because, like I said, itā€™s a nightmare. Like I said I updated SQLAlchemy. I also spent some time trying to rewrite ODBC to be more Julian and bug-free, but at some point I stopped because I wanted to blow my brains out.

Lastly, I think things will get slightly better when the packages in the new Nulls ecosystem such as DataFrames and DataStreams all finally get tagged (is this happening soon I hope?).

I can try again, but ODBC has bitten me once when it filled a DataFrame with a strange datatype that I could not figure out how to handle (see Help needed with DataFrames and Query). In that case the problem was corrected when I started using PostgreSQL.jl.

I have tried to use psycopg2 in through pycall, but I had no success.

I use quite complicated sql and prefer to use it directly - not in an ORM-type of language. To filter through the SQLAlchemy documentation how to do what I want to do, seemed for me so far as a huge task of which I cannot guarantee the success. In Python I prefer psycopg2 (have tried something like asyncpg also) where I can define my query, and run it as is. I suspect it might be possible in SQLAlchemy, but so far could not find clear documentation on it.

Iā€™m not sure what you mean - PyCall.jl generally works so I donā€™t see why it wouldnā€™t work here.

2 Likes

To start with (code which I use in Python):

using DataFrames
using PyCall
@pyimport psycopg2 as pg
@pyimport psycopg2.pool as ppool

function makepool()
      pool =  ppool.SimpleConnectionPool(2, 50,
                                                 database="wos",
                                                 host = "localhost",
                                                 user = "js",
                                                 password="xxxxx",
                                                 port=5432)
end #makepool (generic function with 1 method)
pool = makepool()
con = pool.getconn()
cur = con.cursor()

type PyObject has no field getconn

Stacktrace:
 [1] include_string(::String, ::String) at ./loading.jl:515

And just there I stopped.

You absolutely can run simple string queries in SQLAlchemy. Of course you donā€™t get the proper metadata, but this is not the fault of SQLAlchemy.

With the Julia wrapper, for example

engine = createengine(cnxn_str)
tab = Table("some_table", MetaData(engine), autoload=true)
db = connect(engine)

q = db("select top * from some_table limit 100")

cols = columns(tab)

df = fetchall(cols, q)

This was done on my fork.

You need to do pool[:getconn](). See the PyCall documentation.

3 Likes

That isnā€™t how you use PyCall, as its README explains. Itā€™s not great to say that you ā€˜had no successā€™ with a volunteer open-source software package written by enthusiasts who have busy day jobs when you havenā€™t even read the 4-paragraph ā€˜USAGEā€™ section README.

5 Likes

Thanks ExpandingMan and malmaud for your replies.

I plead guilty of not knowing how to use PyCall properly.

But I want to point out that when I said that I ā€œhad no successā€ it must not be seen as an insult to
the people who worked hard to provide the libraries. I have been a committed open source user for at least two decades now. It just meant that I had no success and have looked for other avenues.

Apologies if I have been seen to be negatively critical.

2 Likes

I was in a similar situation 2 years ago when ODBC.jl with microsoft SQL on a windows server didnā€™t work due to some known and difficult bug at the time (something with utf encoding).
Luckily PyCall turned out to be a lifesaver and this solution is still running live today.
So if some database connection is missing, you can always rely on PyCall and your python sql package of choice!

Understood. I was just worried about new users possibly interpreting your post as a claim that PyCall doesnā€™t work and becoming discouraged before trying it out themselves.

1 Like

@johann.spies, Iā€™ve just been given commit access to PostgreSQL.jl thanks to @iamed2. When I have time, Iā€™ll try and get a new release of PostgreSQL.jl out so you can move away from that fork.

2 Likes

ODBC.jl is hell to set up but it does work, eventually, for some value of ā€œworkā€. PostgreSQL.jl should be decent enough to last until I have LibPQ.jl done (just got back from vacation to work on that again), which should match what youā€™re looking for.

I have a strange love for libpq, so Iā€™m happy to work on it. Keep an eye on https://github.com/iamed2/LibPQ.jl for updates.

2 Likes

Thanks @swt30 and @iamed2 and all the others partaking in this thread. This what I love about open source!

Iā€™d certainly appreciate hearing more about your experience w/ ODBC.jl; I use it regularly to connect to Postgres databases w/o issues. Itā€™s one of the few databases that are part of the automatic testing. Feel free to open an issue w/ anything that didnā€™t work and Iā€™m happy to help troubleshoot any setup issues as far as I can.