There is also JDBC.jl, which I wasn’t aware of until just now. Anyone have experience with how well that works?
An illustration:
When running the same query using ODBC and PostgreSQL.jl I get the following results in a DataFrame:
ODBC (notice the Nullable type wrappers, which I do not know how to use and which require some extra effort to make the values usable):
1 Nullable{WeakRefString{UInt16}}("WOS:000088012400005") Nullable{WeakRefString{UInt16}}("Clinical and immunologic evaluation of Cedrus deodara pollen: a new allergen from India")
PostgreSQL
1 WOS:000394076200008 Impacts of bacteria and corrosion on removal of natural organic matter and disinfection byproducts in different drinking water distribution systems
You can do ODBC.query(dsn, sql; weakrefstrings=false) in order to get normal Strings back instead of WeakRefStrings (which can provide excellent performance gains when dealing with lots of string values).
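A minimal sketch of that call (the DSN name, credentials, and query below are placeholders, and the exact API may differ between ODBC.jl versions):

```julia
using ODBC, DataFrames

# Connect through a configured ODBC data source (placeholder name/credentials)
dsn = ODBC.DSN("my-dsn", "username", "password")

# weakrefstrings=false materializes plain String columns
# instead of WeakRefStrings
df = ODBC.query(dsn, "SELECT ut, title FROM wos_records LIMIT 10";
                weakrefstrings=false)

ODBC.disconnect!(dsn)
```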
That doesn’t really point to any performance problems (your original post indicated “slowness”). Do you have any benchmarks where ODBC was significantly slower?
Pretty well, but I would say that, wouldn’t I? I’ve had reports of people using it to connect to many databases, including esoteric ones such as Netezza and Teradata. I haven’t heard too many complaints.
Remember that I am new to Julia.
I have run some elementary tests, with each query collecting 10,000 records.
Using ODBC without weakrefstrings=false, repeated 6 times:
1.055437 seconds (257.92 k allocations: 17.382 MiB, 0.46% gc time)
0.618304 seconds (257.92 k allocations: 16.987 MiB, 0.54% gc time)
1.148345 seconds (257.95 k allocations: 19.973 MiB, 0.22% gc time)
0.278638 seconds (257.96 k allocations: 20.303 MiB, 0.71% gc time)
0.134913 seconds (257.95 k allocations: 19.521 MiB, 1.45% gc time)
1.581923 seconds (257.94 k allocations: 19.743 MiB, 0.12% gc time)
Using ODBC with weakrefstrings=false, repeated 6 times:
1.132961 seconds (258.27 k allocations: 15.864 MiB)
1.710556 seconds (257.91 k allocations: 15.839 MiB, 0.22% gc time)
2.770195 seconds (257.92 k allocations: 15.844 MiB, 0.14% gc time)
1.210713 seconds (257.92 k allocations: 15.858 MiB)
1.303134 seconds (257.90 k allocations: 15.812 MiB, 0.25% gc time)
0.167738 seconds (257.90 k allocations: 15.838 MiB, 2.40% gc time)
And then I still had
Nullable{String}("WOS:000359414700517") Nullable{String}("Comparing Bone Structure and Bone Metabolism Between Long-Term Proton Pump Inhibitor Users and Non-Users")
df[:ut][1] == "WOS:000359414700517"
false
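For what it’s worth, with the Nullable-era API (Julia 0.5/0.6, before Nullable was removed from Base) the comparison has to unwrap the value explicitly; a sketch:

```julia
# Julia 0.5/0.6-era Nullable handling
x = Nullable("WOS:000359414700517")

x == "WOS:000359414700517"                     # false: Nullable vs String
!isnull(x) && get(x) == "WOS:000359414700517"  # true: unwrap first with get()
```

So the `false` above is the wrapper getting in the way, not a data problem.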
While with the PostgreSQL.jl interface:
1.402789 seconds (40.05 k allocations: 2.051 MiB)
1.436988 seconds (40.05 k allocations: 1.969 MiB)
0.116383 seconds (40.05 k allocations: 1.980 MiB)
0.642422 seconds (40.05 k allocations: 1.973 MiB)
0.768851 seconds (40.05 k allocations: 1.978 MiB)
0.657475 seconds (40.05 k allocations: 1.968 MiB)
df[:ut][1] == "WOS:000359414700517"
true
Nullable types are indeed uncomfortable to work with. ODBC, DataFrames, and other related packages will soon be updated to use another approach (without Nullables) that is a lot easier to use.
This may be something that we (at Dynactionize) need to look into, since we need to access various databases such as Postgres, MySQL/MariaDB, etc. efficiently from Julia.
We’ve had to implement a variety of our own solutions, because when we started, nothing was available that met our needs, and we haven’t had time to reevaluate the current status of ODBC.jl (I am aware that @quinnj has made great strides in these areas since we first looked at it before JuliaCon 2015!) or PostgreSQL.jl.
I am also of the opinion that better database support will be critical for greater adoption of Julia as a general-purpose programming language (which is something I strongly believe in!).
I feel like this goes back to what I was saying… no one seems to really be defining what is missing, just that ODBC.jl doesn’t do it (which I disagree with). I remember working with @quinnj maybe 2-3 years ago on ODBC issues, and even back then I was using the package with Hive, Teradata Aster, and Teradata with no problems.
So it feels like if people are using really exotic features, they need to enumerate which features they are looking for, and open an issue and/or write a PR for them.
I didn’t say that ODBC.jl couldn’t handle it now; rather, back in June 2015, it wasn’t nearly ready yet.
Efficient support for strings (esp. being able to use UTF-16 strings, and handling things like UTF-8 variants correctly) and decimal numbers was (and still is!) very important for us. Back at the time, I did communicate those issues to @quinnj, and as I said earlier, I know he’s done quite a lot to improve ODBC.jl and JSON.jl since I first met him at JuliaCon 2015.
Since then, we haven’t (yet) had time to evaluate whether ODBC.jl could now support all our requirements, or whether it would work better for us than our “home-rolled” solutions. (Personally, I’d like to move to ODBC.jl and PostgreSQL.jl, to cut down on maintaining our own code and spend that time improving those packages for everybody’s benefit.)
ODBC.jl has been ported from Nullable to the new framework, so it should be more convenient to use and interoperable with DataFrames and related packages.
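A hypothetical sketch of what that looks like after the port (the DSN, table, and column names are placeholders): SQL NULLs now come back as `missing`, so no unwrapping is needed:

```julia
using ODBC, DataFrames

dsn = ODBC.DSN("my-dsn")  # placeholder DSN
df = ODBC.query(dsn, "SELECT ut FROM wos_records LIMIT 1")

# Columns now hold plain values (with `missing` for SQL NULL),
# so ordinary comparisons work directly:
df[:ut][1] == "WOS:000359414700517"

ODBC.disconnect!(dsn)
```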
I was able to get PyCall to work with psycopg2 only by using PyCall’s “with” statement syntax, which automatically manages transactions.
using PyCall
pg = pyimport("psycopg2")
# Note: `$` must be escaped in Julia string literals (password is redacted)
conn = pg.connect("host=127.0.0.1 port=5432 dbname=xxxx user=zzzz password=#\$%^^%\$\$")
data = ()
@pywith conn begin
    @pywith conn.cursor() as curs begin
        curs.execute("SELECT * from hist.bac_stk_1min;")
        count = curs.rowcount
        data = curs.fetchall()
    end
end
println(data)
conn.close()
Why are you not using LibPQ.jl (GitHub - invenia/LibPQ.jl: A Julia wrapper for libpq)?
As far as I have seen, it works pretty well and is better integrated into the Julia ecosystem (e.g. Tables.jl interface).
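For comparison, a minimal LibPQ.jl sketch of the earlier psycopg2 query (connection parameters are placeholders); because LibPQ results implement the Tables.jl interface, they convert straight to a DataFrame:

```julia
using LibPQ, DataFrames

# Placeholder connection string (no Python layer needed)
conn = LibPQ.Connection("host=127.0.0.1 port=5432 dbname=xxxx user=zzzz")

result = execute(conn, "SELECT * FROM hist.bac_stk_1min LIMIT 10")
df = DataFrame(result)   # LibPQ results are Tables.jl sources

close(conn)
```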
I am new to Julia and did not find the LibPQ.jl package. Thanks for pointing it out and I will switch.
OK - I thought maybe there was a specific reason to use Python for the connection.
In general, this thread might be misleading, because a lot has changed in the last 3 years.