Accessing Postgresql via Julia

I am currently using R to access a remote Postgresql database, and my code looks like those as below:

library(RPostgres)
wrds <- dbConnect(Postgres(), 
                  host='wrds-pgdata.wharton.upenn.edu',
                  port=9737,
                  user='username',
                  password='password',
                  sslmode='require',
                  dbname='wrds')

I would like to know how to do it in Julia.

1 Like

Have a look at this thread where people helped me a lot:
https://discourse.julialang.org/t/lack-of-stable-postgresql-communication-a-major-showstopper/

ODBC.jl provides access to any database that supports an ODBC driver; Postgres has excellent support for ODBC. For windows, I’d downloads the msi folder version for an easy installer, for other unix platforms, just grab the tarball from the src directory. You can setup an ODBC dsn file like the one used for testing, which can be registered w/ your system ODBC via the command odbcinst -i -s -h -f ./test/pgtest.odbc.ini.

As noted elsewhere, it takes a bit more setup, but once you’ve been thru it once or twice, I find it much easier to setup any other database.

By the way, on ODBC master on 0.6.1 I’m getting

ERROR: LoadError: LoadError: LoadError: UndefVarError: @isdefined not defined
Stacktrace:
 [1] include_from_node1(::String) at ./loading.jl:576
 [2] include(::String) at ./sysimg.jl:14
 [3] include_from_node1(::String) at ./loading.jl:576
 [4] include(::String) at ./sysimg.jl:14
 [5] include_from_node1(::String) at ./loading.jl:576
 [6] eval(::Module, ::Any) at ./boot.jl:235
 [7] _require(::Symbol) at ./loading.jl:490
 [8] require(::Symbol) at ./loading.jl:405

Just want to see if there is any update on this

using LibPQ: Connection
conn = Connection("""host = wrds-pgdata.wharton.upenn.edu port = port
                     user='username' password='password'
                     sslmode = 'require' dbname = wrds
                  """)
2 Likes

when I tried to install LibPQ on Julia 1.2, I got the error message:

ERROR: Pkg.TOML.ParserError(275, 278, “redefinition of table 3.0”)

…and 1 more exception(s).

There seem to be an issue with your active environment Project.toml. Use a new environment for that project or alternatively, clean up that Project.toml to figure out the bas entries.

how to clean up the Project.toml

You can just delete it (rename it to _Project.toml if you want to keep a copy) and create a new one by

using Pkg
Pkg.activate("path/to/prj/dir")
Pkg.add("MyFirstDep")

That should do it.

1 Like

Thank you. I got another problem

res = execute(wrds, “select column_name
from information_schema.columns
where table_schema=‘crsp’
and table_name=‘msf’
order by column_name”)

crsp_msf_var = DataFrame(columntable(res))

i need to specify the number of rows to be downloaded. sometimes the data could be too large, and i just need to see a part of it. for example, something like this:

crsp_msf_var = DataFrame(columntable(res, nrow = 10))

would be great

You can do so directly in SQL by adding

res = execute(wrds, “select column_name
from information_schema.columns
where table_schema=‘crsp’
and table_name=‘msf’
order by column_name
limit 10”)

thank you so much! this may be unrelated to this post, just want to know if DataFrame is optimized for good performance or if there is any other fast data structure for tabular data in Julia.

DataFrames.jl is one of the oldest packages in the data ecosystem and is currently the flagship when it comes to in-memory tabular data. It is quite efficient and has tons of features. However, for efficiency the optimal course would be to run operations in the performant database when possible (e.g., a relational database such as Postgres with relevant indices). For example, run any select/filter/joins/sort operations in the database when possible and work with the results with DataFrames. Same advice applies to most tabular ecosystem no matter how efficient those are (e.g., same applies to R’s data.table).

Is there a a package in julia that is similar to data.base?

What do you mean by that?

I meam data.table, sorry sbout the typo

In general DataFrames.jl offers efficient in-place operations Ă  la data.table. As for indices, IndexTables.jl does a bit of that.