Over the last couple of years I have been using Julia for various data projects at different companies. What I always struggle with is getting all the data I need out of a database. In most cases the databases can be accessed via JDBC, so I have been using JDBC.jl. However, I feel like I run into issues with the package more often than not. Some examples:
- In my previous job the main problem was that a number of column types found in e.g. Redshift or Trino aren’t converted into Julia types, even though matching Julia types exist (e.g. arrays in Trino).
- In my current project I’m trying to get data from Redshift into a DataFrame, and as soon as the resulting table has more than 30 columns, the Julia code runs forever (the same query outside of Julia takes only seconds).
- If you aren’t careful and the database returns two columns with the same name, JDBC.jl doesn’t rename them before creating the DataFrame but throws an error instead.
I know there are workarounds for all of this, but they make life hard.
Maybe there is an alternative to JDBC.jl out there that I don’t know about. Maybe I’m just not smart enough to use JDBC.jl properly. Maybe I’m the only one who needs a JDBC connection for their daily work.
Anyway, maybe someone out there can share some best practices, alternative packages, code snippets, …?
It would be really appreciated.