Total Julia noob here.
I’m used to Python and R (and I really dislike the Python and R syntax and generally think the Julia syntax makes more sense for me) but cannot get my head around the types declarations in Julia. There is just something about it that I just can’t grasp. I’ve read through that documentation several times.
I created this dummy example to debug a little issue I’m having and thought it would make a good self training example. I think I’m doing at least 5 things wrong here. Can someone help me with the better way to do it (although this does work as is but probably very inefficiently)?
using ODBC
using DataFrames
using Statistics
dsn = ODBC.DSN("TEST","TEST","TEST")
createtablecmd = """
CREATE TABLE TEST_JULIA
(
X INTEGER NOT NULL
, Y INTEGER NOT NULL
, Z INTEGER NOT NULL
)
"""
ODBC.execute!(dsn, createtablecmd)
summarydf = DataFrame(ODBCSummary = Float64[], DFSummary = Float64[])
for i in 1:100
for x in 1:10
for y in 1:10
z = convert(Int64, round(rand()))
ODBC.execute!(dsn, "INSERT INTO test_julia (X, Y, Z) VALUES ($x,$y,$z)")
end
end
data = ODBC.query(dsn, "SELECT * FROM test_julia")
summary = ODBC.query(dsn, "SELECT avg(z) as z FROM test_julia")[:Z][1]
dfsummary = mean(data[:Z])
push!(summarydf, [summary dfsummary])
ODBC.execute!(dsn, "TRUNCATE TABLE test_julia")
end
summarydf[:ratio] = Float64(0)
summarydf[:delta] = Float64(0)
for i in 1:length(summarydf[:ratio])
summarydf[i,:ratio] = summarydf[i,:ODBCSummary]/summarydf[i,:DFSummary]
summarydf[i,:delta] = summarydf[i,:ODBCSummary]-summarydf[i,:DFSummary]
end
ODBC.execute!(dsn, "DROP TABLE TEST_JULIA")
The questions I’m wondering about
- Is the way I’m generating the empty dataframe correct? I’m guessing at how the type declarations are defined, although they seem to work.
- Is the way that I’m adding a row to the bottom of the dataframe the best way?
- is the way I’m retrieving a single result from ODBC the right way? It seems very inefficient (although it’s similar in Python Pandas.read_sql)
- It seems like there should be an easier way to create a calculated column for the ratio and delta, but the dot notation isn’t working here. I’m getting this error
julia> summarydf[:ratio] .= summarydf[:ODBCSummary] / summarydf[:DFSummary]
ERROR: KeyError: key :ratio not found
so I create a column first (which you don’t need to do in python or R)
julia> summarydf[:ratio] = Float64(0)
julia> summarydf[:ratio] .= summarydf[:ODBCSummary] / summarydf[:DFSummary]
ERROR: DimensionMismatch(“cannot broadcast array to have fewer dimensions”)
which is why I’m looping one row at a time which doesn’t feel right to me although it does work
Anyways, I’m hoping someone can give me some pointers. I think overall, the type declarations are the one thing holding me back from being able to fully use Julia. I seem to be having some sort of mental block there.
Thanks in advance.