Julia Database or CSV filtering

Hi, I am using JuliaDB.jl package to upload CSV files but having problems with filtering or extracting some data.

using JuliaDB
# t = loadtable("MTab.csv")
# Assume that the CSV contains the following data
t = table([1,1,1,1,1,2,2,2,2,2], [1,2,3,4,5,1,2,3,4,5], [10,20,22,45,50,60,44,32,83,92]; names = [:Scenario, :ID, :Adj_MV])
# Extract the values in column Adj_MV for scenario 1
RValues = filter(i -> (i.Scenario == 1), select(t, (:Adj_MV)))

This gives the following error message:

ERROR: type Int64 has no field Scenario
Stacktrace:
 [1] getproperty(::Int64, ::Symbol) at .\Base.jl:20
 [2] (::var"#74#75")(::Int64) at .\none:1
 [3] filter(::var"#74#75", ::Array{Int64,1}) at .\array.jl:2346
 [4] top-level scope at none:0

A workaround is is to create a temporary table t2 and then use the select function.

t2 = filter(i -> (i.Scenario == 1), t)
RValues = select(t2,(:Adj_MV))

However, this does not seems to be an efficient way of doing it. Can anyone please propose a correction to the following line of code?
RValues = filter(i -> (i.Scenario == 1), select(t, (:Adj_MV)))

Also, can this process i.e. extract these values be done using the CSV.jl package?

I did not use JuliaDB.jl, but for general reasons:

RValues = filter(i -> (i.Scenario == 1), select(t, (:Adj_MV)))

function arguments are evaluated before the function is called, so this is equivalent

tmp = select(t, (:Adj_MV))
RValues = filter(i -> (i.Scenario == 1), tmp)

tmp is a column :Adj_MV(a Vector) and it obviously does not contain a column :Scenario
You can rewrite your workaround with single line expression:

RValues = select(filter(i -> (i.Scenario == 1)),(:Adj_MV))

With regards to CSV.jl, you can read the file as a DataFrame, which supports these kind of select/filter operations. There is also the TableOperations.jl package that supports select/filter on generic “tables” (which CSV.File supports).

I’ve used JuliaDB quite a bit and I usually use it with JuliaDBMeta. I’m not sure if this will work for you, but just to give you an idea of what you can do with JuliaDBMeta:

using JuliaDB
using JuliaDBMeta

t = table([1,1,1,1,1,2,2,2,2,2], [1,2,3,4,5,1,2,3,4,5], [10,20,22,45,50,60,44,32,83,92]; names = [:Scenario, :ID, :Adj_MV])
t_new = @apply t begin
    @where :Scenario == 1
    @select :Adj_MV
end

julia> t_new
5-element view(::Array{Int64,1}, [1, 2, 3, 4, 5]) with eltype Int64:
 10
 20
 22
 45
 50

Depending on what you are looking for you can either collect t_new into an Array, you can make it a new table, you could convert to a DataFrame, etc.:

julia> t_new = collect(@apply t begin
           @where :Scenario == 1
           @select :Adj_MV
       end)
5-element Array{Int64,1}:
 10
 20
 22
 45
 50

julia> t_new = table(@apply t begin
           @where :Scenario == 1
           @select :Adj_MV
       end; names=[:Adj_MV])
Table with 5 rows, 1 columns:
Adj_MV
──────
10
20
22
45
50

Thanks. The suggestion RValues = select(filter(i -> (i.Scenario == 1)),(:Adj_MV))
that you kindly provided doesn’t work. I get the following error message:
ERROR: MethodError: no method matching filter(::var"#31#32")

Thanks. I want to collect it into an array, and your suggestion is perfect!

Does loading a CSV using the JuliabDB package as a table loads everything into the memory? There could be cases where the files are big and one need to access some parts of it at a time.

Sorry, my mistake

RValues = select(filter(i -> (i.Scenario == 1), t),(:Adj_MV))

I miss table t in filter call

It works now, thank you.

Does loading a CSV using the JuliabDB package as a table loads everything into the memory?

This might answer your question:

https://juliacomputing.github.io/JuliaDB.jl/latest/out_of_core/