I am learning how to use DataFrames and find I am struggling to use it for what I need.
My present problem is illustrated by this attempt to use Query.jl. I cannot understand why the result of q1 has no values.
And I cannot understand why I cannot use @collect DataFrame in this case.
julia> typeof(bs)
DataFrames.DataFrame
julia> size(bs)
(33067, 7)
julia> bs[1:2,:]
2×7 DataFrames.DataFrame
│ Row │ broadfield            │ field                  │ title            │ pubtype   │ pubyear │ doctype   │ publications │
├─────┼───────────────────────┼────────────────────────┼──────────────────┼───────────┼─────────┼───────────┼──────────────┤
│ 1   │ "Biological Sciences" │ "Anatomy & Morphology" │ "ACTA ZOOLOGICA" │ "Journal" │ 2006    │ "Article" │ 1            │
│ 2   │ "Biological Sciences" │ "Anatomy & Morphology" │ "ACTA ZOOLOGICA" │ "Journal" │ 2007    │ "Article" │ 2            │
julia> names(bs)
7-element Array{Symbol,1}:
:broadfield
:field
:title
:pubtype
:pubyear
:doctype
:publications
julia> bfs = Set(bs[:broadfield])
Set(Nullable{String}["Mathematics", "Statistics", "Geological Sciences", "Biological Sciences", "Physics", "Computer Science", "Chemistry"])
julia> describe(bs)
broadfield
Summary Stats:
Length: 33067
Type: Nullable{String}
Number Unique: 7
Number Missing: 0
% Missing: 0.000000
field
Summary Stats:
Length: 33067
Type: Nullable{WeakRefString{UInt16}}
Number Unique: 55
Number Missing: 0
% Missing: 0.000000
title
Summary Stats:
Length: 33067
Type: Nullable{WeakRefString{UInt16}}
Number Unique: 5229
Number Missing: 0
% Missing: 0.000000
pubtype
Summary Stats:
Length: 33067
Type: Nullable{WeakRefString{UInt16}}
Number Unique: 5
Number Missing: 0
% Missing: 0.000000
pubyear
Summary Stats:
Mean: 2010.479421
Minimum: 2005.000000
1st Quartile: 2008.000000
Median: 2011.000000
3rd Quartile: 2013.000000
Maximum: 2015.000000
Length: 33067
Type: Int16
Number Missing: 0
% Missing: 0.000000
doctype
Summary Stats:
Length: 33067
Type: Nullable{WeakRefString{UInt16}}
Number Unique: 15
Number Missing: 0
% Missing: 0.000000
publications
Summary Stats:
Mean: 2.720930
Minimum: 1.000000
1st Quartile: 1.000000
Median: 1.000000
3rd Quartile: 3.000000
Maximum: 210.000000
Length: 33067
Type: Int64
Number Missing: 0
% Missing: 0.000000
julia> q1 = @from i in bs begin
           @where i.broadfield == "Chemistry"
           @select i.field
           @collect
end
0-element Array{Nullable{WeakRefString{UInt16}},1}
julia> q1 = @from i in bs begin
           @where i.broadfield == "Chemistry"
           @select i.field
           @collect DataFrame
end
ERROR: MethodError: Cannot `convert` an object of type Query.EnumerableSelect{Nullable{WeakRefString{UInt16}},Query.EnumerableWhere{NamedTuples._NT_broadfield_field_title_pubtype_pubyear_doctype_publications{Nullable{String},Nullable{WeakRefString{UInt16}},Nullable{WeakRefString{UInt16}},Nullable{WeakRefString{UInt16}},Nullable{Int16},Nullable{WeakRefString{UInt16}},Nullable{Int64}},Query.EnumerableIterable{NamedTuples._NT_broadfield_field_title_pubtype_pubyear_doctype_publications{Nullable{String},Nullable{WeakRefString{UInt16}},Nullable{WeakRefString{UInt16}},Nullable{WeakRefString{UInt16}},Nullable{Int16},Nullable{WeakRefString{UInt16}},Nullable{Int64}},IterableTables.DataFrameIterator{NamedTuples._NT_broadfield_field_title_pubtype_pubyear_doctype_publications{Nullable{String},Nullable{WeakRefString{UInt16}},Nullable{WeakRefString{UInt16}},Nullable{WeakRefString{UInt16}},Nullable{Int16},Nullable{WeakRefString{UInt16}},Nullable{Int64}},Tuple{NullableArrays.NullableArray{String,1},NullableArrays.NullableArray{WeakRefString{UInt16},1},NullableArrays.NullableArray{WeakRefString{UInt16},1},NullableArrays.NullableArray{WeakRefString{UInt16},1},NullableArrays.NullableArray{Int16,1},NullableArrays.NullableArray{WeakRefString{UInt16},1},NullableArrays.NullableArray{Int64,1}}}},##10#12},##11#13} to an object of type DataFrames.DataFrame
This may have arisen from a call to the constructor DataFrames.DataFrame(...),
since type constructors fall back to convert methods.
Regards
Johann