I’m trying to do a frequency table with a data frame ‘df’
frequencytable(df)
I get an error, argument not provided. What arguments do I need?
Have you read the README? You need to pass the columns of your data frame that you want to compare. Please read the documentation.
freqtable(df, :x, :y)
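To make the call above concrete, here is a minimal sketch on a tiny made-up data frame. The column names `sex` and `answer` and their values are invented for illustration; only `freqtable(df, :col1, :col2)` comes from the reply above.

```julia
using DataFrames, FreqTables

# Made-up example data: two categorical columns.
df = DataFrame(sex    = ["f", "m", "f", "m", "f"],
               answer = ["yes", "no", "no", "no", "yes"])

# Two-way table: rows are the levels of :sex, columns the levels of :answer.
tbl = freqtable(df, :sex, :answer)

# Cells can be looked up by level name.
n_f_yes = tbl["f", "yes"]
```

Each cell holds the number of rows with that combination of levels, so the cells sum to the number of rows in `df`.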
That’s what I thought, but there are 33 of them.
A frequency table just compares two variables. What kind of output are you looking for? I don’t think freqtable will give you what you want.
No, I was going to do this first. What I want is a contingency table for a data frame or a matrix. I have HypothesisTests.jl loaded.
Frequency tables can tabulate any number of variables you want:
julia> df = DataFrame(x = rand(1:4, 10^6), y = rand(1:3, 10^6), z = rand(1:2, 10^6), s=rand(1:2, 10^6));
julia> freqtable(df, :x, :y, :z, :s)
4×3×2×2 Named Array{Int64,4}
[:, :, z=1, s=1] =
x ╲ y │ 1 2 3
──────┼────────────────────
1 │ 20745 21000 20959
2 │ 20822 20711 20993
3 │ 20667 20770 20760
4 │ 20947 20853 20701
[:, :, z=2, s=1] =
x ╲ y │ 1 2 3
──────┼────────────────────
1 │ 20782 20793 20731
2 │ 20700 20882 20898
3 │ 20769 20988 21083
4 │ 20783 20934 20991
[:, :, z=1, s=2] =
x ╲ y │ 1 2 3
──────┼────────────────────
1 │ 21068 20799 21010
2 │ 20667 20681 20551
3 │ 20769 21149 21042
4 │ 20781 20584 21171
[:, :, z=2, s=2] =
x ╲ y │ 1 2 3
──────┼────────────────────
1 │ 20718 20937 20687
2 │ 20765 20863 20686
3 │ 20969 20921 20693
4 │ 20657 20765 20805
How do I enter variables like :x? I take it that those come from the first row of my data frame, but df.year, for example, is not accepted.
I am not sure I understand your question correctly (you enter the variable :x exactly as you have written it, by writing :x as in my example). However, if you mean that you want to build a frequency table over all columns of a data frame, then write:
freqtable(df, names(df)...)
df.year will not give you the name of a variable (the name is :year) but the vector that is stored in the column named :year in data frame df.
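The difference between a column's name and its contents can be checked directly. This is a small sketch on a made-up data frame (the `score` column and all values are invented); the note about `names` returning strings applies to recent DataFrames.jl releases and may not match the version used in this thread.

```julia
using DataFrames

# Made-up example data.
df = DataFrame(year = [2001, 2002, 2002], score = [1, 3, 2])

col = df.year        # the data itself: a Vector{Int64}
sym = :year          # the column's name, a Symbol

# Indexing with the Symbol returns the same stored vector as df.year.
same = df[!, sym] === col

# Column names as Symbols; names(df) gives them as Strings in recent releases.
syms = propertynames(df)
```

So functions that ask for column names want `:year`, while `df.year` hands over the values and carries no name.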
I get an error:
ArgumentError: invalid Array dimensions
Array at boot.jl:416 [inlined]
zeros(::Type{Int64}, ::NTuple{33,Int64}) at array.jl:461
zeros(::Type{Int64}, ::Int64, ::Int64, ::Vararg{Int64,N} where N) at array.jl:457
_freqtable(::Tuple{Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Real,1},Array{Real,1},Array{Int64,1},Array{Int64,1},Array{Real,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1}}, ::Bool, ::FreqTables.UnitWeights, ::Nothing) at freqtable.jl:74
#freqtable#9(::Bool, ::FreqTables.UnitWeights, ::Nothing, ::typeof(freqtable), ::Array{Int64,1}, ::Vararg{AbstractArray{T,1} where T,N} where N) at freqtable.jl:132
freqtable(::Array{Int64,1}, ::Array{Int64,1}, ::Vararg{AbstractArray{T,1} where T,N} where N) at freqtable.jl:132
#freqtable#21(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(freqtable), ::DataFrame, ::Symbol, ::Vararg{Symbol,N} where N) at freqtable.jl:195
freqtable(::DataFrame, ::Symbol, ::Vararg{Symbol,N} where N) at freqtable.jl:194
top-level scope at Soc332Assignment3.jl:32
But what exactly do you write when you get this error? It works for me:
julia> df = DataFrame(x = rand(1:4, 10^6), y = rand(1:3, 10^6), z = rand(1:2, 10^6), s=rand(1:2, 10^6));
julia> freqtable(df, names(df)...)
4×3×2×2 Named Array{Int64,4}
[:, :, z=1, s=1] =
x ╲ y │ 1 2 3
──────┼────────────────────
1 │ 21049 20895 20667
2 │ 20751 20834 20901
3 │ 20986 20542 20819
4 │ 20681 20685 20734
[:, :, z=2, s=1] =
x ╲ y │ 1 2 3
──────┼────────────────────
1 │ 20800 20854 20753
2 │ 20858 20863 20879
3 │ 20731 20917 20731
4 │ 20725 20571 20866
[:, :, z=1, s=2] =
x ╲ y │ 1 2 3
──────┼────────────────────
1 │ 20933 20665 20601
2 │ 20809 20620 20778
3 │ 21009 20853 20957
4 │ 20992 20715 20853
[:, :, z=2, s=2] =
x ╲ y │ 1 2 3
──────┼────────────────────
1 │ 20830 20936 21096
2 │ 20965 20837 20822
3 │ 20932 20931 21048
4 │ 20858 20803 21065
But if you want to create a 33-dimensional table (as it seems you do), you will run out of memory (unless each variable has only 1 or 2 levels and you have at least 128GB of RAM).
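The size of a dense table is the product of the number of levels of every variable, times 8 bytes per Int64 cell. A quick sketch of that arithmetic (the helper name `dense_table_bytes` is made up for this illustration):

```julia
# Hypothetical helper: bytes needed for a dense Int64 table, given the
# number of levels of each variable. BigInt avoids integer overflow.
dense_table_bytes(levels) = prod(big.(levels)) * sizeof(Int64)

two_levels   = dense_table_bytes(fill(2, 33))   # 33 variables, 2 levels each
three_levels = dense_table_bytes(fill(3, 33))   # 33 variables, 3 levels each

two_levels / 10^9     # about 68.7 (GB)
three_levels / 10^15  # about 44 (PB)
```

Even the best case of two levels per variable needs 2^33 cells, and one extra level per variable pushes the table far beyond what any single machine can hold.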
I wrote the same as you. All missing values should have a number assigned, and I started with a data frame.
As I hypothesized: if you have 33 variables and each has at least 2 levels, then you will need at least 68GB of RAM to store the table in memory. If this is the case, you probably have a very sparse data structure, and you should use the SparseArrays module and populate such an array manually.
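One way to populate counts manually is sketched below. Note that it swaps in a plain Dict keyed by tuples of levels rather than SparseArrays, because the standard-library SparseArrays module covers sparse vectors and matrices, not 33-dimensional arrays. The function name `sparse_counts` and the sample columns are made up for this illustration.

```julia
# Count only the combinations of levels that actually occur in the data.
# `columns` is a hypothetical stand-in for the columns of the data frame.
function sparse_counts(columns::Vector{<:AbstractVector})
    counts = Dict{Tuple,Int}()
    for i in eachindex(columns[1])
        # One key per row: the tuple of that row's values across all columns.
        key = Tuple(col[i] for col in columns)
        counts[key] = get(counts, key, 0) + 1
    end
    return counts
end

# Made-up data: three columns, four rows.
columns = [[1, 1, 2, 1], [3, 3, 3, 4], [0, 0, 1, 0]]
counts = sparse_counts(columns)

counts[(1, 3, 0)]   # how many rows have x=1, y=3, z=0
length(counts)      # number of distinct combinations seen
```

The Dict never holds more entries than there are rows, so memory use follows the data rather than the number of possible cells. With a data frame, something like `collect(eachcol(df))` should supply the `columns` argument, though that has not been tested against the data in this thread.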
I see. That makes sense. I don’t think I need to do a freq table for what I’m doing then. If I weren’t away from my tower, I’d try it on there and see if it worked. I could try doing it manually later and let you know.
OK, I think I need to learn to do this manually.