How do I add arguments to a frequency table?

I’m trying to make a frequency table from a data frame ‘df’:

frequencytable(df)

I get an error saying an argument was not provided. What arguments do I need?

Have you read the README? You need to pass the columns of your data frame that you want to compare. Please read the documentation.

freqtable(df, :x, :y)
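To make concrete what a two-column call like the one above tabulates, here is a minimal base Julia sketch that builds a two-way table of counts by hand. The function name `contingency` and the sample vectors are made up for illustration; this is not how FreqTables.jl is implemented, only the same idea in a few lines.

```julia
# Hypothetical helper: count how often each pair of values (a[i], b[i]) occurs.
# Returns the row labels, the column labels, and the matrix of counts.
function contingency(a, b)
    row_levels = sort(unique(a))
    col_levels = sort(unique(b))
    # Map each level to its row/column position in the table.
    row_index = Dict(v => i for (i, v) in enumerate(row_levels))
    col_index = Dict(v => j for (j, v) in enumerate(col_levels))
    counts = zeros(Int, length(row_levels), length(col_levels))
    for (u, v) in zip(a, b)
        counts[row_index[u], col_index[v]] += 1
    end
    return row_levels, col_levels, counts
end

# Illustrative data only.
a = ["a", "b", "a", "a"]
b = [1, 1, 2, 1]
row_levels, col_levels, counts = contingency(a, b)
```

Each cell `counts[i, j]` is the number of rows where the first variable equals `row_levels[i]` and the second equals `col_levels[j]`.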

That’s what I thought, but there are 33 of them.

A frequency table just compares two variables. What kind of output are you looking for? Because I don’t think freqtable will give you what you want.

No, I was going to do this first. What I want is a contingency table for a data frame or a matrix. I have HypothesisTests.jl loaded.

Frequency tables can tabulate any number of variables you want:

julia> df = DataFrame(x = rand(1:4, 10^6), y = rand(1:3, 10^6), z = rand(1:2, 10^6), s=rand(1:2, 10^6));

julia> freqtable(df, :x, :y, :z, :s)
4×3×2×2 Named Array{Int64,4}

[:, :, z=1, s=1] =
x ╲ y │     1      2      3
──────┼────────────────────
1     │ 20745  21000  20959
2     │ 20822  20711  20993
3     │ 20667  20770  20760
4     │ 20947  20853  20701

[:, :, z=2, s=1] =
x ╲ y │     1      2      3
──────┼────────────────────
1     │ 20782  20793  20731
2     │ 20700  20882  20898
3     │ 20769  20988  21083
4     │ 20783  20934  20991

[:, :, z=1, s=2] =
x ╲ y │     1      2      3
──────┼────────────────────
1     │ 21068  20799  21010
2     │ 20667  20681  20551
3     │ 20769  21149  21042
4     │ 20781  20584  21171

[:, :, z=2, s=2] =
x ╲ y │     1      2      3
──────┼────────────────────
1     │ 20718  20937  20687
2     │ 20765  20863  20686
3     │ 20969  20921  20693
4     │ 20657  20765  20805

How do I enter variables like :x? I take it that those come from the first row of my data frame, but df.year, for example, is not accepted.

I am not sure I understand your question correctly (you enter the variable :x exactly as you have written it, by writing :x as in my example). However, if you mean that you want to build a frequency table over all columns of a data frame, then write:

freqtable(df, names(df)...)

df.year will not give you the name of a variable (the name is :year); it will give you the vector that is stored in the column named :year in the data frame df.
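The difference between a column's name and its contents can be seen in a small sketch. The data frame below is invented for illustration. Note also that in recent DataFrames.jl versions `names(df)` returns strings, while `propertynames(df)` returns symbols.

```julia
using DataFrames

# Illustrative data only; the column names are assumptions.
df = DataFrame(year = [2001, 2001, 2002], x = [1, 2, 1])

contents = df.year            # the column's contents: a vector of values
name = :year                  # the column's name: a Symbol
same_column = df[!, name]     # look the column up by its name
all_names = propertynames(df) # every column name, as Symbols
```

Functions that take column names expect something like `name`, not `contents`.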

I get an error:

ArgumentError: invalid Array dimensions
Array at boot.jl:416 [inlined]
zeros(::Type{Int64}, ::NTuple{33,Int64}) at array.jl:461
zeros(::Type{Int64}, ::Int64, ::Int64, ::Vararg{Int64,N} where N) at array.jl:457
_freqtable(::Tuple{Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Real,1},Array{Real,1},Array{Int64,1},Array{Int64,1},Array{Real,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1},Array{Int64,1}}, ::Bool, ::FreqTables.UnitWeights, ::Nothing) at freqtable.jl:74
#freqtable#9(::Bool, ::FreqTables.UnitWeights, ::Nothing, ::typeof(freqtable), ::Array{Int64,1}, ::Vararg{AbstractArray{T,1} where T,N} where N) at freqtable.jl:132
freqtable(::Array{Int64,1}, ::Array{Int64,1}, ::Vararg{AbstractArray{T,1} where T,N} where N) at freqtable.jl:132
#freqtable#21(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(freqtable), ::DataFrame, ::Symbol, ::Vararg{Symbol,N} where N) at freqtable.jl:195
freqtable(::DataFrame, ::Symbol, ::Vararg{Symbol,N} where N) at freqtable.jl:194
top-level scope at Soc332Assignment3.jl:32

But what exactly did you write when you got this error? It works for me:

julia> df = DataFrame(x = rand(1:4, 10^6), y = rand(1:3, 10^6), z = rand(1:2, 10^6), s=rand(1:2, 10^6));

julia> freqtable(df, names(df)...)
4×3×2×2 Named Array{Int64,4}

[:, :, z=1, s=1] =
x ╲ y │     1      2      3
──────┼────────────────────
1     │ 21049  20895  20667
2     │ 20751  20834  20901
3     │ 20986  20542  20819
4     │ 20681  20685  20734

[:, :, z=2, s=1] =
x ╲ y │     1      2      3
──────┼────────────────────
1     │ 20800  20854  20753
2     │ 20858  20863  20879
3     │ 20731  20917  20731
4     │ 20725  20571  20866

[:, :, z=1, s=2] =
x ╲ y │     1      2      3
──────┼────────────────────
1     │ 20933  20665  20601
2     │ 20809  20620  20778
3     │ 21009  20853  20957
4     │ 20992  20715  20853

[:, :, z=2, s=2] =
x ╲ y │     1      2      3
──────┼────────────────────
1     │ 20830  20936  21096
2     │ 20965  20837  20822
3     │ 20932  20931  21048
4     │ 20858  20803  21065

But if you want to create a 33-dimensional table (as it seems you do), you will run out of memory (unless each variable has only 1 or 2 levels and you have at least 128GB of RAM).
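A quick way to check whether a dense table would fit is to multiply the number of levels of every variable by the 8 bytes of one Int64 count. The helper name `dense_table_bytes` is made up for this sketch, and the example uses the best case of 2 levels per variable.

```julia
# Hypothetical helper: bytes needed for a dense table of Int64 counts,
# given the number of levels of each variable.
# big(...) avoids integer overflow when the product gets very large.
dense_table_bytes(levels) = prod(big.(levels)) * sizeof(Int64)

# Best case for 33 variables: every variable has exactly 2 levels.
bytes = dense_table_bytes(fill(2, 33))
println(bytes)          # 68719476736
println(bytes ÷ 2^30)   # 64 (GiB)
```

With real data you could pass the actual level counts, for example one `length(unique(col))` per column; any variable with more than 2 levels multiplies the total further.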

I wrote the same as you. All missing values should have a number assigned, and I started with a data frame.

As I hypothesized: if you have 33 variables and each has at least 2 levels, then the table has at least 2^33 cells, and at 8 bytes per Int64 count you will need at least 68GB of RAM to store it in memory. If this is the case, you probably have a very sparse data structure, and you should use the SparseArrays module and populate such an array manually.
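One way to store only the combinations that actually occur is sketched below in base Julia. It uses a plain Dict keyed by tuples instead of the SparseArrays module, because the standard-library SparseArrays types cover vectors and matrices rather than 33-dimensional arrays. The function name `count_combinations` and the sample vectors are invented for illustration.

```julia
# Hypothetical helper: count how often each observed combination of values
# occurs. Each argument is one variable (a vector); all must have equal length.
function count_combinations(cols...)
    counts = Dict{Tuple,Int}()
    for row in zip(cols...)
        # Only combinations that appear in the data get an entry.
        counts[row] = get(counts, row, 0) + 1
    end
    return counts
end

# Illustrative data only.
x = [1, 2, 1, 2, 1]
y = [1, 1, 1, 2, 1]
z = [2, 2, 2, 1, 2]
counts = count_combinations(x, y, z)
```

Memory use then grows with the number of distinct rows, not with the product of all level counts. For a data frame, something like `count_combinations(eachcol(df)...)` should work under the same assumptions.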

I see. That makes sense. I don’t think I need to do a frequency table for what I’m doing then. If I wasn’t away from my tower, I’d try it on there and see if it worked. I could try doing it manually later and let you know.

OK, I think I need to learn to do this manually.