Trouble making a pivot table

df1 = DataFrame(XLSX.readtable("C:/Users/Brett/Documents/Soc332Assignment3.xlsx","TestData")...)

#change everything to Float64

for name in names(df1)
       df1[!, name] = identity.(df1[:, name])
end

#change everything to real

function new_vector_correct_type(x)
         T = reduce(promote_type, typeof.(x))
         if T == Any
            return identity.(x)
        else
            return T.(x)
        end
    end


Test_matrix= Matrix(df1)


function make_pivottable(Test_matrix)
    cmap = countmap(Test_matrix)
    k = string.(collect(keys(cmap)))
    v = collect(values(cmap))
    ptab = DataFrame([k v], [:key, :counts])
    sort!(ptab, :counts, rev=true)

    return ptab
end

Test_pivot= make_pivottable(Test_matrix)

When I try to run this, I get an error

UndefKeywordError: keyword argument dims not assigned
(::Base.var"#kw##sort")(::NamedTuple{(:alg,),Tuple{SortingAlgorithms.RadixSortAlg}}, ::typeof(sort), ::Array{Int64,2}) at none:0
addcounts_radixsort!(::Dict{Int64,Int64}, ::Array{Int64,2}) at counts.jl:348
#addcounts!#70 at counts.jl:262 [inlined]
#addcounts! at none:0 [inlined]
#countmap#73 at counts.jl:389 [inlined]
countmap at counts.jl:389 [inlined]
make_pivottable(::Array{Int64,2}) at Soc332Assignment3.jl:58
top-level scope at Soc332Assignment3.jl:67

This is not a great error message. And maybe a bug in countmap.

countmap claims to work with matrices, but it throws an error when it tries to sort it.

My suggestion would be to use

cmap = countmap(vec(test_matrix))

To be clear, you want to get the counts of the unique values across the whole data frame or just across one vector of the data frame?

using DataFrames, Statistics, FreqTables

N = 1000

speed_opt = ["Speed up", "Slowed down"]
gender_opt = ["Male", "Female"]
age_opt = ["Young", "Old"]
time_day_opt = ["Rush hour", "Non-rush houw"]
df = DataFrame(
	speed = rand(speed_opt, N),
	gender = rand(gender_opt, N),
	age = rand(age_opt, N),
	time_day = rand(time_day_opt, N)
)

function get_stats(df, var1, var2)
	Ns = freqtable(df, var1, var2)
	N = sum(Ns)
	percents = Ns ./ N

	names_Ns = names(Ns)

	results = DataFrame()

	results[:, var1] = names_Ns[1]
	for name in names(Ns)[2]
		results[:, Symbol(name, :_N)] = Ns[:, name]
		results[:, Symbol(name, :_pct)] = percents[:, name]
	end
	return results
end

t = freqtable(df, :speed, :gender)
get_stats(df, :speed, :gender)
3 Likes

Why is there a row that says

df = DataFrame()

You can initialize an emtpy dataframe and then add columns to it. Please read the DataFrames documentation.

2 Likes
function get_stats(df1, consume, marijuana)
	Ns = freqtable(df1, consume, marijuana)
	N = sum(Ns)
	percents = Ns ./ N

	names_Ns = names(Ns)

	df = DataFrame()

	df1[consume, marijuana] = names_Ns[1]
	for name in names(Ns)[2]
		df1[:, Symbol(name, :_N)] = Ns[:, name]
		df1[:, Symbol(name, :_pct)] = percents[:, name]
	end
	return df1
end

t = freqtable(df1, :consume, :marijuana)
get_stats(df, :_N, :pct)

I thought I had it, but I’m not sure what to have in get_stats, that the df line isn’t highlighted as an error.

You need df[:, marijuana].

Seems to have fixed that issue, now it’s saying the columns aren’t the same length, but not sure why since the original columns were the same length.

I’m not sure what’s going on. The code I posted above seems to work. Without a full MWE (see how I created the DataFrame above), its tough to help you.

Ok, I’ll see what I can put together.

f1 = DataFrame(XLSX.readtable("C:/Users/Brett/Documents/Soc332Assignment3.xlsx","TestData")...)

#change everything to Float64

for name in names(df1)
       df1[!, name] = identity.(df1[:, name])
end

#change evrything to real

function new_vector_correct_type(x)
         T = reduce(promote_type, typeof.(x))
         if T == Any
            return identity.(x)
        else
            return T.(x)
        end
    end


Test_matrix= Matrix(df1)
Test_contingency= freqtable(df, names(df1)...)

#change this over for my data
N = 608


function get_stats(df1, consume, marijuana)
	Ns = freqtable(df1, consume, marijuana)
	N = sum(Ns)
	percents = Ns ./ N

	names_Ns = names(Ns)

	df = DataFrame()

	df[:, marijuana] = names_Ns[1]
	for name in names(Ns)[2]
		df[:, Symbol(name, :_N)] = Ns[:, name]
		df[:, Symbol(name, :_pct)] = percents[:, name]
	end
	return df1
end

t = freqtable(df, :consume, :marijuana)
get_stats(df, :consume, :marijuana)

Data Copied from Excel

Year Crude_Birth_Rate Female_Employment Female_Unemployment Female_Participation
1976 15.6 50 7.4 54
1977 15.5 51.3 8 55.8
1978 15.2 53.6 8.6 58.7
1979 15.4 55.7 7.7 60.4
1980 15.4 58 7 62.4
1981 15.3 60.3 7.5 65.2
1982 15.1 59.7 9.6 66.1
1983 15 60.5 10.5 67.6
1984 15 62.1 10.6 69.4
1985 14.9 63.8 10.2 71
1986 14.5 66.5 9.2 73.2
1987 14.4 67.5 8.7 74
1988 14.5 69.4 7.9 75.3
1989 14.9 70.5 7.9 76.6
1990 15 71.5 7.9 77.7
1991 14.9 70.5 9.3 77.8
1992 14.7 69.4 9.6 76.8
1993 13.6 69.1 10.3 77.1
1994 13.1 69.6 9.5 76.9
1995 12.6 70.4 8.7 77.1
1996 12 70.9 8.9 77.8
1997 11.6 72.1 8.1 78.5
1998 11.4 73.2 7.3 79
1999 11.2 74.3 6.6 79.6
2000 10.7 75.1 6 79.9
2001 10.7 75.2 6.3 80.3
2002 10.5 75.9 6.7 81.3
2003 10.6 76.2 6.8 81.8
2004 10.5 77 6.3 82.2
2005 10.6 76.9 5.9 81.8
2006 10.9 77.1 5.6 81.7
2007 11.2 78.3 5 82.5
2008 11.3 77.9 5 82
2009 11.3 77 6.4 82.3
2010 11.1 76.7 6.9 82.3
2011 11 76.8 6.4 82
2012 11 77.2 6.1 82.3
2013 10.8 78 5.7 82.7
2014 10.8 77.5 5.6 82
2015 10.7 77 5.7 81.6
2016 10.6 77.6 5.6 82.2
2017 10.3 78.6 5.3 83
2018 10.1 78.9 5 83

Note that your function does not include the variables :consume or :marijuana. However if I add them, everything works.

julia> df.consume = rand(["Consume", "Did not consume"], nrow(df));

julia> df.marijuana = rand(["Marijuana", "None"], nrow(df));

julia> get_stats(df, :consume, :marijuana)
# the correct output

PS. It would be really helpful if you could build your skills in generating DataFrames randomly to produce MWEs. I think it would make getting help a lot easier when people have to only copy and paste a single block of code.

1 Like

That’s because I used the wrong data set.

1 1 1 1 1
1 3 2 2 1
2 5 2 2 1
2 5 2 2 2
1 5 2 2 2
2 5 2 1 2
1 4 1 2 2
1 5 1 2 2
2 5 2 2 2
1 5 2 1 2
2 5 2 2 2
1 5 1 2 2
1 5 1 2 2
2 5 2 2 2
2 5 1 2 2
2 5 2 2 2
2 5 1 2 2
2 5 2 2 2
2 5 1 2 1
1 5 1 1 1
2 5 2 1 2
2 5 2 2 2
1 2 1 2 2
1 5 1 2 2
1 5 1 2 2
2 5 1 2 2
1 5 1 2 2
2 5 2 2 2
1 5 1 2 2
2 5 2 2 2
2 5 1 2 2
1 5 2 2 2
1 1 1 1 1
1 5 1 2 2
1 3 1 2 2
2 5 2 2 2
2 5 2 2 2
2 5 2 2 2
1 5 2 1 2
1 5 1 2 2
2 5 1 1 1
1 5 1 2 2
1 5 1 2 2
2 5 1 1 1
2 5 1 2 2
1 5 2 1 2
1 5 2 2 2
2 5 1 1 1
2 1 2 1 2
2 5 2 2 2
1 5 1 2 2
2 5 1 2 2
1 4 1 2 2
2 5 2 2 1
1 3 1 2 2
2 5 1 1 1
1 5 1 2 2
1 3 1 2 2
1 2 1 2 2
2 5 2 1 2
1 5 2 2 2
2 5 1 2 2
2 5 1 2 2
1 1 2 2 2
2 5 2 1 2
1 5 2 1 2
2 5 1 1 1
1 3 1 2 2
2 5 1 2 2
1 5 1 2 2
1 5 2 2 1
1 5 1 2 2
1 5 2 2 2
1 5 1 2 2
1 5 2 1 2
1 5 2 1 2
1 5 2 1 2
2 5 2 2 1
2 5 2 2 2
1 5 1 2 2
2 5 1 1 1
1 5 1 1 1
1 1 1 2 2
1 1 2 1 2
1 1 1 1 1
1 4 1 2 2
2 5 1 1 1
2 5 2 1 2
1 5 1 2 2
2 5 2 2 2
2 5 1 2 1
1 5 2 2 2
2 5 2 2 2
2 5 1 2 2
2 5 1 2 2
1 5 1 1 2
1 3 1 2 2
2 5 1 1 2
1 5 1 2 2
1 3 1 1 1
2 5 2 2 1
2 5 2 1 2
2 5 1 2 2
1 5 1 2 2
2 5 1 1 1
1 3 1 1 2
2 5 1 2 2
2 5 2 1 2
1 5 2 1 2
2 5 1 2 2
2 5 2 2 2
1 1 1 2 2
1 5 1 1 1
2 5 1 2 2
2 5 1 1 1
2 5 1 1 1
1 4 1 1 1
2 5 1 2 2
1 5 2 2 1
2 5 1 2 2
2 5 1 2 2
1 5 2 2 2
1 4 1 1 2
1 5 1 2 2
2 5 1 1 1
2 5 2 2 1
1 4 1 2 2
1 5 1 1 2
2 5 2 1 2
2 5 1 2 2
2 5 1 2 2
2 3 2 1 2
2 5 1 1 1
2 5 2 2 2
1 5 2 2 1
2 5 1 1 1
2 5 2 2 2
2 5 2 2 2
2 5 1 1 1
2 4 2 2 2
2 5 2 2 2
2 5 1 2 1
2 2 2 1 1
2 5 1 1 1
2 5 1 2 2
2 5 1 2 2
2 5 1 2 2
2 2 2 2 1
2 5 2 2 2
2 2 2 1 1
2 5 2 2 2
2 3 1 2 2
2 5 1 2 2
2 5 2 1 2
9 9 9 9 9
9 9 9 9 9
1 5 1 2 2
2 5 2 1 2
1 5 1 1 1
2 5 2 2 2
2 5 2 2 2
2 5 2 2 2
2 5 1 2 2
1 5 1 2 2
2 5 2 2 2
2 5 1 2 2
1 5 1 2 2
2 5 1 2 2
1 5 1 2 2
1 5 2 2 1
1 5 1 2 2
2 5 1 1 2
2 5 2 2 2
1 5 2 1 1
1 5 2 2 2
2 5 2 2 2
1 5 1 2 2
1 5 1 2 1
2 5 1 1 1
2 5 2 2 2
1 5 1 1 2
2 5 2 2 1
1 5 1 1 2
1 5 1 2 2
1 5 1 2 2
2 5 1 2 2
2 5 1 2 2
2 5 1 2 2
1 5 1 2 2
1 1 1 2 2
2 5 1 2 1
2 5 2 2 2
1 5 1 2 2
2 5 1 1 1
2 5 1 1 1
2 5 1 2 2
2 5 1 1 1
2 3 1 1 1
2 5 2 2 1
1 5 1 2 2
2 5 2 2 2
2 5 1 1 1
2 5 2 2 2
1 5 1 2 2
2 5 1 2 2
1 5 1 1 1
2 1 1 2 2
1 1 2 1 2
1 5 2 2 2
2 5 2 2 2
2 5 1 2 2
1 5 1 2 2
2 5 2 2 2
1 1 1 1 2
2 5 2 1 2
2 5 2 2 2
1 5 2 2 2
2 5 2 2 2
2 5 2 2 2
2 5 1 1 2
1 5 1 1 2
1 2 1 2 1
1 5 1 1 2
2 5 1 2 2
2 5 1 2 1
2 5 1 2 2
2 5 1 1 1
2 5 1 1 1
1 5 2 2 2
1 5 2 2 2
2 5 1 2 2
1 5 2 2 2
2 5 1 2 2
2 5 1 1 1
2 5 2 1 2
2 5 1 2 2
2 5 1 1 2
1 5 2 2 2
1 5 1 2 2
1 4 1 1 1
2 5 1 1 2
2 5 1 2 2
2 5 2 2 2
1 5 2 2 2
1 5 2 1 1
1 5 1 2 2
2 5 2 2 2
1 5 2 2 2
1 5 1 2 2
2 5 2 1 2
2 5 2 2 2
2 2 2 1 2
1 5 1 2 2
1 5 1 2 2
1 5 2 2 1
2 5 2 2 2
2 5 1 2 2
1 5 1 1 2
1 5 2 1 2
1 5 1 2 2
2 5 1 2 2
2 5 1 2 2
1 3 1 1 2
1 2 1 1 1
1 5 1 2 2
2 5 1 1 1
2 5 2 2 2
2 5 2 2 2
2 5 1 2 2
2 5 1 2 2
1 5 2 2 2
2 5 2 2 2
2 5 2 2 2
2 5 1 2 2
2 5 1 2 1
1 5 1 2 2
1 1 2 1 2
1 4 1 2 2
2 5 2 2 2
1 3 1 2 2
1 4 1 2 2
1 5 1 1 2
2 5 1 1 1
1 5 2 2 2
1 5 1 1 1
2 5 2 2 2
1 5 1 1 1
1 5 1 1 1
2 5 1 1 1
1 5 1 2 2
2 5 2 2 2
2 5 1 2 2
2 5 1 2 2
2 5 2 2 2
1 5 1 2 2
2 5 1 1 1
2 5 2 2 2
1 1 1 1 1
1 4 1 2 1
2 5 2 2 1
2 5 2 2 2
1 5 2 2 2
2 5 1 2 2
2 5 2 2 2
2 5 2 2 1
1 5 2 2 1
1 5 1 2 2
2 5 1 2 2
1 5 1 1 1
1 5 2 2 2
2 5 2 2 1
2 5 2 1 2
2 5 2 2 1
1 4 1 2 2
2 5 1 2 2
1 5 1 2 2
2 5 1 1 1
1 4 1 2 2
2 3 2 2 1
1 5 1 1 1
1 5 1 2 2
2 5 1 2 1
1 4 1 2 2
1 2 1 1 1
1 5 2 1 2
1 5 1 1 1
2 5 2 1 2
2 5 1 2 1
2 5 1 1 1
2 5 1 1 1
2 5 2 2 2
2 5 2 2 2
1 5 2 2 1
1 5 1 1 1
1 5 2 2 2
1 5 1 2 2
2 5 2 1 2
2 4 1 2 2
1 1 1 1 1
1 5 1 1 1
1 5 1 2 2
2 5 1 2 2
2 5 1 2 2
2 5 1 1 1
1 5 1 2 2
2 5 2 2 2
1 3 1 2 2
1 4 2 2 1
1 5 1 2 2
2 5 2 1 2
2 5 1 2 1
1 5 1 2 2
2 5 1 2 2
2 5 1 2 2
2 5 2 1 2
1 4 1 2 2
1 5 2 1 2
2 5 1 2 2
1 3 2 1 1
1 4 1 2 2
1 5 1 2 2
2 5 2 2 2
2 5 2 2 1
2 5 2 2 2
2 5 1 2 2
2 5 2 2 2
2 5 2 2 2
1 5 1 2 2
1 5 1 2 2
2 5 2 2 2
2 5 1 2 2
1 3 1 2 2
2 5 2 1 2
2 5 1 2 2
2 5 1 1 1
2 3 2 1 2
2 5 2 2 2
1 4 2 2 1
1 5 1 2 2
2 5 1 2 2
2 5 2 2 2
2 5 1 1 2
1 5 1 2 2
2 5 2 2 2
1 5 2 1 2
1 2 2 1 2
2 5 1 2 1
2 5 2 2 2
1 5 2 2 2
2 5 1 2 2
1 5 2 1 2
1 5 1 2 2
2 5 2 2 2
2 5 1 2 2
2 5 1 2 2
2 5 1 2 2
2 5 1 1 2
2 5 1 1 1
1 3 1 1 1
1 5 2 1 2
2 5 1 1 1
1 3 2 2 1
2 5 1 2 2
1 2 1 1 1
2 5 2 1 2
2 5 1 1 1
2 5 2 1 2
1 5 1 2 2
2 5 1 1 1
2 1 1 2 2
1 5 1 1 1
1 2 1 1 1
1 5 1 1 1
2 5 2 2 2
2 3 1 1 1
2 5 2 2 2
1 5 2 2 2
2 5 2 1 1
2 5 1 1 1
1 2 2 1 2
2 5 1 1 1
2 5 1 1 1
2 5 2 2 2
1 4 1 2 2
1 5 2 2 1
1 3 1 1 1
1 5 1 2 1
1 5 1 2 1
2 5 2 2 1
2 5 1 2 2
1 5 2 1 2
9 9 9 9 9
1 5 1 2 2
1 5 1 2 2
1 5 2 2 2
1 2 1 1 1
2 5 1 2 2
1 3 2 1 2
2 5 1 2 2
2 5 2 2 1
1 5 1 2 2
2 5 2 1 2
2 5 1 2 1
2 5 1 2 1
1 5 1 1 2
2 5 2 2 2
2 5 1 2 2
1 5 2 2 2
2 5 2 2 2
1 4 1 2 2
2 5 1 2 2
2 1 9 9 9
1 5 1 2 2
2 5 2 2 1
1 4 1 1 1
1 5 2 1 2
1 4 2 2 2
1 5 2 1 2
1 5 1 2 2
1 2 1 1 1
2 5 1 1 1
1 5 1 2 2
2 5 2 2 2
2 5 1 2 1
2 5 1 2 2
2 5 1 1 1
2 5 2 2 2
1 5 1 2 2
2 5 1 2 2
2 5 2 2 2
2 5 1 2 1
1 5 1 2 2
2 1 2 2 2
1 5 1 1 1
2 5 1 2 2
2 5 2 1 2
2 5 1 2 2
2 5 1 2 1
1 5 1 2 2
1 1 1 2 2
2 5 1 1 1
1 4 2 1 2
1 2 1 1 1
2 5 2 2 2
2 5 1 1 1
1 2 2 1 2
1 4 2 2 2
1 5 1 2 2
2 5 2 2 2
2 5 2 2 1
2 5 2 2 1
2 5 2 2 2
2 5 1 2 2
1 5 2 2 2
2 5 1 2 2
2 5 2 2 1
1 4 1 1 2
2 5 1 2 2
2 5 1 2 2
2 5 1 2 2
1 2 2 2 2
2 5 1 2 2
1 5 1 1 1
1 1 1 2 2
1 4 1 2 2
2 5 2 1 2
2 5 1 2 2
1 5 1 1 2
1 5 1 1 1
2 5 1 2 2
2 5 2 2 2
1 5 1 1 1
1 5 2 2 2
1 5 1 2 2
1 5 1 2 2
1 5 1 2 1
2 5 1 1 1
1 5 2 1 2
1 2 1 2 2
1 5 1 2 1
1 5 2 1 2
1 5 1 2 2
1 2 2 1 2
1 4 1 1 2
2 5 2 1 2
1 5 1 1 1
1 5 2 2 1
2 5 1 1 1
1 5 1 1 1
2 5 2 1 2
1 5 1 2 2
1 3 2 2 1
1 1 2 1 2
1 5 1 1 1
2 5 1 2 1
2 5 1 2 2
2 5 1 2 2
1 5 2 2 1
2 5 2 2 2
2 5 2 2 2
2 5 1 2 2
2 5 1 1 1
2 5 1 2 2
1 5 1 2 2
1 4 1 1 1
2 5 1 2 2
1 5 1 1 2
1 5 1 2 2
1 5 2 1 2
1 5 1 1 1
2 3 1 1 2
2 5 2 2 1
2 5 2 2 1
1 1 1 2 2
2 9 2 2 2
2 5 2 2 2
2 5 2 2 1
2 5 1 2 2
2 5 1 2 2
2 5 2 2 2
2 5 1 1 2
2 5 1 2 2
2 5 1 1 2
2 3 2 2 1
2 5 1 1 1
2 5 2 2 1
1 5 2 2 1
1 5 1 2 2
1 5 1 1 1
1 5 1 1 2
1 1 2 2 1
2 5 2 2 2
9 9 9 9 9
9 9 9 9 9
9 9 9 9 9
1 5 2 1 2
1 5 2 1 2
1 5 2 2 2
1 5 2 1 2
2 5 2 2 2
1 5 1 1 1
1 5 1 2 1
2 5 2 2 2
1 5 2 2 2
2 2 2 1 2
1 5 1 2 2
2 5 1 1 1
2 5 1 1 1
2 5 1 1 1
1 1 1 1 1
2 5 2 2 2
1 5 2 1 2
2 5 1 2 1
2 5 2 2 2
2 5 2 2 2
2 5 1 2 2
2 5 2 2 1
2 5 1 1 1
2 5 2 2 2
2 5 1 1 1
1 5 2 1 2
1 5 1 1 1
2 5 1 2 2
2 5 1 2 2
1 5 1 2 2
1 5 1 1 1
1 1 1 1 1
2 5 1 1 1

Do you see how I’ve been generating random vectors to create my MWEs though? It would be really helpful if you did what I did above when asking for help.

I think you’re right. The problem is that at this stage I’m getting a bounds error. my dataset is [1,2,3,4,5,6],9] , and I think that’s because 9 is not sequential, not sure if it would would work better with 9 as a :string or NAN, since 9 represents a missing value.

I don’t understand what you mean by this. I don’t understand what a vector being “sequential” has to do with your problem. I would bet that if you isolated the problem to an MWE you would find that the code works.

1 Like

Ok, I tried it and it works. I guess what I need to do now is make it work with my spreadsheet.

My goal isn’t to sort random numbers, but to analyze data.

I see. I should not have used df as both the name of the input data frame and the name of the output data frame in my function. I can see how that can result in errors. my apologies for writing sloppy code!

You did the correct thing by naming the input data frame df1 in your function.