Hi all!
I’m happy to announce LatentClassAnalysis.jl, a package I’ve developed for Latent Class Analysis (LCA).
Key Features
This package can do:
- Model specification with dummies or categorical indicators
- Maximum likelihood estimation via EM algorithm
- Model diagnostics and fit statistics (e.g., AIC, BIC, entropy)
- Class prediction and posterior probabilities
Examples
First, let’s load the packages and generate some data.
using LatentClassAnalysis
using DataFrames
using Random
# Generate data
Random.seed!(123)
n_samples = 1000
# True class assignments (2 latent classes)
true_classes = rand(1:2, n_samples)
# Generate responses for 4 binary items with different patterns
function generate_response(class)
if class == 1
return rand() < 0.8 ? 1 : 2 # High probability of 1
else
return rand() < 0.3 ? 1 : 2 # Low probability of 1
end
end
# Create DataFrame with responses
df = DataFrame(
item1 = [generate_response(c) for c in true_classes],
item2 = [generate_response(c) for c in true_classes],
item3 = [generate_response(c) for c in true_classes],
item4 = categorical([rand(["Yes", "No"]) for _ in 1:n_samples])
)
Next, we prepare the data and fit LCA models with different number of latent classes.
# Step 1: Prepare data
data, n_categories = prepare_data(df, :item1, :item2, :item3, :item4)
# Step 2: Fit models with different number of classes
results = []
for n_classes in 2:4
# Initialize model
model = LCAModel(n_classes, size(data, 2), n_categories)
# Fit model and get log-likelihood
ll = fit!(model, data, verbose=true)
# Get diagnostics
diag = diagnostics!(model, data, ll)
# Store results
push!(results, (
n_classes = n_classes,
model = model,
diagnostics = diag
))
println("Log-likelihood: $(diag.ll)")
println("AIC: $(diag.aic)")
println("BIC: $(diag.bic)")
println("SBIC: $(diag.sbic)")
println("Entropy: $(diag.entropy)")
end
We identify the model with the lowest BIC as the best-fitting model (not recommended though).
# Find best model based on BIC
best_n_classes = argmin(k -> results[k].diagnostics.bic, keys(results)) + 1
best_model = results[best_n_classes - 1].model
We now get predicted probabilities and class assignments.
# Step 3: Analyze best model
# Get predictions
assignments, probabilities = predict(best_model, data)
# Add predicted classes to the original DataFrame
df[!, :predicted_class] = assignments
# Calculate class sizes
class_sizes = [sum(assignments .== k) / length(assignments) for k in 1:best_n_classes]
println("\nClass sizes:")
for (k, size) in enumerate(class_sizes)
println("Class $k: $(round(size * 100, digits=1))%")
end
# Show item response probabilities for each class
println("\nItem response probabilities:")
for j in 1:best_model.n_items
println("\nItem $j:")
for k in 1:best_model.n_classes
probs = best_model.item_probs[j][k, :]
println("Class $k: $probs")
end
end
I hope that through these efforts–implementing models that I have used before in my research, Julia can be more useful for social sciences.
I’m happy to make my contributions and really wish to hear any feedback and feature requests from you! Cheers🍻️!