Faster conversion to CategoricalArray with many groups

I need to convert columns in a large (~20M rows) DataFrame to CategoricalArrays for use with MixedModels and FixedEffectModels (I think the latter does not require categorical columns but the former does). If a column has many groups, the conversion takes an extremely long time (it is the longest-running line in my script). Is there a way to make the conversion faster or parallel? Maybe Transducers with BangBang?
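I don't know of a faster single-call path, but one thing that can help on a multi-core machine is converting the columns in parallel, one task per column. A minimal sketch (the DataFrame and the `cols` list here are stand-ins for the real data, not from the original post):

```julia
using DataFrames, CategoricalArrays

# Stand-in for the real table: two high-cardinality grouping columns.
df = DataFrame(id   = rand(1:1_000_000, 1_000_000),
               firm = rand(1:50_000,   1_000_000))

cols = [:id, :firm]  # the columns to convert

# Build the converted columns in parallel, then assign them serially
# (mutating the DataFrame itself from multiple threads is not safe).
converted = Vector{CategoricalVector}(undef, length(cols))
Threads.@threads for i in eachindex(cols)
    # compress=true stores the codes in the smallest integer type
    # that fits the number of levels, which also saves memory.
    converted[i] = categorical(df[!, cols[i]]; compress=true)
end
for (i, c) in enumerate(cols)
    df[!, c] = converted[i]
end
```

This only parallelizes across columns, so it helps when several columns need converting; start Julia with `julia -t auto` (or set `JULIA_NUM_THREADS`) for the threads to be available. Within a single column, building the level pool is still serial.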

I don’t have an answer unfortunately, but another question: are MixedModels and FixedEffectModels fine with 20M-row data? How much RAM do you have to make things work?

Yes, definitely fine. Results take a couple of minutes, but in the overall project that is a short amount of time. I’m using a Xeon W-2145 CPU (3.70GHz, usually boosts to around 4.2GHz) with 8 cores and 64GB of RAM. It should also work with 32GB of RAM given careful data management.


As I wrote this, the process was killed because I ran out of memory, but that was due to me including interactions and interactions of powers, so I guess it works for reasonably sized models. What works: 2 fixed effects, 1 fixed-effect interaction, and 6 main effects plus 4 interactions, of which 2 are between continuous variables and 2 are with one categorical variable (one binary, one with 10 groups). If I then add powers and more interactions, I run out of memory.
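For reference, a model of that shape in FixedEffectModels syntax might look something like this (all column names here are hypothetical placeholders, not from the actual data):

```julia
using DataFrames, FixedEffectModels

# Hypothetical names: y is the outcome; x1..x6 are continuous main effects;
# g1 is binary, g2 has 10 groups; f1 and f2 are the fixed-effect dimensions.
reg(df, @formula(y ~ x1 + x2 + x3 + x4 + x5 + x6 +
                     x1 & x2 +          # continuous × continuous interactions
                     x3 & x4 +
                     x5 & g1 +          # interactions with categorical variables
                     x6 & g2 +
                     fe(f1) + fe(f2) +
                     fe(f1) & fe(f2)))  # the fixed-effect interaction
```

Each added interaction or power term adds another full 20M-row column to the model matrix, so memory grows roughly linearly with the number of terms, which would explain running out of RAM once powers and further interactions are added.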