I’m no specialist in parallel backtracking but depth-first search sounds hard for parallelization. I wonder if you can turn this into a partially breadth-first algorithm? i.e., depth-first-searches-in-breadth-first-search? Or, equivalently, is there a way to “jump ahead” in the search tree? It’d cut data dependencies and let us use the canonical divide-and-conquer approach. If the “caching” aspect was important, we may need some concurrent data structures.