Hello Chris,
After running through the examples you gave, and following the excellent caching interface of LinearSolve.jl, I found that KLU factorisation gives me faster solve times than UMFPACK. The factorisation times are quite high for KLU, but since it's a one-time operation, that does not matter in my case.
I have a key question. When I change the value passed to BLAS.set_num_threads, I see no difference in solve times. How can I vary the number of threads that these algorithms use, and how do I choose the optimal number? I want to run my code on my institute's cluster and use all the resources I can to solve Ax=b in the fastest time possible.
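For reference, here is a minimal sketch of the kind of comparison I mean. The matrix is a random sparse placeholder, not my real system, and the size, density, and thread count are arbitrary values chosen for illustration. Depending on the LinearSolve.jl version, additional packages may need to be loaded for the sparse factorisations.

```julia
using LinearSolve, SparseArrays, LinearAlgebra

# Placeholder problem: random sparse matrix with a diagonal shift (NOT my real system).
n = 1_000
A = sprand(n, n, 0.01) + 10I
b = rand(n)

# The setting I am varying; 4 is an arbitrary example value.
BLAS.set_num_threads(4)

prob = LinearProblem(A, b)

for alg in (KLUFactorization(), UMFPACKFactorization())
    cache = init(prob, alg)                 # set up the solver cache
    t_first = @elapsed solve!(cache)        # first call factorises and solves (includes compile time)
    cache.b = rand(n)                       # new right-hand side; the factorisation is reused
    t_resolve = @elapsed solve!(cache)      # this is the solve time I care about
    println(nameof(typeof(alg)), ": first solve = ", t_first,
            " s, re-solve = ", t_resolve, " s")
end
```

Changing the argument to BLAS.set_num_threads in a script like this is what leaves the re-solve times unchanged for me.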