Tridiagonal matrix operations are basically equivalent to level-1 BLAS: they do O(n) work on O(n) data with unit-stride access, so an optimizing compiler with SIMD enabled can usually do just as well as the hand-optimized BLAS libraries for this sort of thing.
Indeed, because tridiagonal matrix operations are linear time, there is a good chance that they aren’t the performance bottleneck in your code.
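
To make that concrete, here is a minimal sketch of a tridiagonal matrix-vector product in plain C. The function name `tridiag_matvec` and the three-array storage (`dl`, `d`, `du` for the sub-diagonal, diagonal and super-diagonal) are my own choices for illustration, not something taken from a particular library. The `restrict` qualifiers are an assumption that the arrays don't overlap, which is what gives the compiler room to vectorize the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Computes y = A*x for an n-by-n tridiagonal matrix A stored as three arrays:
 *   dl[0..n-2]  sub-diagonal
 *   d [0..n-1]  main diagonal
 *   du[0..n-2]  super-diagonal
 * Assumption: none of the arrays overlap (hence restrict). */
void tridiag_matvec(size_t n,
                    const double *restrict dl,
                    const double *restrict d,
                    const double *restrict du,
                    const double *restrict x,
                    double *restrict y)
{
    assert(n == 0 || (d != NULL && x != NULL && y != NULL));

    if (n == 0) return;
    if (n == 1) { y[0] = d[0] * x[0]; return; }

    /* First row has no sub-diagonal entry. */
    y[0] = d[0] * x[0] + du[0] * x[1];

    /* Interior rows: unit-stride loads, no loop-carried dependency,
     * so this is a natural candidate for auto-vectorization. */
    for (size_t i = 1; i + 1 < n; ++i)
        y[i] = dl[i - 1] * x[i - 1] + d[i] * x[i] + du[i] * x[i + 1];

    /* Last row has no super-diagonal entry. */
    y[n - 1] = dl[n - 2] * x[n - 2] + d[n - 1] * x[n - 1];
}
```

With GCC or Clang, something like `-O3 -march=native` is usually enough to get SIMD code for the interior loop, though it's worth checking the compiler's vectorization report or the generated assembly rather than taking that on faith. Each row costs a handful of flops against several loads and a store, so the loop tends to be limited by memory bandwidth rather than arithmetic, same as a level-1 BLAS call.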