If B does not have special structure, there isn't much special to be done about the multiplications A * B * A' themselves.
Since you only need a small subblock of the result, however, you should not compute the full product. Instead use R[i,j] == A[i,:] * B * A[j,:]' (where i and j can be single indices or sets of indices) as suggested above.
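To make the savings concrete, here is a toy-scale sketch of that subblock computation (the sizes and random matrices are stand-ins for your real A and B, just to illustrate the indexing):

```julia
using LinearAlgebra

# Toy-sized stand-ins for the real (huge) matrices:
n = 100
A = randn(n, n)
B = randn(n, n)

i, j = 1:3, 5:8               # the row/column index sets you actually need
Rij = A[i, :] * B * A[j, :]'  # 3×4 block; never forms the full n×n product

# Sanity check against the full product (only feasible at toy size):
@assert Rij ≈ (A * B * A')[i, j]
```

The dominant cost is A[i,:] * B, which is O(|i|·n²) instead of the O(n³) (and n² of extra storage) required for the full triple product.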
Note that these matrices are so large (~160GB) that most systems won’t even be able to hold one in RAM, much less both. Is there further structure you can exploit? Are X and B totally arbitrary matrices or do they possess low rank or some other structure?
Unless you can do something clever about computing A = inv(I-X) itself, that inversion would be at least half your cost anyway (and almost all of it if you only need a tiny block of the result), so there are limited gains from optimizing anything else. One possibility is to use the block matrix inverse to compute only A[i,:] for the subset i that you need, but I don't think that actually saves much here without further structure.
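As a related aside (a standard trick, not the block-inverse approach itself): you can extract just the rows A[i,:] without ever forming the full inverse, since A[i,:]' = (I - X)' \ e_i for each unit vector e_i. A sketch at toy scale, with a random stand-in for X:

```julia
using LinearAlgebra

n = 50
X = randn(n, n) / (2 * sqrt(n))      # stand-in for the real X
M = I - X

i = [2, 7, 11]                       # the rows of A = inv(M) that we need
Ei = Matrix{Float64}(I, n, n)[:, i]  # unit vectors e_i as columns
F = lu(M')                           # factor once...
Ai = (F \ Ei)'                       # ...then solve |i| right-hand sides

@assert Ai ≈ inv(M)[i, :]
```

This still pays for one O(n³) factorization, but a single LU is cheaper than an explicit inverse and you never store A, which matters at your sizes.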
Alternatively, for some X you may be able to use the Neumann series
(I - X)^{-1} = \sum_{n=0}^\infty X^n, which holds whenever the series converges (i.e., when the spectral radius of X is less than 1). In that case you can compute the partial sums A_N = \sum_{n=0}^N X^n, where \lim_{N\rightarrow\infty} A_N = A. These can be computed recursively via A_{n+1} = I + A_n X from the initial condition A_0 = I until convergence. Since you only care about A[i,:] (the rows of A indexed by i), you can instead start with A_0 = I[i,:] (the rows i of the identity matrix). In other words, the following function should either compute A[i,:] from X or throw an error if it will not work:
```julia
using LinearAlgebra

computeAi(X, ::Colon=Colon()) = computeAi(X, axes(X, 1)) # compute full matrix
function computeAi(X, i)
    Ii = UniformScaling(one(eltype(X)))[i, axes(X, 2)] # I[i,:]
    A = Ii
    while true
        Aprev = A
        A = Ii + Aprev * X # could remove per-loop memory allocations with some effort
        all(isfinite, A) || error("does not converge")
        A == Aprev && return A # may want to make the comparison approximate
    end
end
```
This is efficient when i is a small subset of the full dimension: each iteration costs O(|i|·n²) rather than O(n³). When this converges, R[i,j] == computeAi(X, i) * B * computeAi(X, j)'.
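You can sanity-check this at toy scale against the dense computation. Here X is a random stand-in scaled so its spectral radius is safely below 1, and the compact computeAi below materializes the identity rows and uses an approximate convergence test (per the comments above):

```julia
using LinearAlgebra

# Compact variant of computeAi from above, with materialized identity rows
# and an approximate convergence check:
function computeAi(X, i)
    Ii = Matrix{eltype(X)}(I, size(X)...)[i, :]  # I[i,:]
    A = Ii
    while true
        Aprev = A
        A = Ii + Aprev * X
        all(isfinite, A) || error("does not converge")
        A ≈ Aprev && return A
    end
end

n = 200
X = randn(n, n) / (3 * sqrt(n))  # stand-in with spectral radius well below 1
B = randn(n, n)
i, j = 1:4, 10:12

Rij = computeAi(X, i) * B * computeAi(X, j)'

# Dense reference, only feasible at toy size:
A = inv(I - X)
@assert isapprox(Rij, (A * B * A')[i, j]; rtol=1e-6)
```

At your real sizes the dense check is of course impossible; the isfinite guard (and, if needed, an estimate of the spectral radius of X) is what tells you whether the series is converging.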