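The usual approach is simply to discard the transient, i.e. drop the first few output samples. Another possibility is to use an FIR filter, which may have a shorter transient; in any case, its transient length is easier to predict: starting from zero initial conditions, it lasts exactly as many samples as the filter order (number of taps minus one), while the group delay of a linear-phase FIR is half the filter order.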
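As a rough illustration of discarding the transient, here is a minimal pure-Python sketch. The helper `fir_filter`, the 8-tap moving-average filter, and the constant test input are all assumptions chosen for the example, not anything from the original question; the filter starts from a zero state, so the first `num_taps - 1` outputs are the start-up transient.

```python
def fir_filter(taps, x):
    """Direct-form FIR filter with zero initial state (hypothetical helper).

    Returns one output sample per input sample.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:          # samples before n = 0 are treated as zero
                acc += h * x[n - k]
        y.append(acc)
    return y


num_taps = 8                          # illustrative choice
taps = [1.0 / num_taps] * num_taps    # moving-average (boxcar) FIR
x = [5.0] * 32                        # constant input, so steady state is obvious

y = fir_filter(taps, x)

transient_len = num_taps - 1          # filter order = number of taps - 1
y_steady = y[transient_len:]          # keep only the steady-state part
```

With a constant input, `y` ramps up during the first `transient_len` samples and stays at the input level afterwards, so slicing at `transient_len` removes exactly the ramp.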
Another interesting resource here: Transient Response, Steady State, and Decay | Introduction to Digital Filters
You may also consider using filters designed specifically to remove DC trends; there’s a good selection here: Linear-phase DC Removal Filter - Rick Lyons
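One common structure for this kind of DC removal is to subtract a moving average from a suitably delayed copy of the input. The sketch below shows that general idea only; it is not claimed to be the exact filter from the linked article, and the function name `dc_removal` and the default length of 9 are assumptions made for illustration.

```python
def dc_removal(x, length=9):
    """Subtract a moving average from the delayed input (hypothetical helper).

    `length` must be odd so that the delay (length - 1) / 2 is an integer,
    which keeps the overall filter linear-phase. Zero initial state.
    """
    if length % 2 == 0:
        raise ValueError("length must be odd")
    delay = (length - 1) // 2
    y = []
    for n in range(len(x)):
        # Moving average over the last `length` samples (zeros before n = 0).
        window_sum = sum(x[n - k] for k in range(length) if n - k >= 0)
        delayed = x[n - delay] if n - delay >= 0 else 0.0
        y.append(delayed - window_sum / length)
    return y


x = [3.0] * 40                 # pure DC offset as a simple test input
y = dc_removal(x, length=9)
y_steady = y[9 - 1:]           # drop the start-up transient (length - 1 samples)
```

This is still an FIR filter, so its start-up transient again lasts `length - 1` samples; after that, a constant offset is removed completely.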