I first imagine an optimization problem without filtering a density value ρ. Then the optimization problem is to minimize an objective function O(ρ) only constrained by physics’ laws (here, Maxwell’s equations) and feasible sets. Let’s suppose the O(ρ) is minized when ρ=ρ0.

Then, I go back to the case with filtering function f:ρ↦ρ'. Then the objective function will be O(ρ') and it is minimized when ρ'=ρ0. Regardless of the filtering, the final result is the same.

Folloing the logic, I don’t think filtering contributes when binarizing the result. What am I missing?

The filtering (i.e. convolving with a smoothing kernel or “diffusing” the degrees of freedom) is not to binarize the result, it is to regularize the problem by imposing a minimum lengthscale[*] on the solution. Otherwise, the solution may not converge with resolution. (This is not necessary in all problems, in principle, but is generally a good idea.)

By filtering, however, even if you start with a binary density, the smoothed density will have lots of intermediate values — you are prohibiting a nontrivial binarized result. So to allow the optimization to get back to a binarized result you need to pass the smoothed density through a step function projection step. Unfortunately, a step function is not differentiable[**], so instead the projection is done with a smoothed step function that is gradually made “steeper” as the optimization progresses (at the cost of making the optimization stiffer[**]).

There are many review articles on this filter/project style of density-based topology optimization; arguably it is the most widely used approach at the moment. See, for example, Jensen (2011).

[*] Although the filtering sets a minimum lengthscale in a certain sense, it does not guarantee that the result satisfies manufacturing constraints. That requires additional constraints; see e.g. Hammond et al (2021) and references therein.

[**] We should be posting a manuscript later this year on how to do (essentially) step function projection while being everywhere differentiable and remaining non-stiff (i.e. avoiding an ill-conditioned Hessian for the objective function).

One way to make sense of filters and how they impact optimization is to consider their effect on the gradient vector of the objective wrt the pseudo-densities/decision variables. Say O(ρ) is your unfiltered objective where ρ is the unfiltered design, and dO(ρ)/dρ is the gradient of O(ρ) wrt ρ. The common filters used can essentially be written as a linear operator ρ_f = A * ρ for some filter matrix A. If we define the filtered objective as O(ρ_f) = O(A * ρ) instead, its gradient will be A' * dO(ρ_f)/d(ρ_f). A' also has a local averaging effect. So even if ρ_f in the new formulation was equal to ρ in the old formulation (giving the same objective value), the gradients of the 2 objectives will not be equal!

As Steven mentioned, filters have a regularizing effect because they essentially average out the local partial derivatives. Due to numerical errors in the FEA, sometimes a chequerboard pattern naturally emerges as the “optimal” design, where 1 element can be pushed to have a pseudo-density of 1 and its immediate neighbors get pushed to have a pseudo-density of 0, which happens due to “high frequency” changes in the gradient’s sign. Averaging out the partial derivatives makes it less likely that 2 neighboring elements will move in 2 opposite directions when following the gradient in gradient-based optimization, essentially reducing the average norm of the gradient of the objective function (or acting like a low-pass filter) because positive and negative partial derivatives cancel out. This increases the regularity of the shapes that come out of the optimization. I also generally take the filtered design ρ_f = A * ρ (or its Heaviside projection) as the “final design” instead of ρ even though the optimization happens in terms of ρ because A * ρ is the design for which the true objective function O and constraints were evaluated.

Thank you for the explanation. So the key is to treat the filter as projection. Then, when implementing gradient method with filter, the iteration statement is ρ_f -= α(dO(ρ_f)/d(ρ_f))*dO(ρ_f)/d(ρ), not ρ -= α(dO(ρ_f)/d(ρ_f))*dO(ρ_f)/d(ρ).
I thought it was the later case.

This is not a question of numerical errors. If you don’t have something limiting the minimum lengthscale, in many physical problems the optimum design can theoretically be an unphysical structure with a lattice of arbitrarily fine features. In a discretized simulation, this will lead to features that get finer and finer (e.g. ρ that oscillates faster and faster) as you increase resolution. It doesn’t converge … unless you regularize the problem with filtering or similar.