Reducing TTFX of a macro-focused package (`DataPipes`)

I’m trying to evaluate precompilation benefits for my packages, and understand what to run during precompilation for best results in different situations.

This question is about macro-focused packages, taking DataPipes as an example. Without any precompilation directives, timings are:

precompile: <1 sec
`using DataPipes`: 0.012200 seconds (8.16 k allocations: 578.810 KiB)
`@eval @p 123`: 2.206762 seconds (2.77 M allocations: 172.314 MiB, 7.64% gc time, 99.89% compilation time)
Command to run:
`julia --startup-file=no --project -e 'using Pkg; Pkg.precompile(); using InteractiveUtils; @showtime using DataPipes; @showtime @eval @p 123'`

After adding a short statement run during precompilation (this line), I get:
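For readers without the link at hand, here is a minimal sketch of what such a statement could look like in a macro-focused package. It is not the actual DataPipes code: the module `MyMacros`, the macro `@double`, and the helper `_precompile_workload` are hypothetical names used only for illustration. The idea is to expand one representative macro call while the package is being precompiled, so the functions the macro relies on get compiled and cached.

```julia
module MyMacros  # hypothetical package module, not DataPipes itself

export @double

# Helper doing the actual expression transformation; in a real macro package
# this is the code whose compilation dominates the first macro call.
double_expr(ex) = :(2 * $(esc(ex)))

macro double(ex)
    double_expr(ex)
end

# Expand one representative macro call so the expansion code path is compiled.
function _precompile_workload()
    macroexpand(@__MODULE__, :(@double 123))
end

# Run the workload only while Julia is generating a precompile cache,
# not on every ordinary load of the module.
if ccall(:jl_generating_output, Cint, ()) == 1
    _precompile_workload()
end

end # module
```

Whether this caches everything needed for the first real call depends on the Julia version and on how much of the work happens in code owned by other modules, so treat it as a starting point rather than a recipe.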

precompile: 9 sec
`using DataPipes`: 0.306198 seconds (520.01 k allocations: 23.272 MiB, 29.38% gc time)
`@eval @p 123`: 0.266229 seconds (326.42 k allocations: 21.089 MiB, 99.29% compilation time)

Does that look like the correct way to precompile macro packages? Are there any specifics to be aware of?

Is the roughly 25x increase in loading time (0.012 s to 0.306 s) expected? And the roughly 10x increase in precompilation time (<1 sec to 9 sec)?
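To put the measurements side by side, the ratios can be worked out from the timings quoted above. This is only arithmetic on the reported numbers; the variable names are made up for the example.

```julia
# Timings in seconds, copied from the measurements above.
load_before, first_call_before = 0.012200, 2.206762
load_after,  first_call_after  = 0.306198, 0.266229

# How much slower `using DataPipes` became.
load_ratio = load_after / load_before

# How much faster the first `@p` call became.
first_call_speedup = first_call_before / first_call_after

# Net effect on load time plus first call.
total_before = load_before + first_call_before
total_after  = load_after + first_call_after
total_speedup = total_before / total_after

println("load time ratio:    ", round(load_ratio; digits = 1))
println("first-call speedup: ", round(first_call_speedup; digits = 1))
println("net TTFX speedup:   ", round(total_speedup; digits = 1))
```

So although loading got noticeably slower, the combined time for loading plus the first macro call still dropped by a factor of a bit under four.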

What can be done to reduce TTFX even further? It’s reasonably fast already, but I wonder whether more improvements are possible through precompilation.