Is it possible? How? With what technique?
Full simulation is very computationally expensive. Audible sound extends up to about 20,000 Hz, so you'd need a spatial resolution of something like 0.01 m and a time step of at least 40,000 steps per second (twice the highest frequency, per Nyquist), if not more. It could easily reach 10^12 operations just for one second of speech synthesis, and that's quite an optimistic estimate.
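For what it's worth, that 10^12 figure checks out on the back of an envelope. Here's a rough sketch in Python; the domain size, grid spacing, and per-cell operation count are all illustrative assumptions, not measurements of any real solver:

```python
# Back-of-envelope cost of a 3D finite-difference acoustic simulation.
# All parameters are illustrative assumptions.

def fdtd_cost(domain_m=1.0, dx=0.01, duration_s=1.0,
              sample_rate_hz=40_000, flops_per_cell=25):
    """Rough FLOP count for simulating `duration_s` of audio."""
    cells = (domain_m / dx) ** 3          # grid points in a cubic domain
    steps = duration_s * sample_rate_hz   # time steps (2x 20 kHz, Nyquist)
    return cells * steps * flops_per_cell

print(f"{fdtd_cost():.1e} FLOPs")  # → 1.0e+12 FLOPs
```

So a single second of audio lands around a teraflop of work even before boundary conditions, and the cost scales with the cube of the resolution, which is why full simulation is so expensive.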
If all you’re trying to achieve is “just” speech synthesis, then you could look for a Julia package that implements it. A full-up acoustic physics engine would be way overkill for that in most cases. Frankly, there’s a hell of a lot you could do to mimic more complicated acoustic effects with simple DSP filters and convolutions, if you’re so inclined.
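As a concrete example of that DSP route, here's a sketch of a convolution-based reverb in Python with NumPy; the synthetic impulse response (exponentially decaying noise) and the decay time are illustrative assumptions, standing in for a measured room response:

```python
import numpy as np

# Cheap "room acoustics" via FIR filtering: convolve a dry signal with a
# synthetic impulse response (exponentially decaying noise). The IR shape
# and decay time are illustrative assumptions, not a measured room.

def synthetic_ir(sample_rate=44_100, decay_s=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n = int(sample_rate * decay_s)
    t = np.arange(n) / sample_rate
    return rng.standard_normal(n) * np.exp(-6.0 * t / decay_s)

def reverb(dry, ir):
    wet = np.convolve(dry, ir)            # FIR filtering by convolution
    return wet / np.max(np.abs(wet))      # normalize to avoid clipping

sr = 44_100
t = np.arange(sr) / sr
dry = np.sin(2 * np.pi * 440.0 * t)       # 1 s of A440 sine
wet = reverb(dry, synthetic_ir(sr))
print(wet.shape)                          # dry length + IR length - 1 samples
```

The whole thing is a few lines and runs in well under a second, versus the teraflop-scale cost of simulating the room from first principles; swapping in a recorded impulse response gets you quite convincing acoustics for the price of one convolution.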
Yes, that’s good advice. Unfortunately, I’m not talking about regular speech synthesis. I’m talking about making the best song possible synthetically. If there is a task where I could improve my music quality just by throwing PC-level compute power at it, that’s a cheap win for me.
That being said, many of my ideas stay in the planning phase, I rarely get to execute them, and when I do, my execution often gets sloppy.