In 1D, the bandwidth parameter controls the “width” of the kernel. In 2D there is a width in each dimension, and the kernel also has a covariance structure. If you can adapt that covariance structure to the data, your kernel can do a better job. In general that’s a hard problem, because the covariance can change from place to place. Imagine a distribution that looks like a banana: in one part of x,y space the data stretch out vertically, as you move around the banana they stretch out diagonally, and further along they may be horizontal… In N dimensions this just gets worse and worse.
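To make the banana picture concrete, here is a minimal sketch in Python with NumPy. The synthetic data (a noisy parabola), the slice boundaries, and the helper name `local_cov` are all assumptions made up for illustration. It estimates the sample covariance in three slices of the banana and shows that the off-diagonal term changes sign as you move along it, which is why no single kernel covariance fits everywhere.

```python
import numpy as np

# Hypothetical banana-shaped data: a noisy parabola y = x^2.
rng = np.random.default_rng(0)
t = rng.uniform(-2.0, 2.0, size=2000)
x = t + rng.normal(scale=0.1, size=t.size)
y = t**2 + rng.normal(scale=0.1, size=t.size)
pts = np.column_stack([x, y])


def local_cov(points, lo, hi):
    """Sample covariance of the points whose x-coordinate lies in [lo, hi)."""
    sel = points[(points[:, 0] >= lo) & (points[:, 0] < hi)]
    return np.cov(sel, rowvar=False)


cov_left = local_cov(pts, -2.0, -1.0)   # left arm: y falls as x rises
cov_mid = local_cov(pts, -0.5, 0.5)     # bottom of the banana: roughly flat
cov_right = local_cov(pts, 1.0, 2.0)    # right arm: y rises with x

for name, c in [("left", cov_left), ("mid", cov_mid), ("right", cov_right)]:
    print(name, "cov(x, y) =", c[0, 1])
```

The sign flip between the left and right arms is the "covariance changes from place to place" problem in miniature.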
But in 2D, with sufficient data, you can get a smooth KDE from a bivariate independent kernel (a product of two 1D kernels, one per axis, with no covariance term) that works well enough for many purposes. Give it a try. Better yet, come back and give us a plot of what you got and how well it worked!
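If it helps to get started, here is a rough sketch of a 2D KDE with an independent (product) Gaussian kernel, written in Python with NumPy. Everything specific here is an assumption for illustration: the synthetic banana data, the function names `gauss1d` and `kde2d_product`, the grid limits, and the per-axis bandwidths from Scott's rule of thumb (standard deviation times n^(-1/6) in 2D). Treat it as a starting point, not a tuned implementation.

```python
import numpy as np


def gauss1d(u, h):
    """1D Gaussian kernel with bandwidth h, evaluated at offsets u."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))


def kde2d_product(data, grid_x, grid_y, hx, hy):
    """2D KDE with a product kernel: K(dx, dy) = K_x(dx) * K_y(dy).

    data has shape (n, 2). The result has shape (len(grid_x), len(grid_y)).
    """
    kx = gauss1d(grid_x[:, None] - data[None, :, 0], hx)  # (len(grid_x), n)
    ky = gauss1d(grid_y[:, None] - data[None, :, 1], hy)  # (len(grid_y), n)
    # Sum over data points of kx * ky, averaged over n.
    return kx @ ky.T / data.shape[0]


# Hypothetical banana-shaped sample: a noisy parabola.
rng = np.random.default_rng(1)
t = rng.uniform(-2.0, 2.0, size=2000)
data = np.column_stack([
    t + rng.normal(scale=0.1, size=t.size),
    t**2 + rng.normal(scale=0.1, size=t.size),
])

# Assumed bandwidth choice: Scott's rule applied to each axis separately.
n = data.shape[0]
hx, hy = data.std(axis=0, ddof=1) * n ** (-1.0 / 6.0)

grid_x = np.linspace(-4.0, 4.0, 120)
grid_y = np.linspace(-2.0, 6.0, 120)
density = kde2d_product(data, grid_x, grid_y, hx, hy)

# Sanity check: the density should integrate to roughly 1 over the grid.
dx = grid_x[1] - grid_x[0]
dy = grid_y[1] - grid_y[0]
total = density.sum() * dx * dy
print("integral over grid:", total)
```

To look at the result, something like matplotlib's `contourf(grid_x, grid_y, density.T)` should work (note the transpose, since `contourf` expects rows to index y). With a product kernel you would expect the arms of the banana to come out a bit blurrier than the bottom, because the axis-aligned kernel cannot follow the diagonal stretch there.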