This is correct. To capture information about GPU allocations, check out the tools mentioned in Benchmarking & profiling · CUDA.jl.
Maybe, maybe not. Equally if not more important than the total amount of memory allocated is the peak memory the model uses at any given point in time.
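To illustrate the difference, here's a minimal sketch using CUDA.jl's built-in reporting (assuming CUDA.jl and Flux.jl are installed, and that you have a CUDA-capable GPU; the model and sizes are just placeholders):

```julia
using CUDA, Flux

# hypothetical toy model and batch, just for demonstration
model = Dense(1000 => 1000) |> gpu
x = CUDA.rand(Float32, 1000, 64)

# CUDA.@time reports GPU allocations made during the call,
# analogous to Base.@time for CPU allocations
CUDA.@time model(x)

# memory_status() prints the memory pool's current and reserved usage,
# which tracks peak pressure better than a running allocation total
CUDA.memory_status()
```

Total allocations can look alarming while the pool reuses memory just fine; it's the reserved/used figures from `memory_status()` growing over time that would point to a real leak.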
There’s no central, definitive source, but there are plenty of previous discussions to search through with the obvious keywords. Just in the past little while on Discourse, I see Unreasonable memory usage with M4 GPU and Memory usage increasing with each epoch - #14 by JoshuaBillson. On GitHub, cuda gpu memory usage increasing in time · Issue #2523 · FluxML/Flux.jl · GitHub covers much the same ground. There have been similar discussions on Slack.
Side note: are you aware of the Machine Learning - Julia Programming Language category? Posts in too general a category can get lost, because some people only check specific categories frequently.