Yes. I have now looked into what CUDA.jl is doing, and it seems it actually calls libdevice, NVIDIA’s LLVM bitcode library. It does this for all of sincos, sin and cos. Libdevice does its own range reduction and everything else, so most of what I said doesn’t apply.
This, however, doesn’t explain the “references to double in the IR that aren’t there when using sin and cos”; that’s weird and interesting.
NB: this is the Base.sincos(::Float32) code for CUDA.jl: