Actually, it is possible to call the LLVM intrinsic directly, even for `_mm_add_sd` (Clang just doesn't use it for this one).

```
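# Alias for an N-lane SIMD vector; tuples of VecElement map to LLVM vector types.
const VE{N, T} = NTuple{N, VecElement{T}}

@generated function _mm_add_sd(a::VE{2,Float64}, b::VE{2,Float64})
    # LLVM IR body: %0 and %1 are the two arguments (named `ir` to avoid shadowing Base.exp).
    ir = """
    %3 = call <2 x double> @llvm.x86.sse2.add.sd(<2 x double> %0, <2 x double> %1)
    ret <2 x double> %3
    """
    return quote
        Base.llvmcall(
            # First tuple element declares the intrinsic, second is the body.
            ("""
            declare <2 x double> @llvm.x86.sse2.add.sd(<2 x double>, <2 x double>)
            """,
            $ir),
            VE{2,Float64},
            Tuple{VE{2,Float64}, VE{2,Float64}},
            a, b)
    end
end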
```

```
julia> a = VE{2, Float64}((1.0,2.0))
(VecElement{Float64}(1.0), VecElement{Float64}(2.0))
julia> b = VE{2, Float64}((3.0,4.0))
(VecElement{Float64}(3.0), VecElement{Float64}(4.0))
julia> _mm_add_sd(a, b)
(VecElement{Float64}(4.0), VecElement{Float64}(2.0))
julia> @code_native _mm_add_sd(b, a)
	.section	__TEXT,__text,regular,pure_instructions
; Function _mm_add_sd {
; Location: REPL[3]:2
; Function macro expansion; {
; Location: REPL[3]:2
	vaddsd	%xmm1, %xmm0, %xmm0
	retq
	nopw	%cs:(%rax,%rax)
;}}
```
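
The result `(4.0, 2.0)` in the session above follows from what the instruction does: only the low lanes are added, and the high lane is passed through from the first argument. As a rough cross-check, here is a plain-Julia sketch of that behaviour with no `llvmcall` involved. The name `add_sd_reference` is made up for illustration and is not part of any library.

```julia
# Hypothetical helper (illustration only): a scalar reference for what
# ADDSD computes on two 2-lane Float64 vectors.
function add_sd_reference(a::NTuple{2,VecElement{Float64}},
                          b::NTuple{2,VecElement{Float64}})
    low = VecElement(a[1].value + b[1].value)  # only lane 1 is summed
    return (low, a[2])                         # lane 2 is copied from `a`
end

a = (VecElement(1.0), VecElement(2.0))
b = (VecElement(3.0), VecElement(4.0))
add_sd_reference(a, b)  # (VecElement{Float64}(4.0), VecElement{Float64}(2.0))
```

Swapping the arguments changes the high lane of the result, since it always comes from the first argument.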