Thanks!

Note that you probably won’t be able to use `dot_oro`

immediately, since (if I understand your code correctly), you want to add matrices together rather than scalars, so that the elements you want to sum are not nicely stored together in a vector.

As @stevengj mentioned, you might want to use `Xsum`

in this case, since AFAIK it works with any iterable collection. So that would probably require very few modifications to your current code.

Another technique could be to perform a parallel reduction like @oxinabox suggested above. A MWE of a compensated+distributed reduction re-using `AccurateArithmetic.jl`

's internals could look like:

```
using Distributed
Distributed.addprocs(8)
@everywhere begin
using AccurateArithmetic: two_sum
# This type will be used to implement compensated summation
struct Accumulator
high :: Float64 # First-order term in the result
low :: Float64 # Error term
end
# Various methods needed for the parallel reduction
function add(a::Number, b::Number)
hi, lo = two_sum(a, b)
Accumulator(hi, lo)
end
function add(a::Accumulator, b::Number)
hi, lo = two_sum(a.high, b)
Accumulator(hi, a.low+lo)
end
function add(a::Accumulator, b::Accumulator)
hi, lo = two_sum(a.high, b.high)
Accumulator(hi, a.low + b.low + lo)
end
add(a::AbstractArray, b::AbstractArray) = add.(a, b)
# In the end, reduce an `Accumulator` to a single `Float64`
# by summing both components
reduce(a::Accumulator) = a.high + a.low
end
```

With that in place, the computation itself could be as simple as

```
# Parallel reduction using the custom `add` function
julia> acc = @distributed add for i in 1:100
ones(3,3) * sin(0.02pi*i)
end
3×3 Matrix{Accumulator}:
Accumulator(-4.88498e-15, 5.63682e-15) Accumulator(-4.88498e-15, 5.63682e-15) Accumulator(-4.88498e-15, 5.63682e-15)
Accumulator(-4.88498e-15, 5.63682e-15) Accumulator(-4.88498e-15, 5.63682e-15) Accumulator(-4.88498e-15, 5.63682e-15)
Accumulator(-4.88498e-15, 5.63682e-15) Accumulator(-4.88498e-15, 5.63682e-15) Accumulator(-4.88498e-15, 5.63682e-15)
# `acc` computed above is an array of `Accumulator`s
# -> reduce it to an array of `Float64`
julia> reduce.(acc)
3×3 Matrix{Float64}:
7.51836e-16 7.51836e-16 7.51836e-16
7.51836e-16 7.51836e-16 7.51836e-16
7.51836e-16 7.51836e-16 7.51836e-16
```

Note that for this to work with Julia v1.6, you’ll probably have to use a version of AccurateArithmetic that incorporates PR#24 by @Elrod (which I haven’t merged yet; sorry about that!):

```
using Pkg
pkg"add AccurateArithmetic#a39fd8696e"
```

But again, and at the risk of being too heavy handed here, I would like to second @stevengj’s advice above. I obviously do find arbitrary/increased precision useful enough to work on it myself. However, please consider whether the ill-conditioning of your problem really comes from the (physical?) system you’re trying to simulate, or whether it comes from one of the mathematical modeling steps which lead to the solution algorithm that you’re implementing. If you’re in the latter case, it does not mean that you or your colleagues did a bad job devising an algorithm that suits your need. But you might want to see this as an indication that it might be rewarding to put additional work in the mathematical modeling and/or algorithm, rather than focusing on the accurate implementation of the algorithm.

In my experience, any improvement made in the early stages of the whole simulation process (i.e. in the mathematical modelling or algorithm stages) is more likely to have more beneficial impact than a change in the implementation itself.

That being said, I’m fully aware that investigating such things takes time. Time that you might want to devote to other things (and rightfully so).