How to improve runtime with measurements.jl?

Cevheriferd · August 30, 2021, 11:56am

Dear all,

I am using the Measurements.jl package to calculate the error in my dataset. I have an array of voltage values, each with its own uncertainty. But when I calculate the mean of the array, it takes 27 seconds and allocates 805.25 MiB. The array consists of around 47000 elements, all of the same type. Does anyone know how to improve the performance of this operation? For an array of the same size with only Float64 elements, the operation finishes almost instantly. Thanks for helping me out.

Sukera · August 30, 2021, 12:01pm

Welcome! It’s hard to say what’s causing allocations or slowing down your code if you don’t share it with us - please share a MWE (minimal working example) showing the problem.

Please read: make it easier to help you

Also, have you followed most of the tips mentioned in the Performance Tips section of the manual?

Cevheriferd · August 30, 2021, 12:18pm

```julia
using Statistics, Measurements

VoltageArray = rand(50000) # Works really fast

VoltageArray = VoltageArray .± 0.1VoltageArray # Works really fast

mean(VoltageArray) # Takes a long time to run
```

This is an MWE of my problem. I hope I made it clear.

Sukera · August 30, 2021, 1:17pm

Divisions are expensive:

```julia
julia> @time (1 ± 0.1) / (2 ± .2)
  0.000007 seconds (4 allocations: 192 bytes)
0.5 ± 0.071
```

According to this issue on their repo, this is known. You may be looking for weightedmean instead:

```julia
julia> v = rand(50_000);

julia> v = v .± 0.1 .* v;

julia> @time weightedmean(v)
  0.001619 seconds (6 allocations: 781.438 KiB)
0.0002565 ± 2.2e-6
```

Though you should note:

NOTA BENE: correlation is not taken into account.

Accumulation of error bars is non-trivial, else this would already be much faster, I assume.

DNF · August 30, 2021, 1:21pm

Sure, divisions, but look at this:

```julia
jl> v = rand(100);

jl> vm = v .± 0.1 .* v;

jl> @btime sum($v)
  9.409 ns (0 allocations: 0 bytes)
54.45849425120919

jl> @btime sum($vm)
  14.700 μs (102 allocations: 6.44 KiB)
54.46 ± 0.62
```

That’s a 1500x slowdown(!) Is that really expected?

Sukera · August 30, 2021, 1:28pm

Not my words

github.com/JuliaPhysics/Measurements.jl: "Arithmetic operations very slow" (opened 09:05AM - 14 Oct 18 UTC by aplavin; labels: enhancement, help wanted, performance)

It looks like this package has a lot of overhead for simple math operations - about 50 times slower than normal floats and uses 20 times more memory! Not even talking about number of allocations.

```julia
using BenchmarkTools
using Measurements

a = randn(1000, 1000)
b = randn(1000, 1000)
@btime a ./ b;
# 1.791 ms (4 allocations: 7.63 MiB)

a, b = a .± b, b .± a
@btime a ./ b;
# 88.498 ms (4000004 allocations: 175.48 MiB)
```

For comparison, an extremely basic implementation from [rosettacode](https://rosettacode.org/wiki/Numeric_error_propagation) has only 4x overhead, which is completely reasonable:

```julia
using Main.NumericError

a = randn(1000, 1000)
b = randn(1000, 1000)
a, b = Measure.(a, b), Measure.(b, a)
@btime a ./ b;
# 6.797 ms (4 allocations: 15.26 MiB)
```

Any chance that performance of Measurements.jl can be improved?

Maybe something has changed since then, @giordano could chime in, but I doubt anything grand has changed here. Keeping track of correlations is hard.

giordano · August 30, 2021, 2:13pm

Yes, unfortunately that’s expected; I gave the example of the mean in the issue linked above. As @Sukera pointed out, tracking correlation is hard. It’s pretty easy to write a package that propagates uncertainties super quickly while ignoring correlations, which is what Measurements.jl did until v0.02, but that’s also incredibly dumb and useless: almost no identities would hold. For example, x + x and 2 * x would give you different results.
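To make the x + x versus 2 * x point concrete, here is a minimal sketch of a propagation scheme that ignores correlations. The `NaiveMeasurement` type is hypothetical and written only for this illustration; it is not how Measurements.jl is implemented. It assumes every operand of `+` is independent, so uncertainties add in quadrature.

```julia
# Toy type for illustration only; NOT the Measurements.jl implementation.
struct NaiveMeasurement
    val::Float64
    err::Float64
end

# Sum of two values treated as independent: uncertainties add in quadrature.
Base.:+(a::NaiveMeasurement, b::NaiveMeasurement) =
    NaiveMeasurement(a.val + b.val, hypot(a.err, b.err))

# Scaling by an exact constant scales the uncertainty linearly.
Base.:*(c::Real, a::NaiveMeasurement) =
    NaiveMeasurement(c * a.val, abs(c) * a.err)

x = NaiveMeasurement(1.0, 0.1)

naive_sum = x + x   # uncertainty sqrt(2) * 0.1: x is wrongly treated as independent of itself
scaled    = 2 * x   # uncertainty 2 * 0.1

println("x + x uncertainty: ", naive_sum.err)
println("2 * x uncertainty: ", scaled.err)
```

The two results have the same value but different uncertainties, which is exactly the broken identity described above. Avoiding it requires remembering which independent quantities each result depends on.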

Regarding the mean in particular, note that most of the time users want to compute the weighted mean, for which Measurements.jl provides a specific function with much more reasonable performance. Note that it ignores correlation, as warned in the docstring, because you’d usually apply it to a sample of independent measurements anyways.
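As a rough sketch of what a weighted mean of independent measurements involves, the following uses inverse-variance weighting on plain Float64 arrays. The helper name `inverse_variance_mean` is hypothetical and not part of Measurements.jl, and this is an illustration under the independence assumption rather than the package's actual code.

```julia
# Hypothetical helper, not part of Measurements.jl.
# Inverse-variance weighted mean of independent values with 1-sigma errors.
function inverse_variance_mean(vals::AbstractVector{<:Real}, errs::AbstractVector{<:Real})
    length(vals) == length(errs) ||
        throw(ArgumentError("vals and errs must have the same length"))
    w = 1 ./ errs .^ 2                 # weights are 1 / sigma^2
    wsum = sum(w)
    return sum(w .* vals) / wsum, 1 / sqrt(wsum)
end

vals = [10.0, 12.0]
errs = [1.0, 2.0]
m, s = inverse_variance_mean(vals, errs)
println(m, " ± ", s)
```

Each step is a single pass over the data, which is consistent with the millisecond timings shown earlier in the thread.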

To be clear, Measurements.jl is slow because of an algorithmic limitation: it uses an O(n^2) algorithm to propagate uncertainties, you can understand why mean/sum are particularly bad, and get worse and worse as the size of the vector increases. There may be clever ways to reduce the complexity of the algorithm, but I never had the time to look at it. I believe the Python package uncertainties now has an algorithm which is O(n), or anyways better than O(n^2), but until a few years ago it was using basically the same algorithm as Measurements.jl (well, historically it’s the other way around). If anyone is willing to help, I’d be glad to hear from them.
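If the plain (unweighted) mean is really what is wanted and the measurements are known to be independent, one possible workaround is to propagate the uncertainty by hand on plain arrays in a single pass. This is a sketch under that independence assumption; `independent_mean` is a hypothetical helper, and with a vector `v` of measurements the inputs could be obtained with `Measurements.value.(v)` and `Measurements.uncertainty.(v)`.

```julia
# Hypothetical helper: unweighted mean of INDEPENDENT measurements.
# The mean is sum(x_i) / n, so its variance is sum(sigma_i^2) / n^2.
function independent_mean(vals, errs)
    length(vals) == length(errs) ||
        throw(ArgumentError("vals and errs must have the same length"))
    n = length(vals)
    return sum(vals) / n, sqrt(sum(abs2, errs)) / n
end

# Same shape of data as the example in the thread, on plain Float64 arrays.
vals = rand(50_000)
errs = 0.1 .* vals
m, s = independent_mean(vals, errs)
println(m, " ± ", s)
```

This gives up correlation tracking entirely, so it is only appropriate when no element of the array was derived from another.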

DNF · August 30, 2021, 2:20pm

Okay, I had no idea, and thought it was the same as interval arithmetic.

giordano · August 30, 2021, 2:26pm

No, I believe interval arithmetic doesn’t track correlations at all, which is probably fine for what it’s applied to.
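A small sketch of what "not tracking correlations" looks like for intervals: subtracting an interval from itself does not give zero, because the two operands are treated as unrelated. `ToyInterval` is a hypothetical type defined only for this illustration and is not the API of any interval arithmetic package.

```julia
# Toy interval type for illustration; not the API of any interval package.
struct ToyInterval
    lo::Float64
    hi::Float64
end

# Standard interval subtraction: the result must contain every possible a - b.
Base.:-(a::ToyInterval, b::ToyInterval) = ToyInterval(a.lo - b.hi, a.hi - b.lo)

x = ToyInterval(1.0, 2.0)
d = x - x   # spans -1.0 to 1.0 instead of collapsing to zero
println(d)
```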

Sukera · August 30, 2021, 3:36pm

Bummer, that project has a weird custom license. It may be a BSD license according to git history, but I’m not a copyright lawyer…

gbaraldi · August 30, 2021, 4:06pm

I think it is a 3-clause BSD license (The 3-Clause BSD License | Open Source Initiative). What that means for a port I don’t know, but it’s that one.

Sukera · August 30, 2021, 4:35pm

Then we may be in luck - that license only places restrictions on redistributions in binary or source form (a port is as far as I know neither), as well as not endorsing our port with their name (i.e. we should be fine if we write something like “port inspired by”…).
