Hi, I have two arrays of `Float64`: `arr_mean` and `arr_stds`.

I can easily round `arr_stds` to 1 or 2 significant digits with `round.(arr_stds, sigdigits=2)`. My problem is how to round `arr_mean` so that each element keeps digits after the decimal only up to the order of the most significant digit of the corresponding element in `arr_stds`;
if instead the std is greater than one, the mean should be rounded to that order of magnitude. Basically, I would like to round the mean so that it respects the rules of significant figures.
IIRC, this is backwards for significant-figures rules. The significant figures of an output depend on the operations and the inputs, so the significant figures of the mean depend on the data, and the significant figures of the standard deviation depend on the mean and the data. You'd need an algorithm that sifts through those.
As for rounding to a specific place after the decimal, you have a known number of significant digits following the most significant digit. `log10` gets you close to the place of the most significant digit, and I think these were the right adjustments:
```julia
julia> decimalplace(x::Real)::Int = if iszero(x) 1 elseif abs(x)>=1 floor(log10(abs(x))+1) else floor(log10(abs(x))) end
decimalplace (generic function with 1 method)

julia> decimalplace.([999 100 9 1 0 0.1 0.999 0.01 0.0999])
1×9 Matrix{Int64}:
 3  3  1  1  1  -1  -1  -2  -2
```
Then, given `s` significant figures, you subtract `s-1` from that to reach the place of the least significant digit.
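For example, a minimal sketch of how this could tie back to the original question (the names `placetodigits` and `roundtostd` are my own, not anything standard): map the place returned by `decimalplace` to the `digits` keyword of `round`, then round each mean to the place of the most significant digit of its std.

```julia
# Map a "place" (1 = ones, 2 = tens, -1 = tenths, ...) to round's `digits` keyword.
placetodigits(p::Int) = p >= 1 ? 1 - p : -p

# Round the mean to the place of the most significant digit of the std.
roundtostd(m, s) = iszero(s) ? m : round(m, digits = placetodigits(decimalplace(s)))

roundtostd(12.12124, 0.2356)   # 12.1
roundtostd(123.4567, 23.4)     # 120.0
```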
Those are outputs of a neural network that works in `Float64` even though the data have much less precision. I don't think the n-th decimal digit after the first 1-2 significant digits carries any information; it's just the product of random fluctuations.
If you don't know the exact operations between your inputs and the outputs, then you can't even attempt to apply significant-figures rules, which are only one specific approach to estimating precision. If quantities had known limits, for example, you would instead do interval arithmetic; IIRC Measurements.jl does something similar but around error bars. Figure out the precision estimation other people use in your particular context; in fact, an output giving `mean` and `std` strongly hints at an interval, so rounding no longer serves the purpose of loosely estimating error about a point.
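A minimal sketch of what I mean by the Measurements.jl route (assuming the package is installed; the numbers are just placeholders):

```julia
using Measurements

# A value with an explicit error bar; ± is exported by Measurements.jl.
x = 12.1 ± 0.2

# The uncertainty propagates through arithmetic instead of being handled by rounding.
y = 2x + 1
println(y)
```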
I don't completely get what you mean, but I don't see the reason to report as a result something like 12.12124314235453455 ± 0.235643452342342344523 just because the PC estimates the mean and std in floating point.
The most natural thing would be to report 12.1 ± 0.2.
The difficulty I have with your question is this: how do you define the “least significant digit” of `arr_stds`?

For example, what if `arr_std[1]` is supposed to be `0.02`, but because this value is not exactly representable as a `Float64` you actually have 0.0200000000000000004163336342344337026588618755340576171875? Or what if `arr_std` comes from some other inexact computation or inexact data?
Maybe you instead mean that you want to round the elements of `arr_mean` up to the number of digits given by the most significant digit of the corresponding elements of `arr_std`, i.e. if `arr_std[i] = 0.0123423`, then you want to round `arr_mean[i]` to two decimal places. This is easily accomplished by e.g.
```julia
round2(x, xerr) = round(x, digits=max(0, ceil(Int, -log10(xerr))))
```
which gives
```julia
julia> round2.([123.4567, 123.4567, 123.4567], [0.1234, 0.0735, 0.00315])
3-element Vector{Float64}:
 123.5
 123.46
 123.457
```
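If you also want the behavior for stds larger than 1 mentioned above (rounding the mean to tens, hundreds, ...), one possible variant is to drop the `max(0, …)` clamp, since `round` accepts negative `digits` (a sketch, with `round3` as a made-up name):

```julia
# Without the clamp, negative `digits` rounds to places left of the decimal point.
round3(x, xerr) = round(x, digits=ceil(Int, -log10(xerr)))

round3(123.4567, 0.0735)  # 123.46 (same as round2 here)
round3(123.4567, 23.4)    # 120.0
```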
Yes, the most! Why did I write least? Damn.
If I understand correctly, your issue isn’t so much that your numbers have too much precision but rather that they print with more precision than you care about. I would suggest you simply print the numbers with less precision.
The `Printf` standard library is one available tool for this:
```julia
julia> using Printf

julia> @printf(stdout, "%.4g ± %.4g", 31/3, 1/6) # print with 4 significant digits
10.33 ± 0.1667
```
You can print to a file or some place other than `stdout` if you like, or use `@sprintf` to produce a string.
There are options for making the printed precision a dynamic variable, but I'll leave that more advanced usage to the documentation (and you might want to look online for more `printf` documentation, since the Julia docs are a bit thin – it's the same function across many languages).
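For instance, one way to make the precision a runtime variable (a sketch assuming Julia ≥ 1.6, where `Printf.Format` is available; the number of digits is an arbitrary choice here):

```julia
using Printf

sig = 4                                        # significant digits, chosen at runtime
fmt = Printf.Format("%.$(sig)g ± %.$(sig)g")   # build the format string dynamically
println(Printf.format(fmt, 31/3, 1/6))         # "10.33 ± 0.1667"
```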
> If I understand correctly, your issue isn’t so much that your numbers have too much precision but rather that they print with more precision than you care about.
No. It's the neural network that returns numbers with a fake, higher precision than it can possibly achieve given the precision of the training data and the training procedure. You can see this effect with a simple Bayesian neural network that predicts mean and variance: if you run `.predict()` multiple times, the digits that follow the most significant digit in the variance change randomly, suggesting that they do not carry any useful information.
> The `Printf` standard library is one available tool for this:
Thanks, but I think the solution given by @stevengj is easier, although still incomplete.
But rounding them doesn’t make them better or more useful. Instead of using the values you trained to, you’re adding a (hopefully negligible, but possibly damaging) perturbation so that they happen to print as short numbers in base 10. So why bother? And what’s so special about rounding in base 10 anyway?
By analogy: maybe 3.14 is a good enough representation of π for your application, but in that case 3.14159 is probably also fine. And both are equally easy for the computer to work with.
This is not about making them better or easier for the computer to handle, but about avoiding reporting in the text or to the user a level of precision that is not true.
If in 100 runs I get something like 3.14xxxxxxxxxx, with the x's changing with each run while 3.14 remains constant, then it makes sense to report that the neural network estimates 3.14 ± 0.01 instead of picking one result out of the 100 and returning 3.142352 ± 0.000001.
The second option makes the user believe that the network is reliable up to the sixth digit after the decimal point, which is false.
If I had initialized the network with something bigger, like Float128, would I have to report even higher precision?
Neural networks are stochastic, the results are supposed to vary randomly from run to run. Truncating to the “constant” digits not only throws away that variation (in the xxxx digits), you’re asserting a value below anything in the distribution and misapplying significant figures via ± 0.01 to assert a distribution that does not exist. You are at least likely correct that the ± 0.000001 does not represent variance across runs, so you’ll need to do your own statistics.
You’re still mixing up formatting, precision of the data type, and precision of the data. Yes, numbers are truncated or rounded to fewer digits for human eyes, especially in lists and tables. But that’s entirely for convenience, it often has nothing to do with true precision, and people know that graphics are no substitute for the data files.
Precision of the data is something you need to infer from context; you can’t see a number of digits and simply decide that’s the true precision. 5.16 could be the exact number 5.16. If that was measured from a meter stick with centimeter marks, then I can reasonably say there are 3 significant figures. If it came from a scale with a ±0.001g tolerance, then I should compute with that interval.
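For instance, a toy version of "compute with that interval" in plain Julia (no package, just endpoint arithmetic; the numbers come from the scale example above):

```julia
# Carry the ±0.001 g tolerance as an interval instead of a digit count.
lo, hi = 5.16 - 0.001, 5.16 + 0.001

# An operation widens the interval; the result's precision follows from the endpoints,
# not from how many digits happened to be printed.
doubled = (2lo, 2hi)
```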
Switching to a data type with higher precision does not introduce false precision of the data; in fact, it's often the opposite. Say I somehow have a 2-bit unsigned integer type to just about represent the integers 1 and 3 exactly, so I can represent the fraction 1/3 exactly with `UInt2(1)//UInt2(3)`. But we all know we can't represent it exactly in binary or decimal. 1 and 3 both only have 1 digit, so if we misapply significant-figures rules again, we round to 0.3, a much worse error than if we used a higher-precision data type like `Float64` (a whole 53-4=49 more bits of precision) to represent 0.3333333333333333. There are algorithms where inputs are converted to intermediate data types of higher precision in order to reduce such error, possibly to 0, before converting back. So messing with the output's data type or value may actually be a bad thing; after all, the network didn't do such a simple task for you. Find out what other people do in your context.
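As a toy illustration of that last point (a sketch only; summing Float32 data through a Float64 intermediate, which is not something your network will do for you):

```julia
# Accumulating in Float64 and converting back typically loses less to rounding
# than accumulating in Float32 directly.
xs = fill(0.1f0, 1_000_000)

naive  = sum(xs)                     # Float32 accumulation
better = Float32(sum(Float64, xs))   # Float64 intermediate, same output type
```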
> you’re asserting a value below anything in the distribution and misapplying significant figures via ± 0.01 to assert a distribution that does not exist.
Have you ever used a Bayesian neural network?
It predicts the mean and variance of (usually) p(y|x), approximated as a Gaussian distribution. Due to the stochastic nature of the network there are small oscillations in the predictions, but if the network is trained well it often converges to a number that remains more or less constant. I do not have to infer the precision, the network does: if I add more data the estimate of the variance becomes smaller, if I remove data it becomes larger.
If I switch to a higher-precision data type I simply increase the number of digits that are printed after the ones that remain constant.
Then you do understand enough to know that it’s pointless for you to misapply significant figures, an approach to estimating precision.
And it should be apparent why 3.14 ± 0.01, in other words the real interval [3.13, 3.15], represents such a converging value poorly.
`round` isn't designed to change the data type from input to output, yet you're already using it to change values to ones that print with fewer digits. Similarly, I can trivially use `round` on a higher-precision data type and print fewer digits: `let x = Float16(1)/3; println(x, "\n", round(Float64(x); digits=2)) end`. Again, formatting, data type precision, and the data's true precision are completely unrelated to each other and likely unrelated to what you really want.
You have more or less the basics of how to round a value to any place based on another value, and you’re free to apply that however you’d like. If someone else reads your report and raises the same questions, you’ll deal with it then.