Physical units in Julia

In fact I would love to do unitful calculations everywhere, for the very reasons you explained, and I even did so at least once where it was possible. However, in my context, sparse linear algebra and AD (automatic differentiation) are two building blocks which at the moment seem to be out of reach for unitful computations.

As for the gradient descent etc.: This indeed makes me think …
It seems that in fact, before we hand over data to such an algorithm, we kind of always implicitly strip them of their units by passing ratios wrt. reference quantities. Very much like we work with the argument of the exponential in an Arrhenius law: k=Ae^{-\frac{E_a}{RT}} - here, RT is the reference value. The resulting weighting then is of course an algorithmic parameter which in fact can have quite an influence on convergence etc. When we use ustrip(upreferred(...)), we implicitly choose the SI base units. However, this is just one convenient (or lazy?) choice. Sadly, I have so far never taken the time to explore nondimensionalization more deeply - with UnitfulBuckinghamPi.jl this may even be fun…
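To make that concrete, here is a minimal sketch of what I mean (assuming Unitful; the parameter values are made up):

using Unitful

# Made-up parameters for an Arrhenius rate k = A*exp(-Ea/(RT)):
A  = 1.0e13u"1/s"     # hypothetical pre-exponential factor
Ea = 50.0u"kJ/mol"    # hypothetical activation energy
T  = 500.0u"K"

# RT is the reference energy; the ratio Ea/(RT) is dimensionless,
# and ustrip(upreferred(...)) collapses it to a plain number:
x = ustrip(upreferred(Ea / (Unitful.R * T)))   # ≈ 12.0
k = A * exp(-x)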

1 Like

I have been following this discussion with some interest. I have not delved into the inner workings of Unitful.jl, however. I like the idea of units and use them fairly consistently in Mathcad, where it is very convenient. Another program that was units-aware had an interesting implementation that is worth sharing. Essentially it broke the units down into the NIST-accepted fundamental units. There are 7 fundamental units, and if you include the unitless angle measurement there are 8. An example table detailing this is shown below. It may well be that Unitful.jl uses this under the hood.

As you can see, both torque and energy share the same fundamental units. This sort of thing makes using units more of a challenge.
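For example, one can check this with Unitful (torque in N*m and energy in J reduce to the same dimensions):

julia> using Unitful

julia> dimension(u"N*m") == dimension(u"J")
true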

Another aside: one of my professors taught a course where he liked to talk in terms of effort and flow. Regardless of which physical system you are dealing with, power is always effort times flow. For instance force * velocity, torque * rotational speed, temperature * entropy flow, voltage * current, pressure * volumetric flow rate, etc. This is what made analog computers possible. I find this way of thinking valuable, particularly when dealing with a regime where there is less of a gut feel for what is going on.

When you think in these terms, the NIST fundamental units are a bit strange. Current is the only fundamental unit defined as a flow!

Label                Scale         Mass  Leng  Time  Amp  Temp  Lumi  Mol  Rad
V                    1                1     2    -3   -1
N                    1                1     1    -2
lbf                  4.448221615      1     1    -2
kgf                  9.80665          1     1    -2
g                    9.80665          0     1    -2
m/s^2                1                0     1    -2
in/s^2               0.0254           0     1    -2
m/s                  1                0     1    -1
in/s                 0.0254           0     1    -1
m                    1                0     1
mm                   0.001            0     1
in                   0.0254           0     1
mil                  0.0000254        0     1
ft                   0.3048           0     1
s                    1                0     0     1
min                  60               0     0     1
h                    3600             0     0     1
Pa                   1                1    -1    -2
psi                  6894.757293      1    -1    -2
kgf/m^2              9.80665          1    -1    -2
SPL                  0.00002          1    -1    -2
rad/s                1                0     0    -1    0    0     0    0    1
Hz                   6.283185307      0     0    -1    0    0     0    0    1
RPM                  0.104719755      0     0    -1    0    0     0    0    1
CPM                  0.104719755      0     0    -1    0    0     0    0    1
Ord                  1                0     0     0    0
rad                  1                0     0     0    0    0     0    0    1
Deg                  0.017453293      0     0     0    0    0     0    0    1
A                    1                0     0     0    1
Ohm                  1                1     2    -3   -2
pC                   1E-12            0     0     1    1
degF                 1.8              0     0     0    0    1     0    0    0
degC                 1                0     0     0    0    1     0    0    0
BTU                  1055.1           1     2    -2
J                    1                1     2    -2
Nm                   1                1     2    -2
W                    1                1     2    -3
N/(m/s)_impedance    1                1     0    -1
m/s/N_mobility       1               -1     0     1
m/s^2/N_receptance   1               -1
W/m^2_soundInt       1                1     0    -3
S_entropy            1                1     2    -2    0   -1

1 Like

Well:

julia> using Unitful
julia> for unit in [u"V", u"N", u"C", u"F", u"W"]
       println(unit,": ", upreferred(unit), "   ", dimension(unit))
       end
V: kg m^2 A^-1 s^-3   𝐋^2 𝐌 𝐈^-1 𝐓^-3
N: kg m s^-2   𝐋 𝐌 𝐓^-2
C: A s   𝐈 𝐓
F: A^2 s^4 kg^-1 m^-2   𝐈^2 𝐓^4 𝐋^-2 𝐌^-1
W: kg m^2 s^-3   𝐋^2 𝐌 𝐓^-3

Unitful has been built around this. And I would consider the system SI (Système international) rather than NIST.

3 Likes

Just to add another data point in the discussion.
I normally work with thermodynamic quantities (volume, temperature, pressure, enthalpy, entropy, internal energy) and was in need of more specialized unit conversions (molar volume to mass density, for example). To do that, I developed https://github.com/longemen3000/ThermoState.jl . The main object is essentially just a tuple of units that can be dispatched on. The final result is unit-stripped via ustrip(upreferred(val)), because the package is supposed to be a layer that performs all necessary Unitful validations and transformations before passing the value to an inner algorithm (an equation of state, for example). There were two main problems I found while developing the package:

  • Enthalpy, internal energy, Gibbs energy, and Helmholtz energy are all different quantities, yet they have the same units
  • Molar and mass fractions are unitless

Those two issues are handled by adding an additional dispatch layer (sketched below), but I would like to see a more Unitful-pure way to do this.
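For illustration, a minimal sketch of what I mean by an additional dispatch layer — the wrapper and tag names here are hypothetical, not the actual ThermoState.jl API:

using Unitful

# Hypothetical tag types distinguishing quantities that share J/mol:
abstract type EnergyKind end
struct Enthalpy    <: EnergyKind end
struct GibbsEnergy <: EnergyKind end

# Wrapper carrying the physical meaning alongside the unitful value:
struct Property{K<:EnergyKind,T}
    val::T
end
Property(::K, val::T) where {K<:EnergyKind,T} = Property{K,T}(val)

# Methods can now dispatch on the meaning, not just on the units:
describe(::Property{Enthalpy})    = "enthalpy"
describe(::Property{GibbsEnergy}) = "Gibbs energy"

h = Property(Enthalpy(), 1.5u"kJ/mol")
describe(h)                     # "enthalpy"
ustrip(upreferred(h.val))       # 1500.0, ready for an inner algorithm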

2 Likes

When you start thinking about this carefully, it’s a real eye-opening experience. Did you know that gradient descent is utter nonsense?

Agree that it’s fascinating when you start thinking about it carefully. I’ll add a more technically correct statement for the point you are making: gradient descent depends on the choice of an inner product.

The intuition is as follows: In a gradient descent situation, suppose that

\begin{eqnarray} \mathrm{loss} &=& 6 \\ \partial_x \mathrm{loss} &=& -3 \\ \partial_y \mathrm{loss} &=& -2 \end{eqnarray}

Now you’re wondering whether you should make 2 steps in the x direction, 3 steps in the y direction, or some combination of the two.

An inner product is what allows you to compare distances along different coordinate axes, so you can choose the combination that makes (the linear approximation to) the loss function vanish with the shortest step.

I’ll illustrate that with an example: if x and y are orthogonal and x is measured in meters but y is measured in feet, then your inner product would look like

\langle a, b \rangle = a_x b_x + (0.3048)^2 a_y b_y

The gradient, \nabla \mathrm{loss}, is defined as the unique vector such that

\langle \nabla \mathrm{loss}, b \rangle = \partial_x \mathrm{loss} \cdot b_x + \partial_y \mathrm{loss} \cdot b_y

for every vector b. So the only solution is

\begin{eqnarray} \nabla\mathrm{loss}_x &=& -3 \\ \nabla\mathrm{loss}_y &=& -2(0.3048)^{-2} \end{eqnarray}

That’s exactly what you would get if you first convert y into meters, compute the gradient “naively”, and then convert \nabla\mathrm{loss}_y back to feet.
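A quick numeric check of that claim, using the numbers from above (s is meters per foot):

s = 0.3048                      # meters per foot
∂x, ∂y = -3.0, -2.0             # partial derivatives (x in m, y in ft)

# Gradient w.r.t. the weighted inner product <a,b> = aₓbₓ + s²·a_y·b_y:
grad = [∂x, ∂y / s^2]

# Same result via "convert y to meters, differentiate, convert back":
∂y_m      = ∂y / s              # chain rule: ∂/∂y_m = (1/s)·∂/∂y_ft
grad_y_ft = ∂y_m / s            # vector component converted back to feet
grad[2] ≈ grad_y_ft             # true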

Btw, in this example with meters and feet, at least the type of unit is the same. In typical machine learning applications, you might be mixing features with different types of unit that you can’t naturally add at all. In that case, you have to tell the computer how to e.g. compare “a step of x EUR in price” and “a step of y minutes of driving time”. It’s just a hyperparameter to tune. I think in practice people achieve this effect by pre-scaling the input. That’s also what @j-fu says:

It seems that in fact, before we hand over data to such an algorithm, we kind of always implicitly strip them of their units by passing ratios wrt. reference quantities.

1 Like

Things get even more interesting with AffineUnits. Consider the following Hamiltonian describing electrons in graphene:

H = \left(\begin{array}{cc} 0 & v_0 (p_x - i p_y)\\ v_0 (p_x + i p_y) & 0 \end{array}\right).

Here energy and momentum are defined with respect to the Dirac point, so we could define affine units like “eV from Dirac point” and “kg*m/s from Dirac point”.
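Unitful already ships affine units for temperatures, which shows the basic mechanics: conversion honors the offset, while arithmetic that would double-count the offset is rejected.

using Unitful

uconvert(u"K", 25.0u"°C")       # 298.15 K (offset applied)

# Adding two affine quantities would double-count the offset:
try
    25.0u"°C" + 10.0u"°C"
catch err
    typeof(err)                 # Unitful.AffineError
end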

Then what units should the matrix elements of H have if we want the eigenvalues to be in eV from Dirac point? One could expect that all elements of H should be in eV from Dirac point, but it’s actually true for the diagonal elements only, because they represent the average energy of some state (with wavefunction \left(\begin{array}{c} 1\\ 0 \end{array}\right) or \left(\begin{array}{c} 0\\ 1 \end{array}\right)).

Non-diagonal elements don’t correspond to the average energy of any state (they aren’t even real), but they can be expressed in terms of energy differences and therefore are measured in “plain eV” without any offset:

H_{12} = \frac{1}{2} (\langle \psi_1 | H| \psi_1 \rangle - \langle \psi_2 | H| \psi_2 \rangle) - \frac{i}{2} (\langle \psi_3 | H| \psi_3 \rangle - \langle \psi_4 | H| \psi_4 \rangle),

where

| \psi_1 \rangle = \frac{1}{\sqrt{2}} \left(\begin{array}{c} 1\\ 1 \end{array}\right), | \psi_2 \rangle = \frac{1}{\sqrt{2}} \left(\begin{array}{c} 1\\ - 1 \end{array}\right), | \psi_3 \rangle = \frac{1}{\sqrt{2}} \left(\begin{array}{c} 1\\ i \end{array}\right), | \psi_4 \rangle = \frac{1}{\sqrt{2}} \left(\begin{array}{c} 1\\ - i \end{array}\right).

v_0 is the group velocity of electrons in graphene, so it should be measured in “eV*s/(kg*m)” (or m/s, if we measured energy in joules), because group velocity is \frac{dE}{dp}. But p_x \pm i p_y is in kg*m/s from Dirac point, so we need to subtract the Dirac point momentum (0 kg*m/s from Dirac point, obviously) before multiplying by v_0.

Because of the different dimensions of diagonal and non-diagonal elements we can’t square the Hamiltonian, but taking a commutator of two Hamiltonians, or a Hamiltonian and another operator, is permissible (offsets commute with everything and cancel in the commutator).
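Concretely, if the offsets enter as multiples of the identity, H' = H + c I and A' = A + d I, then

[H', A'] = (H + cI)(A + dI) - (A + dI)(H + cI) = HA - AH = [H, A],

so the offsets indeed drop out.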

If the energy of a single electron is measured in eV from Dirac point, then in what units should we measure the total energy of all electrons in graphene? The offset will depend on the number of electrons, so it will actually be a quantum mechanical operator, which could be represented as an (infinite) matrix in the basis of Fock states. So now we get AffineUnits with a matrix offset :crazy_face:.

…etc.

I would put it slightly differently: what you really want is steepest descent, not gradient descent, and in order to define a “steepest” direction you need a norm (a measure of distance for infinitesimal perturbations). If you have a vector x where the components have different units, then you must be using some kind of weighted/rescaled norm, not the Euclidean norm, and in this case the steepest-descent direction is not the direction of the gradient.

This way you can keep the familiar definition of the gradient, and the rescaling instead happens in the definition of steepest descent.

Consider a scalar-valued function f(x) of a vector x. We can define the gradient by f(x+h) ≈ f(x) + \nabla f^T h to “first-order” in h (which implicitly requires a norm on h, but the choice of norm is irrelevant). That formula still works, with the familiar definition of \nabla f and the Euclidean dot product \nabla f^T h, even if the components of x have different units!

But the steepest-descent direction is the h that minimizes \nabla f^T h for a fixed \Vert h \Vert, and that requires us to choose a norm on h explicitly. For example, if you use a weighted Euclidean norm \Vert h \Vert^2 = h^T W h for weights W (e.g. a diagonal matrix that divides each component by an appropriate unit²), then the steepest-descent direction is proportional to -W^{-1} \nabla f, and a steepest-descent step x - \alpha W^{-1} \nabla f now makes dimensionful sense for a scalar \alpha.
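A sketch of one such step in Julia, with a hypothetical diagonal weight that divides each component by its unit squared (the meters/feet example again):

using LinearAlgebra

W  = Diagonal([1.0, 0.3048^2])  # norm ‖h‖² = h'·W·h; x₁ in m, x₂ in ft
∇f = [-3.0, -2.0]               # ordinary partial derivatives
α  = 0.1                        # step size (a plain scalar)

x = [1.0, 1.0]                  # current iterate
x_new = x - α * (W \ ∇f)        # steepest-descent step -α·W⁻¹·∇f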

Yet another viewpoint is that what matters for many optimization algorithms is not the definition of “steepest”, but rather the trust region in which they minimize f(x+h) \approx f(x) + \nabla f^T h or similar local approximations for f(x). Some algorithms use a spherical trust region with an adaptive radius, or similarly a fixed stepsize in the gradient direction, which is implicitly a Euclidean norm. Some other algorithms like CCSA use a “box” trust region in which the trust size in every direction is chosen independently with different “units” – in this case, it can adaptively find a “good” rescaled metric for your problem. They will all still converge (eventually) to a local minimum (assuming a convergent algorithm!) even if you choose a weird scaling of your variables, but choosing a scaling so that the Hessian is well-conditioned will accelerate convergence.

7 Likes