Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System of units #783

Open
klenze opened this issue Feb 4, 2023 · 9 comments
Open

System of units #783

klenze opened this issue Feb 4, 2023 · 9 comments

Comments

@klenze
Copy link
Contributor

klenze commented Feb 4, 2023

ROOT says:

Selecting the System of Units in ROOT
Historically the system of units in ROOT was based on the three basic units centimeters, seconds and GigaElectronVolts. For the LHC era in Geant4 collaboration decided that a basic unit system based on millimeters, nanoseconds and MegaElectronVolts was better suited for the LHC experiments. All LHC experiments use Geant4 and effectively adopted this convention for all areas of data processing: simulation, reconstruction and data analysis.

I think following the lead of the LHC experiments here and adopt the Geant4 units. This has the additional advantage that we can use Geant4 unit prefixes.

I think writing

  • task->SetThreshold(20*MeV)
  • task->SetMaxTheta(10*degree)
  • task->SetRadius(20*cm)
  • hist->Fill(hit->GetEnergy() / keV )
    is very verbose.

(Of course, it what would be even better would be dimensionality checking at compile time (so that 20*cm+1*MeV or hist->Fill(hit->GetEnergy()) will not even compile, but this would require more work.

@klenze
Copy link
Contributor Author

klenze commented Feb 5, 2023

Of course, it what would be even better would be dimensionality checking at compile time

See here for my proof of concept implementation, or here for a version which is much more likely to end up in the C++ standard :-)

@klenze
Copy link
Contributor Author

klenze commented Feb 6, 2023

Also consider CALIFA: experimental calibrations always seem to be keV, while simulations are GeV.

@inkdot7
Copy link
Contributor

inkdot7 commented Feb 7, 2023

My 25 cents (value+unit) would be to use the Geant4 style of doing units. It is not too hard to work with, and not very verbose (only if one would start to spell things out, like kiloelectronvolt instead of keV).

It also provides small inline comments about the unit, without comment delimiters. And since quantities do have both a value and a unit (unless dimensionless), bare-word 20 is actually incomplete. 20*MeV reads better. (20 MeV of course reads even better...)

land02 uses some compile-time type-system for units (very crude though). That turned out to be very clumsy, and a real hindrance all over the place. Based on that experience, I'd recommend to go for the not-perfect (i.e. no static compile-time checking) Geant4-style, which still should get things much closer to easily checked code than having no unit specifiers at all. At least it is a starting point, much better than bare-word unitless values, and should be hard to argue against introducing.

Another nice thing with the Geant4 style is that it does not really hard-code the basic units used internally. E.g. 1 mm could be changed to 1 m without having to change anything but the definition. All other code just needs a recompile. It also does not force a particular base in print-out or plotting. Since one anyhow has to (or at least should!) do / keV or / MeV, then even things like / (10 * MeV) work and the user-visible units are decoupled from the internal representation.

And the Geant4-style should work rather well in a C-code as well, given a suitable header.

@hapol
Copy link
Contributor

hapol commented Feb 7, 2023

Geant4-style units are cool for me. It would be better to have them in the framework (with external maintenance) instead in a flavour (our r3b flavour). Any volunteer to perform the modification? It would require the revision of several thousands of code lines to adapt, in many classes ...

@inkdot7
Copy link
Contributor

inkdot7 commented Feb 7, 2023

It can hopefully be done in pieces - as long as thing stay consistent. One problem with not having the compiler nag about things, is that the chance for mistakes is not much reduced. And even increased during the transition, which may take a long time.

One way to at least hint someone reading code would be to use some typedefs:

typedef Double_L_t Double_t;  // A length.
typedef Double_T_t Double_t;  // A time.
typedef Double_E_t Double_t;  // An energy.
typedef Double_M_t Double_t;  // A mass.
typedef Double_angle_t Double_t;  // An angle.
// Etc...

This would not catch any errors, but at least act as a note to a user that the variable is to be handled with the CLHEP units.

A way to try to make consistent changes is to do just start with one or a few variables at a time, and then also whatever things interact with it. Interactions can be found by temporarily (i.e. during edit+compile cycle, but not committing) changing names of the affected variable and function, e.g. Double_t fDeltaAzimuthal; becomes Double_t fDeltaAzimuthalxx. Then the compiler would at least help to find the uses. And then change the names back before committing. That is a awful lot of changes, but probably worth the effort. Will not help with code which is not compiled, like macros. Unless they are all tested as well... If the CI tests would do some .L macro.C+ of all macro files, then perhaps such things would be managable as well.

@YanzhaoW
Copy link
Contributor

YanzhaoW commented Feb 9, 2023

One way to at least hint someone reading code would be to use some typedefs

There is user defined suffix operator in C++ (check this). And we can write values like: 1_m, 2_ns, 3_MeV

@klenze
Copy link
Contributor Author

klenze commented Feb 10, 2023

My 25 cents (value+unit) would be to use the Geant4 style of doing units. It is not too hard to work with, and not very verbose (only if one would start to spell things out, like kiloelectronvolt instead of keV).

At least SI units do not suffer that much from inflation :-)

I meant "verbose" (at the level in my original comment) a good thing, actually. I agree that having to spell out keV would be overly verbose.

I think it would be helpful to distinguish enforcing dimensional analysis within classes/functions and at interfaces.

Within a class/function, the spatial, temporal and interpersonal difference between the setting a double in some unit and using it is typically small:

void f()
{
    double x=42.5; // length in mm
    ...
    (use of x)
   ...
};

Also, there are all sorts of ways to calculate the wrong result which would not be detected by dimensional analysis, so the function has to be manually checked anyhow. Forcing devs to convince the compiler that all the units match will waste a lot of time for little benefit.

On the other hand, the distance between the caller of a method and the method can easily span multiple software projects and decades. The person writing the call might not even have met the author of the called code. Enforcing units in such a context seems like a sensible precaution.

To that end, I would propose having two namespaces. strong_unit::length would be the type of mm, parsec and so on. The other one would work like this:

namespace weak_unit
{
template<<strong_unit::length base=mm>
using length=...;
};

strong_unit types would be implicitly convertible to corresponding weak_unit types (with the value getting automatically adjusted by division through base).

weak_unit types would be implicitly convertible to double.
This means that we could write

auto f(weak_unit::energy<MeV> en, weak_unit::length<cm> dist)
{
   // some calculations using en and dist as ordinary double values
   double res=...;
   return res*keV; // return strong_unit::energy
}

This would allow us to:

  • enforce unit compliance at call interfaces
  • avoid messing with the internals of any functions (if we don't want to).

There is user defined suffix operator in C++ (check this). And we can write values like: 1_m, 2_ns, 3_MeV

(Mostly tangential rambling following)

I have to say that I am bitterly disappointed that the suffix has to be explicitly spelled out in the definition. The very least they could have done is to allow template<SomeClass str, OtherClass suffix > ThirdClass operator "" "" (); and have the user take care of parsing literal and suffix. As it is, we would have to define all of the SI (prefix, unit) combinations by hand. This shyness does not become C++, which has traditionally answered the question "would you like to buy this feature at the cost of greatly increased complexity?" with "we will take three versions".
(The saving grace is that C++ 20 at least it allows (limited) template support for literal conversion operators, which means that SFINAE type variable conversion such as

template<Number x>
std::enable_if_t<(x%2)==0, EvenNumber> operator ""x_(){return x;} // and opposite for OddNumber
static_assert(std::is_same_v<decltype("100"_x), EvenNumber>);

works. This opens up the possibility of building a _pdg prefix. Instead of having to refer to a class proton by name we could just use the literal 2212_pdg. )

Just kidding. Using 42.0_m over 42.0*m has the advantage that the result does not depend on a variable called m being in scope or not, so it is probably a good idea. We can simply use the preprocessor to generate all the SI prefixes. Not.

@YanzhaoW
Copy link
Contributor

@klenze

Do you think following usage is good?

auto length = 1_m;
auto speed = 1_m/1_s;

auto area = length*length;
auto area2 = 1_m*1_m;

With suffix operator, I don't think we can do something like "1_m<2>".

@jose-luis-rs
Copy link
Contributor

This should be a proposal for FairRoot, we could then use it directly in R3BRoot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants