-
Notifications
You must be signed in to change notification settings - Fork 22
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Design sketch for bit shift operators.
- Loading branch information
Showing
3 changed files
with
527 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,295 @@ | ||
# Design Sketch: Left and Right Bit Shift Operators | ||
|
||
## Overview | ||
|
||
This is a proposal to add left and right shift to the Emboss expression | ||
language. | ||
|
||
Bit shifts are common in embedded systems work. Sometimes, the lack of bit | ||
shifts in Emboss can be worked around, such as: | ||
|
||
``` | ||
bits Control: | ||
0 [+8] UInt control_byte | ||
0 [+4] UInt frame_type_1 | ||
4 [+1] UInt p_f | ||
5 [+3] UInt frame_type_2 | ||
# frame_type_1 | (frame_type_2 << 5) | ||
let frame_type = frame_type_1 + frame_type_2 * 32 | ||
``` | ||
|
||
However, this is awkward, and only works for left shift by a constant value. | ||
|
||
It would also be useful to add bitwise and, or, xor, and not operators, but | ||
those are not in scope for this proposal. | ||
|
||
|
||
## Syntax | ||
|
||
### Symbol | ||
|
||
The symbols `<<` and `>>` are used for bit shift operators in many common | ||
programming languages. For `>>`, the choice of whether to sign-extend or | ||
zero-extend ("arithmetic" or "unsigned") usually depends on the type of the | ||
left operand. Java, C#, and JavaScript have an additional "unsigned right | ||
shift" operator `>>>`, which I believe is due to the (historical) lack of an | ||
unsigned integer type in those languages. | ||
|
||
Other than `<<` and `>>`, other languages use functions with language-specific | ||
names. | ||
|
||
| Language | Left Shift | Right Shift (RS) | | ||
| ---------- | ----------------- | -------------------------------------- | | ||
| ASM (x86) | `SAL`/`SHL` | `SAR` (arithmetic), `SHR` (unsigned) | | ||
| ASM (ARM) | `LSLV` | `ASRV` (arithmetic), `LSRV` (unsigned) | | ||
| C | `<<` | `>>` | | ||
| C++ | `<<` | `>>` | | ||
| C# | `<<` | `>>` (arithmetic), `>>>` (unsigned) | | ||
| Go | `<<` | `>>` | | ||
| Fortran | `SHIFTL()` | `SHIFTA()` (arithmetic), `SHIFTR()` (unsigned) | | ||
| Haskell | `shift`, `shiftL` | `shift` by negative, `shiftR` | | ||
| Java | `<<` | `>>` (arithmetic), `>>>` (unsigned) | | ||
| JavaScript | `<<` | `>>` (arithmetic), `>>>` (unsigned) | | ||
| Lua | `<<` | `>>` (unsigned) | | ||
| MatLab | `bitshift(A, k)` | `bitshift(A, -k)` | | ||
| OCaml | `shift_left` | `shift_right` | | ||
| Perl | `<<` | `>>` | | ||
| PHP | `<<` | `>>` (arithmetic) | | ||
| Python | `<<` | `>>` | | ||
| Rust | `<<` | `>>` | | ||
| SQL | Nonstandard | Nonstandard | | ||
|
||
|
||
#### Proposal | ||
|
||
Emboss should use `<<` and `>>` operators for left and right shift. For now, | ||
`>>` should always be arithmetic right shift: since integers in the Emboss | ||
expression language are notionally infinite-precision, there is not a 'natural' | ||
number of bits at which to cut off negative numbers. | ||
|
||
Once/if bitwise operators are implemented, unsigned right shift can be handled | ||
by forcing the left operand into an unsigned representation using `&`. | ||
|
||
|
||
### Precedence | ||
|
||
The precedence of `>>` and `<<` is consistent across many languages, sitting | ||
between additive operators (binary `+` and `-`) and comparison operators (`<`, | ||
`>`, and so on). | ||
|
||
There is inconsistency in the precedence of `&`, `^`, and `|`, with many | ||
languages taking C's choice (binary bitwise operators bind less tightly than | ||
comparisons) and a notable minority fixing C's mistake (by putting bitwise | ||
operators between shifts and comparisons). | ||
|
||
|
||
#### C, C++, Perl, JavaScript, Java, & C# | ||
|
||
1. `+` `-` | ||
2. `<<` `>>` `>>>` | ||
3. `<` `<=` `>` `>=` | ||
4. `==` `!= | ||
5. `&` | ||
6. `^` | ||
7. `|` | ||
|
||
|
||
#### Rust, Python, & Lua | ||
|
||
1. `+` `-` | ||
2. `<<` `>> | ||
3. `&` | ||
4. `^`/`~` | ||
5. `|` | ||
6. `==` `!=`/`~=` `<` `>` `<=` `>=` | ||
|
||
|
||
#### Go | ||
|
||
1. `*` `/` `%` `<<` `>>` `&` `&^` | ||
2. `+` `-` `|` `^` | ||
3. `==` `!=` `<` `<=` `>` `>=` | ||
4. `&&` | ||
5. `||` | ||
|
||
|
||
#### Straw Poll | ||
|
||
An informal poll of other developers found that some developers have a mental | ||
model roughly equivalent to: | ||
|
||
1. `+` `-` `<<` `>>` | ||
2. `<` `<=` `>` `>=` | ||
|
||
In particular, the expression `5 + 11 >> 1 + 1` was thought to evaluate to 11 | ||
by some, and 4 (as in all common programming languages other than Go) by | ||
others. | ||
|
||
|
||
#### Proposal | ||
|
||
Emboss's operator precedence is not a strict ordering: certain operators (such | ||
as `&&` and `||`) are neither higher, lower, or equal in precedence to each | ||
other, and as such cannot be mixed without parentheses. | ||
|
||
Given the developer confusion around `>>` vs `+`, Emboss's precedence should be | ||
(changes in **bold**): | ||
|
||
1. `()` `$max()` `$present()` `$upper_bound()` `$lower_bound()` | ||
2. unary `+` and `-` (cannot be combined without parentheses) | ||
3. `*` | ||
4. {`+` `-`} **/ {`<<` `>>`} (`+` and `-` cannot be combined with `>>` or `<<` | ||
without parentheses)** | ||
5. {`!=`} / {`<` `<=` `==`} / {`>` `>=` `==`} (comparisons in the same | ||
direction can be *chained*; comparisons in opposite directions cannot be | ||
combined without parentheses) | ||
6. {`&&`} / {`||`} (`&&` and `||` cannot be combined without parentheses) | ||
7. `?:` | ||
|
||
In diagram form: | ||
|
||
![The same information as above, in the form of a graph](./bit_shift/precedence.svg) | ||
|
||
|
||
## Semantics | ||
|
||
### Shifts by 0, Negative, or Large Values | ||
|
||
It appears that all (common?) languages properly handle shifts by 0, treating | ||
them as a no-op. | ||
|
||
Shifts by values equal to or larger than the width of the left operand (e.g., | ||
64 bits on a 64 bit system) are split between treating them as an error, shift | ||
by the low-order bits of the right operand (e.g., mask to low 6 bits for 64-bit | ||
operands), or return 0 or -1. Common hardware appears to mask to the low bits. | ||
|
||
Shifts by negative values are similarly split between treating them as an | ||
error, shift by the low-order bits of the right operand (so a shift by -1 is | ||
equivalent to a shift by 63), or shift in the opposite direction. | ||
|
||
One particularly notable issue is that C++ (until C++20) and C both treat left | ||
shift as undefined when the left operand is negative. This is contrary to | ||
every other language checked, including x86 and ARM assembly (both 32- and | ||
64-bit). | ||
|
||
In the table below, "UB" indicates undefined behavior (e.g., a program that | ||
performs that operation is not valid), and "IDB" indicates | ||
implementation-defined behavior (a program that performs the operation is | ||
valid, but the result varies depending on the implementation). | ||
|
||
| Language | `x>>0` | `x<<0` | `-2<<x` | `x<<64` | `x>>64` | `x>>-1` | | ||
| ---------- | ------ | ------ | ------- | -------- | -------- | --------- | | ||
| ASM (x86) | `x` | `x` | `-2<<x` | `x<<0` | `x>>0` | `x>>63` | | ||
| ASM (ARM) | `x` | `x` | `-2<<x` | `x<<0` | `x>>0` | `x>>63` | | ||
| C++11 | `x` | `x` | UB | UB | UB | UB | | ||
| C++20 | `x` | `x` | `-2<<x` | UB | UB | UB | | ||
| C | `x` | `x` | UB | UB | UB | UB | | ||
| C# | `x` | `x` | `-2<<x` | `x<<0` | `x>>0` | `x>>63` | | ||
| Go | `x` | `x` | `-2<<x` | `0` | `0` | ? | | ||
| Fortran | `x` | `x` | `-2<<x` | UB | UB | UB | | ||
| Haskell | `x` | `x` | `-2<<x` | `0` | `0` | Panic | | ||
| Java | `x` | `x` | `-2<<x` | `x<<0` | `x>>0` | `x>>63` | | ||
| JavaScript | `x` | `x` | `-2<<x` | `x<<0` | `x>>0` | `x>>63` | | ||
| Lua | `x` | `x` | `-2<<x` | `0` | `0` | `x<<1` | | ||
| OCaml | `x` | `x` | `-2<<x` | IDB | IDB | IDB | | ||
| Perl | `x` | `x` | `-2<<x` | Varies | `x>>64` | `x<<1` | | ||
| PHP | `x` | `x` | `-2<<x` | `0` | `0` | Exception | | ||
| Python | `x` | `x` | `-2<<x` | `x>>64` | `x>>64` | Exception | | ||
| Rust | `x` | `x` | `-2<<x` | UB | UB | Overflow | | ||
|
||
|
||
#### Proposal | ||
|
||
Emboss can use its bounds system to ensure that no shifts by too-large or | ||
negative amounts can happen at runtime. This ensures that no surprising or | ||
undefined behavior can occur in generated code. | ||
|
||
This may prove to be overly restrictive, but it can be loosened in the future, | ||
if necessary. | ||
|
||
|
||
### Expression Bounds and Modular Value Calculations | ||
|
||
#### Left Shift | ||
|
||
For a shift `x << y`, there are several cases to consider: | ||
|
||
1. `x` and `y` are both constants: in this case, the result is also a | ||
(pre-computed) constant. | ||
2. Only `y` is a constant: when `y` is a constant, `x << y` is equivalent to | ||
`x` × 2<sup>y</sup>. The minimum value, maximum value, modulus, and | ||
modular value of the result are all equal to 2<sup>y</sup> multiplied by | ||
the respective bound on `x`. | ||
3. Otherwise: | ||
* `minimum_value` is either `minimum value of x << maximum value of y` or | ||
`minimum value of x << minimum value of y`, depending on signs | ||
* `maximum_value` is either `maximum value of x << minimum value of y` or | ||
`maximum value of x << maximum value of y`, depending on signs | ||
* `modular_value` is 0 | ||
* `modulus` is: | ||
* `infinity` if the `modulus` of `x` is `infinity` and the | ||
`modular_value` of `x` is 0 | ||
* 2<sup>minimum value of `y`</sup> × (largest power-of-2 factor of | ||
the `modular_value` of `x`) if the `modulus` of `x` is | ||
`infinity` | ||
* 2<sup>minimum value of `y`</sup> × (largest power-of-2 factor of | ||
the modulus of `x`) if the `modular_value` of `x` is 0 | ||
* 2<sup>minimum value of `y`</sup> × min(largest power-of-2 factor | ||
of the `modular_value` of `x`, largest power-of-2 factor of the | ||
`modulus` of `x`) if the `modular_value` of `x` is not 0 | ||
|
||
These rules have been prototyped and tested against all valid bounds and | ||
inhabiting values of those bounds with bound values from -32 through +32, | ||
inclusive. | ||
|
||
|
||
#### Right Shift | ||
|
||
For a shift `x >> y`, there are several cases to consider: | ||
|
||
1. `x` and `y` are both constants: in this case, the result is also a | ||
(pre-computed) constant. | ||
2. `y` is a constant and the `modulus` and `modular_value` of `x` are both | ||
evenly divisible by 2<sup>y</sup>: in this case, the bounds of the result | ||
are equal to the corresponding bounds on `x` divided by 2<sup>y</sup>. | ||
Note that this is a special case of computing the bounds for | ||
[division](./division_and_modulus.md#division). | ||
3. Otherwise: | ||
* `minimum_value` is either `minimum value of x >> maximum value of y` or | ||
`minimum value of x >> minimum value of y`, depending on signs | ||
* `maximum_value` is either `maximum value of x >> minimum value of y` or | ||
`maximum value of x >> maximum value of y`, depending on signs | ||
* `modular_value` is 0 | ||
* `modulus` is 1 | ||
|
||
These rules have been prototyped and tested against all valid bounds and | ||
inhabiting values of those bounds with bound values from -32 through +32, | ||
inclusive. | ||
|
||
There are definitely tighter bounds that can be inferred in some cases (e.g., a | ||
shift of a constant `x` by a `y` that can only take 2 values will have a result | ||
that can only take 2 values, but will not, in general, have an inferred bound | ||
that only covers those two values), but these rules seem to cover most cases | ||
without being particularly computationally expensive. | ||
|
||
|
||
## Implementation Notes | ||
|
||
### Parser | ||
|
||
The parser can be modified by putting in parallel rules for `shift-` | ||
nonterminals wherever a rule references any `additive-` nonterminal. This will | ||
automatically create a set of rules that put the `<<` and `>>` operators into a | ||
new precedence that is at the same level as, but cannot be mixed with, the | ||
additive operators. | ||
|
||
|
||
### Runtime | ||
|
||
The C++ implementation of the left shift operator should avoid undefined | ||
behavior by using `static_cast<std::make_unsigned<T>::type>()` on its operands | ||
before shifting, and then convert back to `T` for the result. Note that the | ||
conversion back to `T` cannot (always) be done with a simple cast until C++17. | ||
There is some code in `IntView` that properly implements the conversion for ISO | ||
C++11, which should be factored out so that it can be shared. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
strict digraph { | ||
bgcolor = "white" | ||
ordering = "out" | ||
edge [color = "black"] | ||
node [ | ||
color = "black" | ||
fontcolor = "black" | ||
shape = box | ||
ordering = "out" | ||
] | ||
subgraph { | ||
// rank = same | ||
rankdir = LR | ||
|
||
paren [ordering = "out"; label="() $max() $present()"] | ||
unary_plus [ordering = "out"; label="unary + -"] | ||
mult [ordering = "out"; label="*"] | ||
add [ordering = "out"; label="+ -"] | ||
shift [ordering = "out"; label=">> <<"] | ||
ne [ordering = "out"; label="!="] | ||
le [ordering = "out"; label="< <= =="] | ||
ge [ordering = "out"; label="> >= =="] | ||
land [ordering = "out"; label="&&"] | ||
lor [ordering = "out"; label="||"] | ||
choice [ordering = "out"; label="?:"] | ||
paren -> unary_plus | ||
unary_plus -> mult | ||
mult -> add | ||
mult -> shift | ||
add -> ne | ||
add -> le | ||
add -> ge | ||
shift -> ne | ||
shift -> le | ||
shift -> ge | ||
ne -> land | ||
ne -> lor | ||
le -> land | ||
le -> lor | ||
ge -> land | ||
ge -> lor | ||
land -> choice | ||
lor -> choice | ||
} | ||
} |
Oops, something went wrong.