Add Annotating Docs (#8470)

* `quick_start.md`: Add quick start guide * `annotation.md`: Add section on annotating flatbuffers
google · Dec 27, 2024 · 2d86857 · 2d86857
1 parent 0f90dc8
commit 2d86857
Show file tree

Hide file tree

Showing 2 changed files with 154 additions and 0 deletions.
diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml
@@ -65,4 +65,6 @@ nav:
     - Overview: "schema.md"
     - Evolution: "evolution.md"
     - Grammar: "grammar.md"
+  - Advanced:
+    - Annotating Buffers (.afb): "annotation.md"
   - Contributing: "contributing.md"
diff --git a/docs/source/annotation.md b/docs/source/annotation.md
@@ -0,0 +1,152 @@
+# Annotating FlatBuffers
+
+This provides a way to annotate flatbuffer binary data, byte-by-byte, with a
+schema. It is useful for development purposes and understanding the details of
+the internal format.
+
+## Annotating
+
+Given a `schema`, as either a plain-text (`.fbs`) or a binary schema (`.bfbs`),
+and `binary` file(s) that were created by the `schema`. You can annotate them
+using:
+
+```sh
+flatc --annotate SCHEMA -- BINARY_FILES...
+```
+
+This will produce a set of annotated files (`.afb` Annotated FlatBuffer)
+corresponding to the input binary files.
+
+### Example
+
+Taken from the [tests/annotated_binary](https://github.com/google/flatbuffers/tree/master/tests/annotated_binary).
+
+```sh
+cd tests/annotated_binary
+../../flatc --annotate annotated_binary.fbs -- annotated_binary.bin
+```
+
+Which will produce a `annotated_binary.afb` file in the current directory.
+
+
+!!! Tip
+
+    The `annotated_binary.bin` is the flatbufer binary of the data contained
+    within `annotated_binary.json`, which was made by the following command:
+
+    ```sh
+    ..\..\flatc -b annotated_binary.fbs annotated_binary.json
+    ```
+
+## .afb Text Format
+
+Currently there is a built-in text-based format for outputting the annotations.
+A full example is shown here:
+[`annotated_binary.afb`](https://github.com/google/flatbuffers/blob/master/tests/annotated_binary/annotated_binary.afb)
+
+The data is organized as a table with fixed [columns](#columns) grouped into
+Binary [sections](#binary-sections) and [regions](#binary-regions), starting
+from the beginning of the binary (offset `0`).
+
+### Columns
+
+The columns are as follows:
+
+1. The offset from the start of the binary, expressed in hexadecimal format
+   (e.g. `+0x003c`).
+
+    The prefix `+` is added to make searching for the offset (compared to some
+    random value) a bit easier.
+
+2. The raw binary data, expressed in hexadecimal format. 
+
+    This is in the little endian format the buffer uses internally and what you
+    would see with a normal binary text viewer.
+
+3. The type of the data.
+
+    This may be the type specified in the schema or some internally defined
+    types:
+
+
+    | Internal Type | Purpose                                            |
+    |---------------|----------------------------------------------------|
+    | `VOffset16`   | Virtual table offset, relative to the table offset |
+    | `UOffset32`   | Unsigned offset, relative to the current offset    |
+    | `SOffset32`   | Signed offset, relative to the current offset      |
+
+
+4. The value of the data.
+
+    This is shown in big endian format that is generally written for humans to
+    consume (e.g. `0x0013`). As well as the "casted" value (e.g. `0x0013 `is
+    `19` in decimal) in parentheses.
+
+5. Notes about the particular data.
+
+    This describes what the data is about, either some internal usage, or tied
+    to the schema.
+
+### Binary Sections
+
+The file is broken up into Binary Sections, which are comprised of contiguous
+[binary regions](#binary-regions) that are logically grouped together. For
+example, a binary section may be a single instance of a flatbuffer `Table` or
+its `vtable`. The sections may be labelled with the name of the associated type,
+as defined in the input schema.
+
+An example of a `vtable` Binary Section that is associated with the user-defined
+`AnnotateBinary.Bar` table.
+
+```
+vtable (AnnotatedBinary.Bar):
+  +0x00A0 | 08 00      | uint16_t   | 0x0008 (8)   | size of this vtable
+  +0x00A2 | 13 00      | uint16_t   | 0x0013 (19)  | size of referring table
+  +0x00A4 | 08 00      | VOffset16  | 0x0008 (8)   | offset to field `a` (id: 0)
+  +0x00A6 | 04 00      | VOffset16  | 0x0004 (4)   | offset to field `b` (id: 1)
+```
+
+These are purely annotative, there is no embedded information about these
+regions in the flatbuffer itself.
+
+### Binary Regions
+
+Binary regions are contiguous bytes regions that are grouped together to form 
+some sort of value, e.g. a `scalar` or an array of scalars. A binary region may
+be split up over multiple text lines, if the size of the region is large.
+
+#### Annotation Example
+
+Looking at an example binary region:
+
+```
+vtable (AnnotatedBinary.Bar):
+  +0x00A0 | 08 00      | uint16_t   | 0x0008 (8)   | size of this vtable
+```
+
+The first column (`+0x00A0`) is the offset to this region from the beginning of
+the buffer. 
+
+The second column are the raw bytes (hexadecimal) that make up this region.
+These are expressed in the little-endian format that flatbuffers uses for the
+wire format.
+
+The third column is the type to interpret the bytes as. For the above example,
+the type is `uint16_t` which is a 16-bit unsigned integer type.
+
+The fourth column shows the raw bytes as a compacted, big-endian value. The raw
+bytes are duplicated in this fashion since it is more intuitive to read the data
+in the big-endian format (e.g., `0x0008`). This value is followed by the decimal
+representation of the value (e.g., `(8)`). For strings, the raw string value is
+shown instead. 
+
+The fifth column is a textual comment on what the value is. As much metadata as
+known is provided.
+
+### Offsets
+
+If the type in the 3rd column is of an absolute offset (`SOffet32` or
+`Offset32`), the fourth column also shows an `Loc: +0x025A` value which shows
+where in the binary this region is pointing to. These values are absolute from
+the beginning of the file, their calculation from the raw value in the 4th
+column depends on the context.