Albatross Bytecode, or simply AB, is a bytecode whose internal structure, operations and binary format are precisely defined so that it can be used by a process virtual machine as well as by a whole-system virtual machine without having to write two separate sources.
Because of this additional requirement, the bytecode had to be defined so that it contains only behavior abstract enough to be common to both process and whole-system virtual machines.
This document is separated into three sections, each of them describing one of the three main bytecode aspects. These aspects are:
- its overall structure (section "Overall structure")
- behavior of a program written in this bytecode (section "Bytecode execution")
- binary format of a generated bytecode file (section "Binary format")
No section refers to a section that appears later in this document, so the document can also be used as a learning resource.
Code generated into this bytecode is always put into one or more files with a single fixed binary format discussed later in this document. Each of these files represents a single entity, or a namespace in a non-conventional sense.
Every namespace is a generally accessible space containing either sub-namespaces (each of them being a normal namespace) or functions and fields.
It could be simply said that both of these namespace types are a kind of "name group", i.e. something that groups the names of the contained elements and makes them unique from the perspective of all code being executed. Namespaces are defined by their properties:
- Two namespaces in a single parent namespace can't have the same name. More generally, no two elements of a namespace can have the same name: every name of an element in a namespace (whether it's another namespace, a function or a field) must be unique within that namespace.
- A namespace can contain either only elements that are namespaces, or only elements that are functions or fields. In other words, a namespace that contains at least one sub-namespace must be purely-namespace. In contrast, a namespace that contains at least one field or function mustn't contain any sub-namespace; it is a purely-executable namespace.
Purely-namespace namespaces can be also referred to as simply "namespaces", while purely-executable namespaces can be also referred to as "classes".
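The two properties above can be illustrated with a small sketch. The `Namespace` class and its method names are hypothetical and exist only for this illustration; they are not part of the bytecode definition.

```python
class Namespace:
    """A named group holding either sub-namespaces or members, never both.

    Hypothetical helper for illustration only.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: dict[str, "Namespace"] = {}  # sub-namespaces by name
        self.members: set[str] = set()              # function and field names

    def add_namespace(self, child: "Namespace") -> None:
        # A namespace with members is purely-executable (a class).
        if self.members:
            raise ValueError(f"{self.name} is a class and cannot hold sub-namespaces")
        if child.name in self.children:
            raise ValueError(f"duplicate name {child.name!r} in {self.name}")
        self.children[child.name] = child

    def add_member(self, member_name: str) -> None:
        # A namespace with sub-namespaces is purely-namespace.
        if self.children:
            raise ValueError(f"{self.name} holds sub-namespaces and cannot hold members")
        if member_name in self.members:
            raise ValueError(f"duplicate name {member_name!r} in {self.name}")
        self.members.add(member_name)

    @property
    def is_class(self) -> bool:
        """True once the namespace holds at least one function or field."""
        return bool(self.members)
```

Keeping sub-namespaces and members in separate containers makes the "either one kind or the other" rule a simple emptiness check.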
Both kinds of namespaces share a common property: a name. For purely-namespace namespaces it is also the only property they have. A namespace name is a string at least one character long that starts with a letter of the English alphabet or an underscore ('_'), and may be followed by further letters, underscores or decimal digits.
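The naming rule can be checked with a regular expression. This is a minimal sketch; the function name `is_valid_namespace_name` is an assumption made for the example.

```python
import re

# First character: an English letter or an underscore.
# Remaining characters: English letters, underscores or decimal digits.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_namespace_name(name: str) -> bool:
    """Return True if `name` satisfies the namespace naming rule (hypothetical helper)."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_namespace_name("First_SubNamespace")` holds, while an empty string or a name starting with a digit is rejected.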
Purely-namespace namespaces group other namespaces and create a hierarchy of classes. From the definition of classes it follows that if the namespace hierarchy were an instance of the composite pattern, a class would be the "leaf" or "terminal" type.
To specify where a class belongs in the hierarchy, we must designate its path. A path is a single namespace name or a combination of namespace names ordered as they follow in the hierarchy from top to bottom, with a single dot '.' between each pair.
A few examples of paths:
ARootNamespace
ARootNamespace.ASubNamespace
ARootNamespace.First_SubNamespace.Second_SubNamespace.AClass
AClass
Because every class is also a namespace, a path may consist of a class name alone.
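Splitting a path into its components follows directly from the definition. The sketch below assumes a hypothetical helper `split_path`; it validates each component against the naming rule and rejects empty components.

```python
import re

# Same naming rule as for any namespace name.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def split_path(path: str) -> list[str]:
    """Split a dotted path into namespace names, top of the hierarchy first.

    Hypothetical helper. Raises ValueError if any component is not a valid
    namespace name (this includes empty components).
    """
    parts = path.split(".")
    for part in parts:
        if _NAME.fullmatch(part) is None:
            raise ValueError(f"invalid namespace name {part!r} in path {path!r}")
    return parts
```

The last element of the returned list is the innermost name; for a path that designates a class, that is the class name.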
As stated before, a class is a namespace which doesn't contain any sub-namespaces, but rather functions and/or fields. These functions and fields are directly used for execution, hence the name purely-executable namespaces. All classes are used to form objects, which are instances of classes. When an object is formed from some class, we say that the class is its type.
Classes have a property specific to them - their validity in global context. A class can be either static or dynamic. Classes are always used to form objects according to their validity:
- Static classes form static objects. Those are:
  - created prior to start of execution
  - accessible to everything loaded to current execution (every other class and its members)
  - destroyed at the end of current execution
- Dynamic classes form dynamic objects. Those are:
  - created during execution as a result of an instruction execution
  - accessible through their handles (pointers)
  - destroyed when requested
Functions and fields share a common name: members. Members have the following properties with respect to all classes:
- accessibility (or visibility), specifying which classes can and which cannot access the member
- heredity, specifying whether specific members can be defined, redefined or modified
Fields are named chunks of data with a size known before the execution. Because they are named, they can be simply accessed by their name without the need of knowing their location and using a handle, although this option still exists.
Fields do not have a type, only a size. When a field is modified, not all of its data has to be updated. The size of an object is the sum of the sizes of all its fields, and therefore every field name can be said to represent an offset from the object's location in memory.
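The relation between field sizes, field offsets and object size can be sketched as follows. The helper `layout_fields` is hypothetical, and it assumes that fields are laid out in the order given, without padding, with sizes expressed in bytes; the text above only fixes the total size as the sum of field sizes.

```python
def layout_fields(fields: list[tuple[str, int]]) -> tuple[dict[str, int], int]:
    """Return (offset of each field, total object size).

    `fields` is an ordered list of (name, size) pairs. Assumption: fields are
    placed one after another in the given order, with no padding between them.
    """
    offsets: dict[str, int] = {}
    position = 0
    for name, size in fields:
        if name in offsets:
            raise ValueError(f"duplicate field name {name!r}")
        offsets[name] = position  # the field name stands for this offset
        position += size
    return offsets, position      # position is now the sum of all sizes
```

With fields `x` (4 bytes), `y` (8 bytes) and `flag` (1 byte), the sketch yields offsets 0, 4 and 12 and an object size of 13 bytes.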
A function is a sequence of instructions taking arguments as input and producing a return value as output. Again, the type of the returned data is not important, only its size is.
Dynamic functions have an additional argument over static functions - a pointer to their object (object context) used as a logical base for their operation.
Inheritance is a bond between parent classes and a child class applied only on dynamic classes. Dynamic classes can inherit members of their parent classes based on the accessibility of these members:
- public members get inherited and remain publicly accessible for all classes in current execution
- protected members get inherited and remain accessible for the child class and the classes the child class is a parent of
- private members don't get inherited, they are available only for members of the class that defines them
Inherited members of a child class can be accessed both through the child and the parent class. That means that every operation executed on an instance of a parent class can equally be done on an instance of any child class. A call to an inherited function requires the object context argument to have the type of the inheriting class. Doing otherwise would not make sense.
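The inheritance rule for accessibility can be expressed as a simple filter. This is only a sketch: the function `inherited_members` and the representation of accessibility as the strings "public", "protected" and "private" are assumptions made for the example.

```python
# Accessibility levels whose members are passed on to a child class.
INHERITABLE = {"public", "protected"}
ALL_LEVELS = INHERITABLE | {"private"}


def inherited_members(parent_members: dict[str, str]) -> dict[str, str]:
    """Return the members a child class inherits, keeping their accessibility.

    `parent_members` maps a member name to its accessibility level.
    Private members are not inherited.
    """
    unknown = set(parent_members.values()) - ALL_LEVELS
    if unknown:
        raise ValueError(f"unknown accessibility levels: {sorted(unknown)}")
    return {
        name: access
        for name, access in parent_members.items()
        if access in INHERITABLE
    }
```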
A resource is a block of data, not specified any further, available for use from within the execution. Every class can have its own resources, which are accessible only to the class defining them.
Albatross Bytecode is a stack-based bytecode with support for the dynamic properties of object-oriented programming, whilst still being useful for representing intermediate code for languages which aren't object-oriented. Because of this, all of the instructions are divided into groups according to their purpose, so that only the instructions meaningful for the current application need to be used.
- Stack instructions - instructions affecting stack state for purposes of the stack only
- Constant loading instructions
- Arithmetic instructions - instructions performing normal arithmetic operations
- Bitwise instructions - instructions performing bitwise operations
- Comparison instructions - instructions performing basic comparisons of values
- Branching instructions - instructions performing conditional and unconditional branching (jumps)
- Memory manipulation instructions - instructions allocating, freeing or modifying memory for purposes of the memory only
- Object manipulation instructions - instructions performing operations somehow related to objects
- Exception instructions
- Meta instructions - instructions for operations accessing or working with the components of bytecode binary
pop
- removes the top element on the stack
down
- removes the second element on the stack
dup
- duplicates the top element on the stack
swap
- swaps the first and second element on the stack
addressof_top
- pushes the address of stack's top onto the stack
addressof_frame
- pushes the address of the current frame onto the stack
get_local [LOCAL]
- pushes an element onto stack from the storage allocated for local variable [LOCAL]
set_local [LOCAL]
- pops an element from stack's top and moves it into storage allocated for local variable [LOCAL]
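The effect of the four pure stack instructions can be sketched on a Python list whose last element is the top of the stack. The function names mirror the instruction names; modelling the stack as a list is an assumption made for illustration only.

```python
def pop(stack: list) -> None:
    """pop: remove the top element."""
    stack.pop()


def down(stack: list) -> None:
    """down: remove the second element, leaving the top in place."""
    del stack[-2]


def dup(stack: list) -> None:
    """dup: push a copy of the top element."""
    stack.append(stack[-1])


def swap(stack: list) -> None:
    """swap: exchange the first (top) and second elements."""
    stack[-1], stack[-2] = stack[-2], stack[-1]
```

Starting from `[1, 2, 3]` (3 on top), `down` leaves `[1, 3]`, a following `dup` gives `[1, 3, 3]`, and `pop` returns to `[1, 3]`.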
load_raw [VALUE]
- loads a raw word [VALUE] and pushes it on the stack
load_zero
- loads a word of value zero and pushes it on the stack
load_one
- loads a word of value one and pushes it on the stack
add_integer
- pops two top values from the stack, adds them as integers and pushes the result onto the stack
add_float
- pops two top values from the stack, adds them as floating-point numbers and pushes the result onto the stack
sub_integer
- pops two top values from the stack, subtracts the top one from the one below as integers and pushes the result onto the stack
sub_float
- pops two top values from the stack, subtracts the top one from the one below as floating-point numbers and pushes the result onto the stack
mul_integer
- pops two top values from the stack, multiplies them as integers and pushes the result onto the stack
mul_float
- pops two top values from the stack, multiplies them as floating-point numbers and pushes the result onto the stack
div_integer
- pops two top values from the stack, divides the lower one by the higher one as integers and pushes the result of integer division onto the stack
div_float
- pops two top values from the stack, divides the lower one by the higher one as floating-point numbers and pushes the result of floating-point division onto the stack
mod_integer
- pops two top values from the stack, divides the lower one by the higher one as integers and pushes the remainder of integer division onto the stack
mod_float
- pops two top values from the stack, divides the lower one by the higher one as floating-point numbers and pushes the remainder of floating-point division onto the stack
neg
- negates the value on the top of the stack
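The operand order of the integer instructions (the lower value is the left operand, the top value is the right operand) can be illustrated with a tiny interpreter. This is a sketch only: the `run` function and the tuple encoding of instructions are assumptions, and Python's flooring division and modulo stand in for integer division, whose rounding for negative operands is not specified here.

```python
import operator

# Integer instructions mapped to Python operators (assumed semantics).
_INTEGER_OPS = {
    "add_integer": operator.add,
    "sub_integer": operator.sub,
    "mul_integer": operator.mul,
    "div_integer": operator.floordiv,
    "mod_integer": operator.mod,
}


def run(program: list[tuple]) -> list[int]:
    """Execute a list of (name, *operands) tuples and return the final stack."""
    stack: list[int] = []
    for name, *operands in program:
        if name == "load_raw":
            stack.append(operands[0])
        elif name == "neg":
            stack.append(-stack.pop())
        elif name in _INTEGER_OPS:
            higher = stack.pop()  # value that was on the top of the stack
            lower = stack.pop()   # value that was below it
            stack.append(_INTEGER_OPS[name](lower, higher))
        else:
            raise ValueError(f"unsupported instruction {name!r}")
    return stack
```

For instance, loading 7 and then 2 followed by `sub_integer` leaves 5 on the stack, because the top value (2) is subtracted from the one below it (7).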
and
- pops two top values from the stack, performs a bitwise and on the lower element, using the higher one as a mask, and places the result on the stack
or
- pops two top values from the stack, performs a bitwise or on the elements and places the result on the stack
xor
- pops two top values from the stack, performs a bitwise xor on the elements and places the result on the stack
not
- pops the top value from the stack, performs a bitwise not (negates all bits of the value) and places the result on the stack
shl [BITS]
- pops the top value from the stack, performs a bitwise shift to the left by [BITS] bits and places the result on the stack
shr [BITS]
- pops the top value from the stack, performs a bitwise shift to the right by [BITS] bits and places the result on the stack
Comparison instructions can produce only two values: false (represented as raw zero - 0) and true (represented as raw one - 1).
cmp_equal
- pops two top values from the stack and places the truth value of expression "LOWER equals HIGHER" onto the stack
cmp_bigger
- pops two top values from the stack and places the truth value of expression "LOWER is bigger than HIGHER" onto the stack
cmp_biggerorequals
- pops two top values from the stack and places the truth value of expression "LOWER is bigger than or equals HIGHER" onto the stack
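The comparison instructions can be sketched in the same list-based model, with the top of the stack at the end of the list. The `compare` function is a hypothetical helper; only the instruction names and the 0/1 result encoding come from the text above.

```python
import operator

_COMPARISONS = {
    "cmp_equal": operator.eq,
    "cmp_bigger": operator.gt,
    "cmp_biggerorequals": operator.ge,
}


def compare(stack: list, instruction: str) -> None:
    """Pop HIGHER (top) and LOWER (below), push 1 for true or 0 for false."""
    test = _COMPARISONS.get(instruction)
    if test is None:
        raise ValueError(f"unsupported instruction {instruction!r}")
    higher = stack.pop()
    lower = stack.pop()
    stack.append(1 if test(lower, higher) else 0)
```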
Branching instructions change the next instruction to be executed, either conditionally (based on the value on the top of the stack) or unconditionally (always).
br [OFFSET]
- transfers the execution to the offset [OFFSET] relative to the position of this instruction
br_true [OFFSET]
- transfers the execution to the offset [OFFSET] relative to the position of this instruction only if the value on the top of the stack equals one
br_false [OFFSET]
- transfers the execution to the offset [OFFSET] relative to the position of this instruction only if the value on the top of the stack equals zero
br_indirect
- transfers the execution to the instruction located at the location pointed to by the stack's top
return
- transfers the execution back from a callee to the caller, destroying the current stack frame
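Relative branching can be sketched with a program counter that indexes into a list of instructions. Several details here are assumptions made for the example and not statements of this document: offsets are counted in instructions rather than bytes, the conditional branches pop the tested value, and the `run` function and tuple encoding are hypothetical.

```python
def run(program: list[tuple], max_steps: int = 1000) -> list[int]:
    """Execute load_raw, br, br_true and br_false; return the final stack."""
    stack: list[int] = []
    pc = 0  # index of the instruction being executed
    steps = 0
    while pc < len(program):
        if pc < 0:
            raise IndexError("branch target before the start of the program")
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step limit exceeded")
        name, *operands = program[pc]
        if name == "load_raw":
            stack.append(operands[0])
            pc += 1
        elif name == "br":
            pc += operands[0]  # offset is relative to this instruction
        elif name == "br_true":
            pc += operands[0] if stack.pop() == 1 else 1
        elif name == "br_false":
            pc += operands[0] if stack.pop() == 0 else 1
        else:
            raise ValueError(f"unsupported instruction {name!r}")
    return stack
```

In the program `load_raw 1`, `br_true 2`, `load_raw 111`, `load_raw 222`, the branch at index 1 jumps to index 3, so only 222 ends up on the stack.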
alloc [AMOUNT]
- allocates [AMOUNT] bytes of memory and places a pointer to this memory onto the stack
realloc [AMOUNT]
- reallocates the memory pointed to by the pointer popped from the top of the stack to [AMOUNT] bytes, placing a pointer to the reallocated memory onto the stack
free
- frees the memory pointed to by the pointer on the top of the stack
point
- gets a word of data from the location pointed to by a pointer popped from the top of the stack
copy_to_stack [N]
- pushes [N] (count) words from the memory pointed to by a pointer popped from the stack onto the stack
copy_to_memory [N]
- moves [N] (count) words to the memory pointed to by a pointer popped from the stack, popping the words from the stack
copy_memory
- moves (stack_top) bytes from one location ((stack_top-1)) to another ((stack_top-2)).
new_object [TYPE]
- allocates a new object of type [TYPE], placing a pointer to the allocated object onto the stack
new_array [TYPE] [SIZE]
- allocates a new array for [SIZE] objects of type [TYPE], placing a pointer to the allocated array onto the stack
free_object
- frees the object pointed to by the pointer on the top of the stack
free_array
- frees the array pointed to by the pointer on the top of the stack
get_field [TYPE] [FIELD]
- places the value of field [FIELD] of type [TYPE] of the object pointed to by the element on the top of the stack onto the stack
set_field [TYPE] [FIELD]
- places the value on the top of the stack into field [FIELD] of type [TYPE] of the object pointed to by the element below the stack's top
call_static [TYPE] [FUNCTION]
- calls the function [FUNCTION] of type [TYPE] with arguments on the top of the stack, creating a new stack frame
call_dynamic [TYPE] [FUNCTION]
- calls the function [FUNCTION] of type [TYPE] of the object pointed to by the element on the stack's top with arguments being on the top of the stack, creating a new stack frame
call_indirect
- calls the instruction pointed to by the top of the stack
get_element
- places the value of the n-th element in the array of objects pointed to by the element below the stack's top onto the stack, using the value of the element on the stack's top as n
addressof_field [TYPE] [FIELD]
- places the address of field [FIELD] of type [TYPE] of the object pointed to by the element on the stack's top onto the top of the stack
addressof_function [TYPE] [FUNCTION]
- places the address of function [FUNCTION] of type [TYPE] of the object pointed to by the element on the stack's top onto the top of the stack
addressof_element
- places the address of the n-th element in the array of objects pointed to by the element below the stack's top onto the top of the stack, using the value of the element on the stack's top as n
is_type [TYPE]
- pops a pointer to an object from the top of the stack and compares its type with [TYPE]. The result of this comparison is placed on the stack.
check_type [TYPE]
- pops a pointer to an object from the top of the stack and compares its type with [TYPE]. If type(obj) != [TYPE], CastException is thrown.
throw
- pops a pointer to an exception object from the stack and throws it as an exception
alloc_resource [RESOURCE]
- loads the resource [RESOURCE] into memory and places the memory address onto the stack
free_resource
- frees the resource pointed to by the pointer on the stack's top and removes the pointer from the stack
Albatross Bytecode is stored in a rather simple binary format, organized in a fashion similar to its overall structure.
The binary format uses only two terms for specifying size: BYTE or byte, and WORD or word. A byte is always a chunk of 8 bits, while a word must be a chunk of at least 32 bits, but it can be more. The size of a word is specified by the constant pool.
Purely-namespace namespaces (from now on just "namespaces" for simplicity) are stored in the file system as directories or folders. Each of the namespace folders contains file system entries representing other namespaces or classes.
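The mapping from a dotted path to a file system entry can be sketched as below. The helper `path_to_entry` and the root directory argument are hypothetical, and the sketch assumes that every file system entry is named exactly after the namespace or class it represents, which this document does not state explicitly.

```python
from pathlib import PurePosixPath


def path_to_entry(root: str, path: str) -> PurePosixPath:
    """Map a dotted namespace path to a file system entry below `root`.

    Assumption: each path component becomes one directory level, and the
    final component names the entry (a directory or a class file) directly.
    """
    return PurePosixPath(root).joinpath(*path.split("."))
```

Under these assumptions, the path `ARootNamespace.ASubNamespace.AClass` with root `/lib` maps to `/lib/ARootNamespace/ASubNamespace/AClass`.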
Class data is stored in a file without any specific extension. The format of a class binary consists of the following parts:
- class magic - a word-sized magic number
- constant pool - containing constant class data accessible by indices
- class header - containing basic information about the class
- namespace imports - containing a list of namespaces that must be loaded with the class
- field list - containing a list of elements describing the fields
- function list - containing a list of elements declaring and defining functions
Class magic is the 32-bit value 0xBADEDADE, or a repetition of it in the case of a word larger than 32 bits.
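How the magic number extends to wider words can be sketched as follows. The function `class_magic` is hypothetical, and the sketch assumes that the word size is a whole multiple of 32 bits; this document only requires a word to be at least 32 bits.

```python
BASE_MAGIC = 0xBADEDADE  # the 32-bit class magic


def class_magic(word_bits: int) -> int:
    """Return the class magic for a word of `word_bits` bits.

    Assumption: `word_bits` is a multiple of 32, so the 32-bit pattern
    repeats a whole number of times.
    """
    if word_bits < 32 or word_bits % 32 != 0:
        raise ValueError("word size must be a multiple of 32 bits, at least 32")
    magic = 0
    for _ in range(word_bits // 32):
        magic = (magic << 32) | BASE_MAGIC  # append one more repetition
    return magic
```

For a 64-bit word this gives `0xBADEDADEBADEDADE`.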
The constant pool begins with the number of entries in its index list and the size of all data located in the constant pool.
- ``