Albatross Bytecode Specification

General description

Albatross Bytecode, or simply AB, is a bytecode whose internal structure, operations and binary format are precisely defined so that it can be used by a process virtual machine as well as by a whole-system virtual machine, without the need to write two separate sources.

Because of this additional complexity, the bytecode has to be defined so that it contains only behavior at a level of abstraction that is common to both process and whole-system virtual machines.

Structure of this document

This document is separated into three sections, each of them describing one of the three main bytecode aspects. These aspects are:

its overall structure (section "Overall structure")
behavior of a program written in this bytecode (section "Bytecode execution")
binary format of a generated bytecode file (section "Binary format")

No section refers to a section that appears later in this document, so the document can also be used as a learning resource.

Overall structure

Code generated into this bytecode is always placed in one or more files with a single fixed binary format, discussed later in this document. Each of these files represents a single entity, or a namespace in a non-conventional sense.

Every namespace is a generally accessible space of either sub-namespaces (each of them being a normal namespace) or functions and fields.

It could be simply said that both of these namespace types are a kind of "name group", i.e. something that groups the names of the contained elements and makes them unique from the perspective of all code being executed. Namespaces are defined by their properties:

Two namespaces in a single parent namespace can't have the same name. More generally, no two elements of a namespace can have the same name: every name of an element in a namespace (whether it's another namespace, a function or a field) must be unique within that namespace.
A namespace can contain either only elements that are namespaces, or only elements that are functions or fields. In other words, a namespace that contains at least one sub-namespace must be purely-namespace. In contrast, a namespace that contains at least one field or function mustn't contain any sub-namespace; it is a purely-executable namespace.

Purely-namespace namespaces can also be referred to as simply "namespaces", while purely-executable namespaces can also be referred to as "classes".
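
The two properties above can be sketched as a small container that rejects duplicate names and refuses to mix sub-namespaces with members. This is only an illustration: the `Namespace` class, its `add` method and the kind strings are hypothetical names, not part of the bytecode, and the treatment of an empty namespace (not counted as a class here) is an assumption, since the text does not cover that case.

```python
class Namespace:
    """Hypothetical model of a namespace and the elements it contains."""

    def __init__(self, name):
        self.name = name
        self.elements = {}  # element name -> "namespace", "function" or "field"

    def add(self, name, kind):
        # Property 1: element names must be unique within the namespace.
        if name in self.elements:
            raise ValueError(f"duplicate name {name!r} in {self.name!r}")
        kinds = set(self.elements.values())
        has_subnamespaces = "namespace" in kinds
        has_members = bool(kinds - {"namespace"})
        # Property 2: sub-namespaces and members must not be mixed.
        if kind == "namespace" and has_members:
            raise ValueError(f"{self.name!r} is purely-executable")
        if kind != "namespace" and has_subnamespaces:
            raise ValueError(f"{self.name!r} is purely-namespace")
        self.elements[name] = kind

    def is_class(self):
        # Assumption: an empty namespace is not treated as a class.
        return any(kind != "namespace" for kind in self.elements.values())
```

With this sketch, adding a function to a namespace that already holds a sub-namespace raises `ValueError`, as does adding two elements with the same name.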

Namespace

Both kinds of namespaces carry a common property: a name. For purely-namespace namespaces it is also the only property they have. A namespace name is a string at least one character long that starts with a letter of the English alphabet or an underscore ('_'), and may be followed by further letters, underscores or decimal digits.

Purely-namespace namespaces group other namespaces and create a hierarchy of classes. From the definition of classes we can say that, if the namespace hierarchy were an instance of the composite pattern, a class would be the "leaf" or "terminal" type.

To specify where a class belongs in the hierarchy, we must designate its path. A path is a single namespace name or a combination of namespace names ordered as they follow in the hierarchy from top to bottom, separated by a single dot ('.') between each pair.

A few examples of paths:

ARootNamespace
ARootNamespace.ASubNamespace
ARootNamespace.First_SubNamespace.Second_SubNamespace.AClass
AClass

Because every class is also a namespace, a path may consist of only a class name.
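
The name and path rules can be checked mechanically. The following is a minimal sketch, assuming that "letter of the English alphabet" means the ASCII letters A-Z and a-z; the function names `is_valid_name` and `is_valid_path` are hypothetical helpers introduced only for this example.

```python
import re

# Assumption: only ASCII letters count as "letters of the English alphabet".
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_name(name):
    """Return True if `name` satisfies the namespace-name rule."""
    return NAME_RE.fullmatch(name) is not None


def is_valid_path(path):
    """Return True if `path` is one or more valid names joined by single dots."""
    return all(is_valid_name(part) for part in path.split("."))
```

For instance, `is_valid_path("ARootNamespace.ASubNamespace")` and `is_valid_path("AClass")` hold, while a name starting with a digit or a path containing two consecutive dots is rejected.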

Class

As written above, a class is a namespace that doesn't contain any sub-namespaces, but rather functions and/or fields. These functions and fields are directly used for execution, hence the name purely-executable namespaces. All classes are used to form objects, which are instances of classes. When an object is formed from some class, we say that the class is its type.

Classes have a property specific to them - their validity in global context. A class can be either static or dynamic. Classes are always used to form objects according to their validity:

Static classes form static objects. Those are:
- created prior to start of execution
- accessible to everything loaded to current execution (every other class and its members)
- destroyed at the end of current execution
Dynamic classes form dynamic objects. Those are:
- created during execution as a result of an instruction execution
- accessible through their handles (pointers)
- destroyed when requested

Functions and fields share a common name: members. In all classes, members have the following properties:

accessibility (or visibility) specifying which classes can and which cannot access the member
heredity specifying whether specific members can be defined, redefined or modified

Fields

Fields are named chunks of data with a size known before execution. Because they are named, they can be accessed simply by name, without the need to know their location or to use a handle, although that option still exists.

Fields do not have a type, only a size. When a field is modified, not all of its data has to be updated. The size of an object is the sum of the sizes of all its fields, and therefore every field name can be said to represent an offset from the object's location in memory.
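
The relation between field sizes, object size and field offsets can be illustrated with a short calculation. This sketch assumes that fields are laid out contiguously in declaration order with no padding, which the text does not state explicitly; `field_offsets` is a hypothetical helper name.

```python
def field_offsets(fields):
    """Map each field name to its byte offset and return (offsets, object_size).

    `fields` is a list of (name, size_in_bytes) pairs.
    Assumption: fields are packed in declaration order without padding.
    """
    offsets = {}
    offset = 0
    for name, size in fields:
        offsets[name] = offset  # the field starts where the previous one ended
        offset += size
    return offsets, offset
```

For the fields `[("x", 4), ("y", 8), ("flag", 1)]` this gives offsets 0, 4 and 12 and an object size of 13 bytes.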

Functions

A function is a sequence of instructions taking arguments as input and producing a return value as output. Again, the type of the data being returned is not important; only its size is.

Dynamic functions have an additional argument over static functions - a pointer to their object (object context) used as a logical base for their operation.

Inheritance

Inheritance is a bond between parent classes and a child class that applies only to dynamic classes. Dynamic classes can inherit members of their parent classes based on the accessibility of these members:

public members get inherited and remain publicly accessible for all classes in current execution
protected members get inherited and remain accessible for the child class and the classes the child class is a parent of
private members don't get inherited, they are available only for members of the class that defines them

Inherited members of a child class can be accessed through both the child and the parent class. That means that every operation executed on an instance of a parent class can equally be done on an instance of any child class. A call to an inherited function requires the object context argument to have the type of the inheriting class. Doing otherwise wouldn't make sense.
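
The accessibility rules listed above can be sketched as a filter over a parent's members. The dictionary representation and the name `inherited_members` are assumptions made for this example only.

```python
# Accessibility levels that are passed on to a child class.
INHERITABLE = {"public", "protected"}


def inherited_members(parent_members):
    """Return the members of a parent class that a child class inherits.

    `parent_members` maps a member name to "public", "protected" or "private".
    """
    return {
        name: access
        for name, access in parent_members.items()
        if access in INHERITABLE
    }
```

A parent with a public `area`, a protected `scale` and a private `cache` would pass on only `area` and `scale`, each keeping its original accessibility.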

Resource

A resource is a block of data with no further specified structure, available for use from within the execution. Every class can have its own resources, which are accessible only to the class defining them.

Bytecode execution

Albatross Bytecode is a stack-based bytecode with support for the dynamic properties of object-oriented programming, whilst still being useful for representing intermediate code for languages which aren't object oriented. Because of this, all of the instructions are divided into groups according to their purpose, so that only the instructions meaningful for the current application need to be used.

Stack instructions - instructions affecting stack state for purposes of the stack only
Constant loading instructions
Arithmetic instructions - instructions performing normal arithmetic operations
Bitwise instructions - instructions performing bitwise operations
Comparison instructions - instructions performing basic comparisons of values
Branching instructions - instructions performing conditional and unconditional branching (jumps)
Memory manipulation instructions - instructions allocating, freeing or modifying memory for purposes of the memory only
Object manipulation instructions - instructions performing operations somehow related to objects
Exception instructions
Meta instructions - instructions for operations accessing or working with the components of bytecode binary

Stack instructions

pop - removes the top element on the stack
down - removes the second element on the stack
dup - duplicates the top element on the stack
swap - swaps the first and second element on the stack
addressof_top - pushes the address of the stack's top onto the stack
addressof_frame - pushes the address of the current frame onto the stack
get_local [LOCAL] - pushes an element onto the stack from the storage allocated for local variable [LOCAL]
set_local [LOCAL] - pops an element from the stack's top and moves it into the storage allocated for local variable [LOCAL]
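
The first four stack instructions can be sketched on a plain Python list, treating the end of the list as the top of the stack. The function names mirror the instruction names, but the list representation is an assumption made for illustration.

```python
def pop(stack):
    """Remove the top element."""
    stack.pop()


def down(stack):
    """Remove the second element, leaving the top element in place."""
    del stack[-2]


def dup(stack):
    """Duplicate the top element."""
    stack.append(stack[-1])


def swap(stack):
    """Swap the first (top) and second elements."""
    stack[-1], stack[-2] = stack[-2], stack[-1]
```

Starting from `[1, 2, 3]` (3 on top), `down` leaves `[1, 3]`, a following `dup` gives `[1, 3, 3]`, and `swap` on `[1, 3]` gives `[3, 1]`.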

Constant loading instructions

load_raw [VALUE] - loads a raw word [VALUE] and pushes it on the stack
load_zero - loads a word of value zero and pushes it on the stack
load_one - loads a word of value one and pushes it on the stack

Arithmetic instructions

add_integer - pops two top values from the stack, adds them as integers and pushes the result onto the stack
add_float - pops two top values from the stack, adds them as floating-point numbers and pushes the result onto the stack
sub_integer - pops two top values from the stack, subtracts the higher one from the lower one as integers and pushes the result onto the stack
sub_float - pops two top values from the stack, subtracts the higher one from the lower one as floating-point numbers and pushes the result onto the stack
mul_integer - pops two top values from the stack, multiplies them as integers and pushes the result onto the stack
mul_float - pops two top values from the stack, multiplies them as floating-point numbers and pushes the result onto the stack
div_integer - pops two top values from the stack, divides the lower one by the higher one as integers and pushes the result of integer division onto the stack
div_float - pops two top values from the stack, divides the lower one by the higher one as floating-point numbers and pushes the result of floating-point division onto the stack
mod_integer - pops two top values from the stack, divides the lower one by the higher one as integers and pushes the remainder of integer division onto the stack
mod_float - pops two top values from the stack, divides the lower one by the higher one as floating-point numbers and pushes the remainder of floating-point division onto the stack
neg - negates the value on the top of the stack
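
Operand order matters for the non-commutative instructions: the lower value is the left operand and the higher (top) value is the right operand. The sketch below shows this on a Python list. The helper `binary` is hypothetical, and the use of Python's floor division and modulo for negative operands is an assumption, because the text does not specify a rounding mode.

```python
import operator


def binary(stack, op):
    """Pop two values and push op(lower, higher)."""
    higher = stack.pop()  # the value that was on the top of the stack
    lower = stack.pop()   # the value directly below it
    stack.append(op(lower, higher))


def sub_integer(stack):
    binary(stack, operator.sub)


def div_integer(stack):
    # Assumption: floor division; the rounding mode is not specified.
    binary(stack, operator.floordiv)


def mod_integer(stack):
    # Assumption: Python's sign convention for the remainder.
    binary(stack, operator.mod)
```

With 10 pushed first and 3 pushed second, `sub_integer` leaves 7, `div_integer` leaves 3 and `mod_integer` leaves 1.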

Bitwise instructions

and - pops two top values from the stack, performs a bitwise and on the lower element using the higher one as a mask, and places the result on the stack
or - pops two top values from the stack, performs a bitwise or on the elements and places the result on the stack
xor - pops two top values from the stack, performs a bitwise xor on the elements and places the result on the stack
not - pops the top value from the stack, performs a bitwise not (negates all bits of the value) and places the result on the stack
shl [BITS] - pops the top value from the stack, performs a bitwise shift to left by [BITS] bits and places the result on the stack
shr [BITS] - pops the top value from the stack, performs a bitwise shift to right by [BITS] bits and places the result on the stack
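
Because stack values are fixed-size words, the results of `not` and `shl` have to stay within the word. The sketch below assumes a 32-bit word holding an unsigned value and a logical (zero-filling) right shift; neither the signedness nor the kind of shift is stated in the text. The name `bit_not` is used because `not` is a Python keyword.

```python
WORD_BITS = 32                     # assumption: the minimum word size
WORD_MASK = (1 << WORD_BITS) - 1   # 0xFFFFFFFF


def bit_not(stack):
    """Invert all bits of the top value, keeping the result within one word."""
    stack.append(~stack.pop() & WORD_MASK)


def shl(stack, bits):
    """Shift the top value left; bits shifted past the word boundary are lost."""
    stack.append((stack.pop() << bits) & WORD_MASK)


def shr(stack, bits):
    """Shift the top value right (assumed logical shift on an unsigned word)."""
    stack.append(stack.pop() >> bits)
```

Under these assumptions, `bit_not` turns 0 into 0xFFFFFFFF, and shifting 0x80000001 left by one bit yields 2 because the highest bit falls off the word.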

Comparison instructions

Comparison instructions can result in only two values: false (represented as raw zero - 0) and true (represented as raw one - 1).

cmp_equal - pops two top values from the stack and places the truth value of expression "LOWER equals HIGHER" onto the stack
cmp_bigger - pops two top values from the stack and places the truth value of expression "LOWER is bigger than HIGHER" onto the stack
cmp_biggerorequals - pops two top values from the stack and places the truth value of expression "LOWER is bigger than or equals HIGHER" onto the stack
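
Only "equal", "bigger" and "bigger or equal" are provided, so a "smaller than" test has to be composed from them. One possible way, shown here as a sketch on a Python list, is to swap the operands before `cmp_bigger`; this composition is a suggestion and not something the instruction set prescribes.

```python
def swap(stack):
    """Swap the two top elements."""
    stack[-1], stack[-2] = stack[-2], stack[-1]


def cmp_bigger(stack):
    """Pop two values and push 1 if LOWER is bigger than HIGHER, else 0."""
    higher = stack.pop()
    lower = stack.pop()
    stack.append(1 if lower > higher else 0)


def cmp_smaller(stack):
    """Hypothetical composite: LOWER < HIGHER is the same as HIGHER > LOWER."""
    swap(stack)
    cmp_bigger(stack)
```

With 3 pushed first and 5 pushed second, `cmp_bigger` leaves 0 while the composite `cmp_smaller` leaves 1.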

Branching instructions

Branching instructions change the next instruction to be executed, either conditionally (based on the value on the top of the stack) or unconditionally (always).

br [OFFSET] - transfers the execution to the offset [OFFSET] relative to position of this instruction
br_true [OFFSET] - transfers the execution to the offset [OFFSET] relative to position of this instruction only if the value on the top of the stack equals one
br_false [OFFSET] - transfers the execution to the offset [OFFSET] relative to position of this instruction only if the value on the top of the stack equals zero
br_indirect - transfers the execution to the instruction located at the location pointed to by the stack's top
return - transfers the execution back from a callee to the caller, destroying the current stack frame
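
Relative branching can be illustrated with a tiny interpreter loop. This is a sketch under several assumptions that the text leaves open: offsets are counted in instructions rather than bytes, conditional branches pop the value they test, and instructions are encoded as hypothetical `(opcode, operand)` tuples.

```python
def run(program, stack):
    """Execute a list of (opcode, operand) tuples and return the stack."""
    pc = 0  # index of the instruction being executed
    while pc < len(program):
        op, arg = program[pc]
        if op == "load_raw":
            stack.append(arg)
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "sub_integer":
            higher = stack.pop()
            lower = stack.pop()
            stack.append(lower - higher)
        elif op == "br":
            pc += arg  # offset is relative to this instruction
            continue
        elif op == "br_true":
            if stack.pop() == 1:  # assumption: the tested value is popped
                pc += arg
                continue
        elif op == "br_false":
            if stack.pop() == 0:  # assumption: the tested value is popped
                pc += arg
                continue
        elif op == "return":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    return stack


# Count down from 3 to 0 using a backward `br` and a forward `br_false`.
countdown = [
    ("load_raw", 3),        # 0: n = 3
    ("load_raw", 1),        # 1: loop start
    ("sub_integer", None),  # 2: n = n - 1
    ("dup", None),          # 3: keep a copy of n for the test
    ("br_false", 2),        # 4: if n == 0, jump to instruction 6
    ("br", -4),             # 5: otherwise jump back to instruction 1
    ("return", None),       # 6: done
]
```

Running `run(countdown, [])` under these assumptions ends with a single 0 on the stack.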

Memory manipulation instructions

alloc [AMOUNT] - allocates the amount [AMOUNT] of bytes in memory and places a pointer to this memory onto the stack
realloc [AMOUNT] - reallocates the memory pointed to by the pointer popped from the top of the stack to amount [AMOUNT] (in bytes), placing a pointer to the reallocated memory onto the stack
free - frees the memory pointed to by the pointer on the top of the stack
point - gets a word of data from the location pointed to by a pointer popped from the top of the stack
copy_to_stack [N] - pushes [N] (count) words from the memory pointed to by a pointer popped from the stack onto the stack
copy_to_memory [N] - pops a pointer from the stack, then pops [N] (count) words from the stack and moves them to the memory pointed to by that pointer
copy_memory - moves (stack_top) bytes from one location ((stack_top-1)) to another ((stack_top-2)).
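
The operand layout of `copy_memory` can be sketched with a `bytearray` standing in for memory and list indices standing in for addresses. The sketch assumes that all three operands are popped, which the description does not state.

```python
def copy_memory(stack, memory):
    """Copy `count` bytes from `source` to `destination` inside `memory`."""
    count = stack.pop()        # stack_top: number of bytes
    source = stack.pop()       # stack_top-1: where the bytes come from
    destination = stack.pop()  # stack_top-2: where the bytes go
    # The right-hand slice is copied first, so overlapping ranges are safe.
    memory[destination:destination + count] = memory[source:source + count]
```

With memory holding `abcdefgh` and the stack holding destination 4, source 0 and count 2 (count on top), the memory becomes `abcdabgh`.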

Object manipulation instructions

new_object [TYPE] - allocates a new object of type [TYPE], placing a pointer to the allocated object onto the stack
new_array [TYPE] [SIZE] - allocates a new array for [SIZE] objects of type [TYPE], placing a pointer to the allocated array onto the stack
free_object - frees the object pointed to by the pointer on the top of the stack
free_array - frees the array pointed to by the pointer on the top of the stack

get_field [TYPE] [FIELD] - places the value of field [FIELD] of type [TYPE] of the object pointed to by the element on the top of the stack onto the stack
set_field [TYPE] [FIELD] - stores the value on the top of the stack in field [FIELD] of type [TYPE] of the object pointed to by the element below the stack's top
call_static [TYPE] [FUNCTION] - calls the function [FUNCTION] of type [TYPE] with arguments on the top of the stack, creating a new stack frame
call_dynamic [TYPE] [FUNCTION] - calls the function [FUNCTION] of type [TYPE] of the object pointed to by the element on the stack's top with arguments being on the top of the stack, creating a new stack frame
call_indirect - calls the instruction pointed to by the top of the stack

get_element - places the value of the n-th element in the array of objects pointed to by the element below the stack's top onto the stack, using the value of the element on the stack's top as n

addressof_field [TYPE] [FIELD] - places the address of field [FIELD] of type [TYPE] of the object pointed to by the element on the stack's top onto the top of the stack
addressof_function [TYPE] [FUNCTION] - places the address of function [FUNCTION] of type [TYPE] of the object pointed to by the element on the stack's top onto the top of the stack
addressof_element - places the address of the n-th element in the array of objects pointed to by the element below the stack's top onto the stack, using the value of the element on the stack's top as n

is_type [TYPE] - pops a pointer to an object from the top of the stack and compares its type with [TYPE]. The result of this comparison is placed on the stack.
check_type [TYPE] - pops a pointer to an object from the top of the stack and compares its type with [TYPE]. If type(obj) != [TYPE], CastException is thrown.
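
The difference between `is_type` and `check_type` is that the first reports the outcome as a value while the second throws. The sketch below models objects as Python instances and assumes an exact type match (child classes do not count) and a result encoded as 1 or 0; the text does not state either point. The `CastException` class is defined locally for the example.

```python
class CastException(Exception):
    """Stand-in for the CastException named in the description of check_type."""


def is_type(stack, expected):
    """Pop an object and push 1 if its type is exactly `expected`, else 0."""
    obj = stack.pop()
    stack.append(1 if type(obj) is expected else 0)


def check_type(stack, expected):
    """Pop an object and raise CastException if its type is not `expected`."""
    obj = stack.pop()
    if type(obj) is not expected:
        raise CastException(
            f"expected {expected.__name__}, got {type(obj).__name__}"
        )
```

A branch such as `br_false` can consume the result of `is_type`, whereas `check_type` is suited to places where a mismatch should abort the current operation.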

Exception instructions

throw - pops a pointer to an exception object from the stack and throws it as exception

Meta instructions

alloc_resource [RESOURCE] - loads the resource [RESOURCE] to the memory and places the memory address onto the stack
free_resource - frees the resource pointed to by the pointer on the stack's top and removes the pointer from the stack

Binary format

Albatross Bytecode is stored in a rather simple binary format, organized in a fashion similar to its overall structure.

The binary format uses only two terms for specifying size: BYTE or byte, and WORD or word. Byte is always a chunk of 8 bits, while a word must be a chunk of at least 32 bits, but it can be more. The size of a word is specified by the constant pool.

In the beginning there was a namespace

Purely-namespace namespaces (or now just "namespaces" for simplicity) are stored in the file system as directories or folders. Each of the namespace folders contains file system entries representing other namespaces or classes.

Classes

Class data is stored in a file without any specific extension. The format of a class binary consists of the following parts:

class magic - a word-sized magic number
constant pool - containing constant class data accessible by indices
class header - containing basic information about the class
namespace imports - containing a list of namespaces that must be loaded with the class
field list - containing a list of elements describing the fields
function list - containing a list of elements declaring and defining functions

Class magic

Class magic is the 32-bit value 0xBADEDADE, or a repetition of it in the case of a word larger than 32 bits.
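
The repetition rule can be sketched as follows. The sketch assumes that the word size is a multiple of 32 bits; how a word size that is not a multiple of 32 would be handled is not specified, so such sizes are rejected here. The helper name `class_magic` is hypothetical.

```python
MAGIC_32 = 0xBADEDADE


def class_magic(word_bits):
    """Return the word-sized class magic for a word of `word_bits` bits."""
    # Assumption: only word sizes that are multiples of 32 are supported.
    if word_bits < 32 or word_bits % 32 != 0:
        raise ValueError("word size must be a multiple of 32 bits")
    value = 0
    for _ in range(word_bits // 32):
        value = (value << 32) | MAGIC_32  # append one more repetition
    return value
```

For a 64-bit word this yields 0xBADEDADEBADEDADE.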

Constant pool

The constant pool begins with the number of entries in the index list and the size of all data located in the constant pool.
