Worked on LevelDB database format support
joachimmetz committed Jan 2, 2024
1 parent ff454aa commit d7aa0dc
Showing 5 changed files with 303 additions and 144 deletions.
199 changes: 127 additions & 72 deletions documentation/LevelDB database format.asciidoc
@@ -76,194 +76,249 @@ For example:
00000000 4d 41 4e 49 46 45 53 54 2d 30 30 30 30 30 31 0a |MANIFEST-000001.|
....

== Write ahead log file (.ldb)
== Descriptor file

A write ahead log file consists of:
The descriptor file is a <<log_file,write ahead log file>> that consists of:

* one or more 32k pages
** one or more data blocks
* one or more descriptor records

[cols="1,5",options="header"]
|===
| Characteristics | Description
| Byte order | little-endian
| Date and time values |
| Character strings |
|===
=== Descriptor record

=== Log block
A descriptor (VersionEdit) record consists of:

A log block is of variable size and consists of:
* One or more descriptor values

[cols="1,1,1,5",options="header"]
|===
| Offset | Size | Value | Description
| 0 | 4 | | Checksum +
Contains a CRC-32
| 4 | 2 | | Record data size
| 6 | 1 | | Record type +
See: <<log_record_types,log record types>>
| 7 | record data size | | Record data
|===
=== Descriptor value

==== [[log_record_types]]Log record types
A descriptor value consists of:

[cols="1,1,5",options="header"]
|===
| Value | Identifier | Description
| 1 | FULL | Full record
| 2 | FIRST | First segment of record data
| 3 | MIDDLE | Intermediate segment of record data
| 4 | LAST | Last segment of record data
|===

=== Log record

A log record consists of:

* One or more tagged values

Where each tagged values consists of:

* A <<log_value_tags,value tag>>
* A <<descriptor_value_tags,value tag>>
* Value data

==== [[log_value_tags]]Log value tags
==== [[descriptor_value_tags]]Descriptor value tags

[cols="1,1,5",options="header"]
|===
| Value | Identifier | Description
| 1 | kComparator | Comparator +
See: <<log_comparator_value,comparator value>>
See: <<descriptor_comparator_value,comparator value>>
| 2 | kLogNumber | Log number +
See: <<log_log_number_value,log number value>>
See: <<descriptor_log_number_value,log number value>>
| 3 | kNextFileNumber | Next file number +
See: <<log_next_file_number_value,next file number value>>
See: <<descriptor_next_file_number_value,next file number value>>
| 4 | kLastSequence | Last sequence number +
See: <<log_last_sequence_number_value,last sequence number value>>
See: <<descriptor_last_sequence_number_value,last sequence number value>>
| 5 | kCompactPointer | Compact pointer +
See: <<log_compact_pointer_value,compact pointer value>>
See: <<descriptor_compact_pointer_value,compact pointer value>>
| 6 | kDeletedFile | Deleted file +
See: <<log_deleted_file_value,deleted file value>>
See: <<descriptor_deleted_file_value,deleted file value>>
| 7 | kNewFile | New file +
See: <<log_new_file_value,new file value>>
See: <<descriptor_new_file_value,new file value>>
| 8 | | [yellow-background]*Unknown (was used for large value references)*
| 9 | kPrevLogNumber | Previous log number +
See: <<log_previous_log_number_value,previous log number value>>
See: <<descriptor_previous_log_number_value,previous log number value>>
|===

==== [[log_comparator_value]]Comparator value
==== [[descriptor_comparator_value]]Comparator value

[cols="1,1,1,5",options="header"]
|===
| 0 | 1 | 1 | Value tag +
Contains a <<varint64,variable-size integer>> +
See: <<log_value_tags,value tags>>
See: <<descriptor_value_tags,value tags>>
| 1 | ... | | Name string size
| ... | ... | | Name string +
Contains a UTF-8 encoded string without end-of-string character
|===

==== [[log_log_number_value]]Log number value
==== [[descriptor_log_number_value]]Log number value

[cols="1,1,1,5",options="header"]
|===
| 0 | 1 | 2 | Value tag +
Contains a <<varint64,variable-size integer>> +
See: <<log_value_tags,value tags>>
See: <<descriptor_value_tags,value tags>>
| 1 | ... | | Log number +
Contains a <<varint64,variable-size integer>>
|===

==== [[log_next_file_number_value]]Next file number value
==== [[descriptor_next_file_number_value]]Next file number value

[cols="1,1,1,5",options="header"]
|===
| 0 | 1 | 3 | Value tag +
Contains a <<varint64,variable-size integer>> +
See: <<log_value_tags,value tags>>
See: <<descriptor_value_tags,value tags>>
| 1 | ... | | Next file number +
Contains a <<varint64,variable-size integer>>
|===

==== [[log_last_sequence_number_value]]Last sequence number value
==== [[descriptor_last_sequence_number_value]]Last sequence number value

[cols="1,1,1,5",options="header"]
|===
| 0 | 1 | 4 | Value tag +
Contains a <<varint64,variable-size integer>> +
See: <<log_value_tags,value tags>>
See: <<descriptor_value_tags,value tags>>
| 1 | ... | | Last sequence number +
Contains a <<varint64,variable-size integer>>
|===

==== [[log_compact_pointer_value]]Compact pointer value
==== [[descriptor_compact_pointer_value]]Compact pointer value

[cols="1,1,1,5",options="header"]
|===
| 0 | 1 | 5 | Value tag +
Contains a <<varint64,variable-size integer>> +
See: <<log_value_tags,value tags>>
See: <<descriptor_value_tags,value tags>>
| 1 | ... | | Level +
Contains a <<varint64,variable-size integer>>
| ... | ... | | Key +
Contains a <<log_key_value,key value>>
Contains a <<log_slice_value,slice value>>
|===

==== [[log_deleted_file_value]]Deleted file value
==== [[descriptor_deleted_file_value]]Deleted file value

[cols="1,1,1,5",options="header"]
|===
| 0 | 1 | 6 | Value tag +
Contains a <<varint64,variable-size integer>> +
See: <<log_value_tags,value tags>>
See: <<descriptor_value_tags,value tags>>
| 1 | ... | | Level +
Contains a <<varint64,variable-size integer>>
| ... | ... | | File number +
Contains a <<varint64,variable-size integer>>
|===

==== [[log_new_file_value]]New file value
==== [[descriptor_new_file_value]]New file value

[cols="1,1,1,5",options="header"]
|===
| 0 | 1 | 7 | Value tag +
Contains a <<varint64,variable-size integer>> +
See: <<log_value_tags,value tags>>
See: <<descriptor_value_tags,value tags>>
| 1 | ... | | Level +
Contains a <<varint64,variable-size integer>>
| ... | ... | | File number +
Contains a <<varint64,variable-size integer>>
| ... | ... | | File size +
Contains a <<varint64,variable-size integer>>
| ... | ... | | Smallest record key +
Contains a <<log_key_value,key value>>
Contains a <<log_slice_value,slice value>>
| ... | ... | | Largest record key +
Contains a <<log_key_value,key value>>
Contains a <<log_slice_value,slice value>>
|===

==== [[log_previous_log_number_value]]Previous log number value
==== [[descriptor_previous_log_number_value]]Previous log number value

[cols="1,1,1,5",options="header"]
|===
| 0 | 1 | 9 | Value tag +
Contains a <<varint64,variable-size integer>> +
See: <<log_value_tags,value tags>>
See: <<descriptor_value_tags,value tags>>
| 1 | ... | | Previous log number +
Contains a <<varint64,variable-size integer>>
|===
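
The descriptor values above can be walked with a small reader. The following is a minimal sketch, not the dtformats implementation: the function names are hypothetical, the variable-size integer is assumed to use base-128 encoding (low 7 bits carry data, high bit marks continuation), and the name string size is assumed to be such an integer. Only the comparator and single-integer values are handled; the compact pointer, deleted file and new file values are left out.

```python
def read_varint(data, offset):
    """Decodes a base-128 variable-size integer; returns (value, next offset)."""
    value = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7f) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


# Value tags that are followed by a single variable-size integer.
INTEGER_TAGS = {
    2: 'log_number',
    3: 'next_file_number',
    4: 'last_sequence_number',
    9: 'previous_log_number'}


def read_descriptor_values(record_data):
    """Returns a list of (name, value) tuples read from a descriptor record."""
    values = []
    offset = 0
    while offset < len(record_data):
        tag, offset = read_varint(record_data, offset)
        if tag == 1:
            # Comparator: name string size followed by a UTF-8 string.
            size, offset = read_varint(record_data, offset)
            name = record_data[offset:offset + size].decode('utf-8')
            offset += size
            values.append(('comparator', name))
        elif tag in INTEGER_TAGS:
            number, offset = read_varint(record_data, offset)
            values.append((INTEGER_TAGS[tag], number))
        else:
            # Tags 5, 6 and 7 are not covered by this sketch.
            raise ValueError(f'unsupported value tag: {tag:d}')
    return values
```

A caller would pass the reassembled record data of one descriptor record, for example `read_descriptor_values(record_data)`, and get back the values in the order they were stored.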

==== [[log_key_value]]Key value
== [[log_file]]Write ahead log file (.log)

A write ahead log file consists of:

* one or more 32k pages
** one or more log blocks

[cols="1,5",options="header"]
|===
| Characteristics | Description
| Byte order | little-endian
| Date and time values |
| Character strings |
|===

=== Log block

A log block is of variable size and consists of:

[cols="1,1,1,5",options="header"]
|===
| Offset | Size | Value | Description
| 0 | 4 | | Checksum +
Contains a CRC-32
| 4 | 2 | | Record data size
| 6 | 1 | | Record type +
See: <<log_record_types,log record types>>
| 7 | record data size | | Record data
|===

==== [[log_record_types]]Log record types

[cols="1,1,5",options="header"]
|===
| Value | Identifier | Description
| 1 | FULL | Full record
| 2 | FIRST | First segment of record data
| 3 | MIDDLE | Intermediate segment of record data
| 4 | LAST | Last segment of record data
|===
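
As an illustration of the log block layout, the sketch below reads one block from a byte buffer. It assumes the three header fields are packed back to back in little-endian order (4-byte checksum, 2-byte record data size, 1-byte record type) and it does not verify the checksum. The names `LOG_BLOCK_HEADER` and `read_log_block` are hypothetical.

```python
import struct

# Assumed header: checksum (4 bytes), record data size (2 bytes),
# record type (1 byte), little-endian and without padding.
LOG_BLOCK_HEADER = struct.Struct('<IHB')

RECORD_TYPES = {1: 'FULL', 2: 'FIRST', 3: 'MIDDLE', 4: 'LAST'}


def read_log_block(data, offset=0):
    """Returns (checksum, record type name, record data, next offset)."""
    checksum, data_size, record_type = LOG_BLOCK_HEADER.unpack_from(
        data, offset)
    data_offset = offset + LOG_BLOCK_HEADER.size
    record_data = data[data_offset:data_offset + data_size]
    if len(record_data) != data_size:
        raise ValueError('truncated record data')
    type_name = RECORD_TYPES.get(record_type, 'UNKNOWN')
    return checksum, type_name, record_data, data_offset + data_size
```

A record split over several blocks would be reassembled by concatenating the record data of a FIRST block, any MIDDLE blocks and the LAST block, using the returned next offset to step through the page.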

=== Log record

A log (WriteBatch) record consists of:

* value header

==== Log value header

A log value header is 12 bytes in size and consists of:

[cols="1,1,1,5",options="header"]
|===
| 0 | 8 | | Sequence number
| 8 | 4 | | [yellow-background]*Unknown (Count?)*
|===
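
The 12-byte header can be unpacked as shown in the sketch below. Treating the 4-byte field as a count is an assumption; it follows the attribute name used for `leveldb_log_value_header` in `leveldb.debug.yaml` in this same change, while the table above still marks the field as unknown. The function name is hypothetical.

```python
import struct

# 8-byte sequence number followed by a 4-byte value assumed to be a count,
# both little-endian.
LOG_VALUE_HEADER = struct.Struct('<QI')


def read_log_value_header(record_data):
    """Returns (sequence number, count) from the start of a log record."""
    if len(record_data) < LOG_VALUE_HEADER.size:
        raise ValueError('log record too small for value header')
    sequence_number, count = LOG_VALUE_HEADER.unpack_from(record_data, 0)
    return sequence_number, count
```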

==== [[log_value_types]]Log value types

[cols="1,1,5",options="header"]
|===
| Value | Identifier | Description
| 0 | kTypeDeletion | Deletion +
See: <<log_deletion_value,deletion value>>
| 1 | kTypeValue | Put +
See: <<log_put_value,put value>>
|===

==== [[log_slice_value]]Slice value

[cols="1,1,1,5",options="header"]
|===
| 0 | ... | | Data size
| ... | ... | | Data
|===

==== [[log_deletion_value]]Deletion value

[cols="1,1,1,5",options="header"]
|===
| 0 | 1 | 0 | Value type
| 1 | ... | | Key +
Contains a <<log_slice_value,slice value>>
|===

==== [[log_put_value]]Put value

[cols="1,1,1,5",options="header"]
|===
| 0 | 1 | 1 | Value type
| 1 | ... | | Key +
Contains a <<log_slice_value,slice value>>
| ... | ... | | Value +
Contains a <<log_slice_value,slice value>>
|===
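
The deletion and put values can be read in a loop once the 12-byte log value header has been skipped. The following is a sketch under two assumptions that the tables above do not state: the slice data size is a base-128 variable-size integer, and the values directly follow the header until the end of the record data. All function names are hypothetical.

```python
def read_varint(data, offset):
    """Decodes a base-128 variable-size integer; returns (value, next offset)."""
    value = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7f) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def read_slice(data, offset):
    """Reads a slice value: data size followed by the data."""
    size, offset = read_varint(data, offset)
    return data[offset:offset + size], offset + size


def read_log_values(record_data, offset=12):
    """Returns (type name, key, value) tuples; value is None for deletions."""
    values = []
    while offset < len(record_data):
        value_type = record_data[offset]
        offset += 1
        if value_type == 0:
            key, offset = read_slice(record_data, offset)
            values.append(('deletion', key, None))
        elif value_type == 1:
            key, offset = read_slice(record_data, offset)
            value, offset = read_slice(record_data, offset)
            values.append(('put', key, value))
        else:
            raise ValueError(f'unsupported value type: {value_type:d}')
    return values
```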

== Sorted tables file (.ldb)

A sorted tables file file consists of:
A sorted tables file consists of:

* one or more data blocks
* one or more metadata blocks
9 changes: 9 additions & 0 deletions dtformats/leveldb.debug.yaml
@@ -15,6 +15,15 @@ attributes:
description: "Record data"
format: binary_data
---
data_type_map: leveldb_log_value_header
attributes:
- name: sequence_number
description: "Sequence number"
format: decimal
- name: count
description: "Count"
format: decimal
---
data_type_map: leveldb_table_footer
attributes:
- name: metaindex_block_offset