Improve Group Support (TileDB-Inc#2966)

This PR is the first (and likely largest) in a series of PRs that will
improve TileDB group support.

This PR focuses on adding a new on-disk format in which we explicitly
keep track of the members of a group. New C and C++ APIs are added to
support this. Groups can now be opened in read and write mode. Arrays or
groups can be added as members via `tiledb_group_add_member` and removed
via `tiledb_group_remove_member`.

Metadata on groups is now supported as well. It reuses the same basic
functionality that arrays already use for their metadata.

Co-authored-by: bdeng-xt <[email protected]>
Co-authored-by: Isaiah Norton <[email protected]>
3 people committed Apr 11, 2022
1 parent e253bc2 commit 64abb14
Showing 65 changed files with 15,257 additions and 5,134 deletions.
167 changes: 167 additions & 0 deletions examples/c_api/groups.c
@@ -0,0 +1,167 @@
/**
* @file groups.c
*
* @section LICENSE
*
* The MIT License
*
* @copyright Copyright (c) 2022 TileDB, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
* @section DESCRIPTION
*
* This program creates a hierarchy as shown below. Specifically, it creates
* groups `my_group` and `sparse_arrays`, a plain `dense_arrays` folder, and
* then some dense/sparse arrays.
*
*     my_group/
*     ├── dense_arrays
*     │   ├── array_A
*     │   └── array_B
*     └── sparse_arrays
*         ├── array_C
*         └── array_D
*
* The program then shows how to group these together using the TileDB Group API.
*/

#include <stdio.h>
#include <stdlib.h>
#include <tiledb/tiledb.h>
#include <tiledb/tiledb_experimental.h>

void create_array(const char* array_name, tiledb_array_type_t type) {
  // Create TileDB context
  tiledb_ctx_t* ctx;
  tiledb_ctx_alloc(NULL, &ctx);

  // The array will be 4x4 with dimensions "rows" and "cols", with domain [1,4].
  int dim_domain[] = {1, 4, 1, 4};
  int tile_extents[] = {4, 4};
  tiledb_dimension_t* d1;
  tiledb_dimension_alloc(
      ctx, "rows", TILEDB_INT32, &dim_domain[0], &tile_extents[0], &d1);
  tiledb_dimension_t* d2;
  tiledb_dimension_alloc(
      ctx, "cols", TILEDB_INT32, &dim_domain[2], &tile_extents[1], &d2);

  // Create domain
  tiledb_domain_t* domain;
  tiledb_domain_alloc(ctx, &domain);
  tiledb_domain_add_dimension(ctx, domain, d1);
  tiledb_domain_add_dimension(ctx, domain, d2);

  // Create a single attribute "a" so each (i,j) cell can store an integer
  tiledb_attribute_t* a;
  tiledb_attribute_alloc(ctx, "a", TILEDB_INT32, &a);

  // Create array schema
  tiledb_array_schema_t* array_schema;
  tiledb_array_schema_alloc(ctx, type, &array_schema);
  tiledb_array_schema_set_cell_order(ctx, array_schema, TILEDB_ROW_MAJOR);
  tiledb_array_schema_set_tile_order(ctx, array_schema, TILEDB_ROW_MAJOR);
  tiledb_array_schema_set_domain(ctx, array_schema, domain);
  tiledb_array_schema_add_attribute(ctx, array_schema, a);

  // Create array
  tiledb_array_create(ctx, array_name, array_schema);

  // Clean up
  tiledb_attribute_free(&a);
  tiledb_dimension_free(&d1);
  tiledb_dimension_free(&d2);
  tiledb_domain_free(&domain);
  tiledb_array_schema_free(&array_schema);
  tiledb_ctx_free(&ctx);
}

void create_arrays_groups() {
  // Create context
  tiledb_ctx_t* ctx;
  tiledb_ctx_alloc(NULL, &ctx);

  // Create groups
  tiledb_group_create(ctx, "my_group");
  tiledb_group_create(ctx, "my_group/sparse_arrays");

  // Create dense_arrays folder (a plain directory, not a TileDB group)
  tiledb_vfs_t* vfs;
  tiledb_vfs_alloc(ctx, NULL, &vfs);
  tiledb_vfs_create_dir(ctx, vfs, "my_group/dense_arrays");
  tiledb_vfs_free(&vfs);

  // Create arrays
  create_array("my_group/dense_arrays/array_A", TILEDB_DENSE);
  create_array("my_group/dense_arrays/array_B", TILEDB_DENSE);
  create_array("my_group/sparse_arrays/array_C", TILEDB_SPARSE);
  create_array("my_group/sparse_arrays/array_D", TILEDB_SPARSE);

  // Add members to groups
  tiledb_group_t* my_group;
  tiledb_group_t* sparse_arrays_group;

  tiledb_group_alloc(ctx, "my_group", &my_group);
  tiledb_group_open(ctx, my_group, TILEDB_WRITE);

  tiledb_group_add_member(ctx, my_group, "my_group/dense_arrays/array_A", 1);
  tiledb_group_add_member(ctx, my_group, "my_group/dense_arrays/array_B", 1);
  tiledb_group_add_member(ctx, my_group, "my_group/sparse_arrays", 1);

  tiledb_group_alloc(ctx, "my_group/sparse_arrays", &sparse_arrays_group);
  tiledb_group_open(ctx, sparse_arrays_group, TILEDB_WRITE);
  tiledb_group_add_member(
      ctx, sparse_arrays_group, "my_group/sparse_arrays/array_C", 1);
  tiledb_group_add_member(
      ctx, sparse_arrays_group, "my_group/sparse_arrays/array_D", 1);

  // Close both groups
  tiledb_group_close(ctx, my_group);
  tiledb_group_close(ctx, sparse_arrays_group);

  // Clean up
  tiledb_group_free(&my_group);
  tiledb_group_free(&sparse_arrays_group);
  tiledb_ctx_free(&ctx);
}

void print_group() {
  // Create context
  tiledb_ctx_t* ctx;
  tiledb_ctx_alloc(NULL, &ctx);
  tiledb_group_t* my_group;

  // Open the group in read mode; dumping only reads the member list
  tiledb_group_alloc(ctx, "my_group", &my_group);
  tiledb_group_open(ctx, my_group, TILEDB_READ);

  // Dump the group recursively (last argument) into a newly allocated string
  char* str;
  tiledb_group_dump_str(ctx, my_group, &str, 1);

  printf("%s\n", str);

  // Clean up
  free(str);
  tiledb_group_close(ctx, my_group);
  tiledb_group_free(&my_group);
  tiledb_ctx_free(&ctx);
}

int main() {
  create_arrays_groups();
  print_group();

  return 0;
}
101 changes: 101 additions & 0 deletions examples/cpp_api/groups.cc
@@ -0,0 +1,101 @@
/**
* @file groups.cc
*
* @section LICENSE
*
* The MIT License
*
* @copyright Copyright (c) 2022 TileDB, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
* @section DESCRIPTION
*
* This program creates a hierarchy as shown below. Specifically, it creates
* groups `my_group` and `sparse_arrays`, a plain `dense_arrays` folder, and
* then some dense/sparse arrays.
*
*     my_group/
*     ├── dense_arrays
*     │   ├── array_A
*     │   └── array_B
*     └── sparse_arrays
*         ├── array_C
*         └── array_D
*
* The program then shows how to group these together using the TileDB Group API.
*/

#include <iostream>
#include <tiledb/tiledb>
#include <tiledb/tiledb_experimental>

using namespace tiledb;

void create_array(const std::string& array_name, tiledb_array_type_t type) {
  Context ctx;
  if (Object::object(ctx, array_name).type() == Object::Type::Array)
    return;

  Domain domain(ctx);
  domain.add_dimension(Dimension::create<int>(ctx, "rows", {{1, 4}}, 4))
      .add_dimension(Dimension::create<int>(ctx, "cols", {{1, 4}}, 4));
  ArraySchema schema(ctx, type);
  schema.set_domain(domain).set_order({{TILEDB_ROW_MAJOR, TILEDB_ROW_MAJOR}});
  schema.add_attribute(Attribute::create<int>(ctx, "a"));
  Array::create(array_name, schema);
}

void create_arrays_groups() {
  // Create groups
  tiledb::Context ctx;
  tiledb::create_group(ctx, "my_group");
  tiledb::create_group(ctx, "my_group/sparse_arrays");

  // Create dense_arrays folder (a plain directory, not a TileDB group)
  tiledb::VFS vfs(ctx);
  vfs.create_dir("my_group/dense_arrays");

  // Create arrays
  create_array("my_group/dense_arrays/array_A", TILEDB_DENSE);
  create_array("my_group/dense_arrays/array_B", TILEDB_DENSE);
  create_array("my_group/sparse_arrays/array_C", TILEDB_SPARSE);
  create_array("my_group/sparse_arrays/array_D", TILEDB_SPARSE);

  // Add members to groups
  tiledb::Group group(ctx, "my_group", TILEDB_WRITE);
  group.add_member("my_group/dense_arrays/array_A", true);
  group.add_member("my_group/dense_arrays/array_B", true);
  group.add_member("my_group/sparse_arrays", true);

  tiledb::Group group_sparse(ctx, "my_group/sparse_arrays", TILEDB_WRITE);
  group_sparse.add_member("my_group/sparse_arrays/array_C", true);
  group_sparse.add_member("my_group/sparse_arrays/array_D", true);
}

void print_group() {
  tiledb::Context ctx;
  tiledb::Group group(ctx, "my_group", TILEDB_READ);
  std::cout << group.dump(true) << std::endl;
}

int main() {
  create_arrays_groups();
  print_group();

  return 0;
}
2 changes: 1 addition & 1 deletion format_spec/FORMAT_SPEC.md
@@ -13,7 +13,7 @@ title: Format Specification
* [File hierarchy](./array_file_hierarchy.md)
* [Array Schema](./array_schema.md)
* [Fragment](./fragment.md)
-* [Array Metadata](./array_metadata.md)
+* [Array Metadata](./metadata.md)
* [Tile](./tile.md)
* [Generic Tile](./generic_tile.md)
* **Group**
2 changes: 1 addition & 1 deletion format_spec/array_file_hierarchy.md
@@ -39,4 +39,4 @@ Inside the array folder, you can find the following:
* Inside the same commit folder, any number of [consolidated commits files](./consolidated_commits_file.md) of the form `<timestamped_name>.con`.
* Inside the same commit folder, any number of [ignore files](./ignore_file.md) of the form `<timestamped_name>.ign`.
* Inside of a fragment metadata folder, any number of [consolidated fragment metadata files](./consolidated_fragment_metadata_file.md) of the form `<timestamped_name>.meta`.
-* [Array metadata](./array_metadata.md) folder `__meta`.
+* [Array metadata](./metadata.md) folder `__meta`.
35 changes: 35 additions & 0 deletions format_spec/group.md
@@ -0,0 +1,35 @@
# Group

A group consists of [metadata](./metadata.md) and a file containing the group members.

```
my_group                         # Group folder
   |_ __tiledb_group.tdb         # Empty group file
   |_ __group                    # Group details folder
         |_ <timestamped_name>   # Timestamped group file detailing members
   |_ __meta                     # Group metadata folder
```

## Group File


| **Field** | **Type** | **Description** |
| :--- | :--- | :--- |
| Version | `uint32_t` | Format version number of the group |
| Number of group members | `uint32_t` | The number of group members |
| Group Member 1 | [Group Member](#group-member) | First group member |
| … | … | … |
| Group Member N | [Group Member](#group-member) | Nth group member |


## Group Member

A group member describes one object contained in a [group](./group.md).

| **Field** | **Type** | **Description** |
| :--- | :--- | :--- |
| Version | `uint32_t` | Format version number of the group member |
| Object type | `uint8_t` | Object type of the member |
| Relative | `uint8_t` | Whether the URI is relative to the group |
| URI length | `uint32_t` | Number of characters in the URI |
| URI | `char[]` | URI character array |
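
As a rough illustration of the Group Member layout above, the following C sketch packs and unpacks one member record in table order. It assumes little-endian byte order and no padding between fields, neither of which the table states. The `member_t` struct and the `member_encode` / `member_decode` helpers are hypothetical names used only for this example and are not part of TileDB.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory view of one group member (not a TileDB type). */
typedef struct {
  uint32_t version;    /* Format version number of the group member */
  uint8_t object_type; /* Object type of the member */
  uint8_t relative;    /* Nonzero if the URI is relative to the group */
  uint32_t uri_len;    /* Number of characters in the URI */
  char* uri;           /* URI characters; NUL-terminated here for convenience */
} member_t;

/* Fixed-size part of a record: 4 + 1 + 1 + 4 bytes. */
enum { MEMBER_HEADER_SIZE = 10 };

/* Assumption: multi-byte fields are stored little-endian. */
static void put_u32(uint8_t* p, uint32_t v) {
  p[0] = (uint8_t)(v & 0xff);
  p[1] = (uint8_t)((v >> 8) & 0xff);
  p[2] = (uint8_t)((v >> 16) & 0xff);
  p[3] = (uint8_t)((v >> 24) & 0xff);
}

static uint32_t get_u32(const uint8_t* p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) |
         ((uint32_t)p[3] << 24);
}

/* Writes one record into buf; returns bytes written, or 0 if buf is too small. */
size_t member_encode(const member_t* m, uint8_t* buf, size_t cap) {
  assert(m != NULL && buf != NULL);
  size_t need = MEMBER_HEADER_SIZE + (size_t)m->uri_len;
  if (cap < need)
    return 0;
  put_u32(buf, m->version);
  buf[4] = m->object_type;
  buf[5] = m->relative;
  put_u32(buf + 6, m->uri_len);
  memcpy(buf + MEMBER_HEADER_SIZE, m->uri, m->uri_len);
  return need;
}

/* Reads one record from buf; returns bytes consumed, or 0 on truncated input.
 * On success the caller owns m->uri and must free() it. */
size_t member_decode(const uint8_t* buf, size_t len, member_t* m) {
  assert(buf != NULL && m != NULL);
  if (len < MEMBER_HEADER_SIZE)
    return 0;
  uint32_t uri_len = get_u32(buf + 6);
  if (len - MEMBER_HEADER_SIZE < uri_len)
    return 0;
  char* uri = malloc((size_t)uri_len + 1);
  if (uri == NULL)
    return 0;
  memcpy(uri, buf + MEMBER_HEADER_SIZE, uri_len);
  uri[uri_len] = '\0';
  m->version = get_u32(buf);
  m->object_type = buf[4];
  m->relative = buf[5];
  m->uri_len = uri_len;
  m->uri = uri;
  return MEMBER_HEADER_SIZE + (size_t)uri_len;
}
```

Under the same assumptions, a group file could be read by decoding the two leading `uint32_t` fields (version and member count) and then calling `member_decode` once per member, advancing by the returned byte count each time.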
8 changes: 8 additions & 0 deletions format_spec/group_file_hierarchy.md
@@ -7,6 +7,14 @@ A TileDB group is a folder with a single file in it:
```
my_group                         # Group folder
   |_ __tiledb_group.tdb         # Empty group file
   |_ __group                    # Group details folder
         |_ <timestamped_name>   # Timestamped group file detailing members
   |_ __meta                     # Group metadata folder
```

File `__tiledb_group.tdb` is empty and it is merely used to indicate that `my_group` is a TileDB group.

Inside the group folder, you can find the following:

* [Group details](./group.md) folder `__group`.
* [Group metadata](./metadata.md) folder `__meta`.
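
The marker-file convention described above can be sketched in a few lines of C. This is only an illustration for a local POSIX filesystem: `looks_like_tiledb_group` is a hypothetical helper, not a TileDB function, and it ignores remote backends entirely. In real code, TileDB's own `tiledb_object_type` call is the supported way to ask what kind of object lives at a path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if `dir` contains a regular file named
 * __tiledb_group.tdb, and 0 otherwise (including when the path is too long). */
int looks_like_tiledb_group(const char* dir) {
  assert(dir != NULL);
  char path[4096];
  int n = snprintf(path, sizeof path, "%s/__tiledb_group.tdb", dir);
  if (n < 0 || (size_t)n >= sizeof path)
    return 0;
  struct stat st;
  return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}
```

The check deliberately does not look at `__group` or `__meta`, since the text above names the empty `__tiledb_group.tdb` file as the thing that marks a folder as a group.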
19 changes: 9 additions & 10 deletions format_spec/array_metadata.md → format_spec/metadata.md
@@ -4,16 +4,16 @@ title: Array Metadata

## Main Structure

-The array metadata is a folder called `__meta` located here:
+The metadata is a folder called `__meta` located here:

```
 my_array                              # array folder
    | ...
-   |_ __meta                          # array metadata folder
-         |_ <timestamped_name>        # array metadata file
-         |_ ...
+   |_ __meta                          # metadata folder
+         |_ <timestamped_name>        # metadata file
+         |_ ...
          |_ <timestamped_name>.vac    # vacuum file
-         |_ ...
+         |_ ...
```

`<timestamped_name>` has format `__t1_t2_uuid_v`, where:
@@ -22,14 +22,13 @@ my_array # array folder
* `uuid` is a unique identifier
* `v` is the format version
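
The `__t1_t2_uuid_v` naming scheme can be illustrated with a small C parser. This is a sketch under stated assumptions: `t1` and `t2` are taken to be unsigned decimal integers, the uuid is assumed to contain no underscore and to fit in 63 characters, and `timestamped_name_t` and `parse_timestamped_name` are hypothetical names rather than TileDB API.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the parts of a `__t1_t2_uuid_v` name. */
typedef struct {
  uint64_t t1;      /* first timestamp */
  uint64_t t2;      /* second timestamp */
  char uuid[64];    /* unique identifier, assumed to contain no '_' */
  uint32_t version; /* format version */
} timestamped_name_t;

/* Returns 1 if `name` matches `__t1_t2_uuid_v` exactly, 0 otherwise.
 * Names with a suffix (for example `.vac`) are rejected. Note that sscanf is
 * lenient about leading whitespace and signs in the numeric fields. */
int parse_timestamped_name(const char* name, timestamped_name_t* out) {
  assert(name != NULL && out != NULL);
  memset(out, 0, sizeof *out);
  int consumed = 0;
  int n = sscanf(
      name,
      "__%" SCNu64 "_%" SCNu64 "_%63[^_]_%" SCNu32 "%n",
      &out->t1,
      &out->t2,
      out->uuid,
      &out->version,
      &consumed);
  /* All four fields must convert and nothing may follow the version. */
  return n == 4 && name[consumed] == '\0';
}
```

A caller that also wants to recognise vacuum files could strip a trailing `.vac` before calling the parser.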

-The array metadata folder can contain:
-
-* Any number of [array metadata files](#array-metadata-file)
+The metadata folder can contain:
+* Any number of [metadata files](#metadata-file)
* Any number of [vacuum files](./vacuum_file.md)

-## Array Metadata File
+## Metadata File

-The array metadata file has the following on-disk format:
+The metadata file has the following on-disk format:

| **Field** | **Type** | **Description** |
| :--- | :--- | :--- |