Skip to content

Latest commit

 

History

History
63 lines (41 loc) · 2.42 KB

hadoop-catalog.md

File metadata and controls

63 lines (41 loc) · 2.42 KB
title slug date keyword license
Hadoop catalog
/hadoop-catalog
2024-04-02
hadoop catalog
Copyright 2024 Datastrato Pvt Ltd. This software is licensed under the Apache License version 2.

Introduction

Hadoop catalog is a fileset catalog that using Hadoop Compatible File System (HCFS) to manage the storage location of the fileset. Currently, it supports local filesystem and HDFS. For object storage like S3, GCS, and Azure Blob Storage, you can put the hadoop object store jar like hadoop-aws into the $GRAVITINO_HOME/catalogs/hadoop/libs directory to enable the support. Gravitino itself hasn't yet tested the object storage support, so if you have any issue, please create an issue.

Note that Gravitino uses Hadoop 3 dependencies to build Hadoop catalog. Theoretically, it should be compatible with both Hadoop 2.x and 3.x, since Gravitino doesn't leverage any new features in Hadoop 3. If there's any compatibility issue, please create an issue.

Catalog

Catalog properties

Property Name Description Default Value Required Since Version
location The storage location managed by Hadoop catalog. (none) No 0.5.0

Catalog operations

Refer to Catalog operations for more details.

Schema

Schema capabilities

The Hadoop catalog supports creating, updating, deleting, and listing schema.

Schema properties

Property name Description Default value Required Since Version
location The storage location managed by Hadoop schema. (none) No 0.5.0

Schema operations

Refer to Schema operation for more details.

Fileset

Fileset capabilities

  • The Hadoop catalog supports creating, updating, deleting, and listing filesets.

Fileset properties

None.

Fileset operations

Refer to Fileset operations for more details.