diff --git a/docs/sql-manual/sql-functions/string-functions/auto_partition_name.md b/docs/sql-manual/sql-functions/string-functions/auto-partition-name.md similarity index 99% rename from docs/sql-manual/sql-functions/string-functions/auto_partition_name.md rename to docs/sql-manual/sql-functions/string-functions/auto-partition-name.md index c9b1dd0421dc5..3d9cf5569fce3 100644 --- a/docs/sql-manual/sql-functions/string-functions/auto_partition_name.md +++ b/docs/sql-manual/sql-functions/string-functions/auto-partition-name.md @@ -25,7 +25,7 @@ under the License. --> ## auto_partition_name -### description +### Description #### Syntax `VARCHAR AUTO_PARTITION_NAME('RANGE', 'VARCHAR unit', DATETIME datetime)` @@ -40,9 +40,9 @@ The datetime parameter is a legal date expression. The unit parameter is the time interval you want, the available values are: [`second`, `minute`, `hour`, `day`, `month`, `year`]. If unit does not match one of these options, a syntax error will be returned. -### example -``` +### Example +```sql mysql> select auto_partition_name('range', 'years', '123'); ERROR 1105 (HY000): errCode = 2, detailMessage = range auto_partition_name must accept year|month|day|hour|minute|second for 2nd argument @@ -108,7 +108,6 @@ mysql> select auto_partition_name('list', "你好"); +------------------------------------+ | p4f60597d2 | +------------------------------------+ - ``` ### keywords diff --git a/docs/sql-manual/sql-functions/table-valued-functions/partitions.md b/docs/sql-manual/sql-functions/table-valued-functions/partitions.md index 7bda80d77e298..b5ffa054b6c0d 100644 --- a/docs/sql-manual/sql-functions/table-valued-functions/partitions.md +++ b/docs/sql-manual/sql-functions/table-valued-functions/partitions.md @@ -36,8 +36,6 @@ The table function generates a temporary partition TABLE, which allows you to vi This function is used in the from clause. 
-This function is supported since 2.1.5 - #### Syntax `partitions("catalog"="","database"="","table"="")` diff --git a/docs/table-design/data-partitioning/auto-partitioning.md b/docs/table-design/data-partitioning/auto-partitioning.md index 91a9da4b39a42..8420601713892 100644 --- a/docs/table-design/data-partitioning/auto-partitioning.md +++ b/docs/table-design/data-partitioning/auto-partitioning.md @@ -1,6 +1,6 @@ --- { - "title": "Auto partitioning", + "title": "Auto Partition", "language": "en" } --- @@ -27,11 +27,11 @@ under the License. ## Application scenario -The Auto Partitioning feature supports automatic detection of whether the corresponding partition exists during the data import process. If it does not exist, the partition will be created automatically and imported normally. +The Auto Partition feature supports automatic detection of whether the corresponding partition exists during the data import process. If it does not exist, the partition will be created automatically and imported normally. The auto partition function mainly solves the problem that the user expects to partition the table based on a certain column, but the data distribution of the column is scattered or unpredictable, so it is difficult to accurately create the required partitions when building or adjusting the structure of the table, or the number of partitions is so large that it is too cumbersome to create them manually. -Take the time type partition column as an example, in dynamic partitioning, we support the automatic creation of new partitions to accommodate real-time data at specific time periods. For real-time user behavior logs and other scenarios, this feature basically meets the requirements. However, in more complex scenarios, such as dealing with non-real-time data, the partition column is independent of the current system time and contains a large number of discrete values. 
At this time, to improve efficiency we want to partition the data based on this column, but the data may actually involve the partition can not be grasped in advance, or the expected number of required partitions is too large. In this case, dynamic partitioning or manually created partitions cannot meet our needs, while auto partitioning covers such needs.
+Take a time-type partition column as an example. Dynamic Partition can automatically create new partitions to hold real-time data for specific time periods, which largely meets the needs of scenarios such as real-time user behavior logs. In more complex scenarios, however, such as processing non-real-time data, the partition column is independent of the current system time and contains a large number of discrete values. To improve efficiency we want to partition the data on this column, but we cannot know in advance which partitions the data will need, or the expected number of partitions is too large. In this case, neither Dynamic Partition nor manually created partitions can meet our needs, while Auto Partition can.

 Suppose the table DDL is as follows:

@@ -74,62 +74,52 @@ PROPERTIES (
 );
 ```

-The table stores a large amount of business history data, partitioned based on the date the transaction occurred. As you can see when building the table, we need to manually create the partitions in advance. If the data range of the partitioned columns changes, for example, 2022 is added to the above table, we need to create a partition by [ALTER-TABLE-PARTITION](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION) to make changes to the table partition. If such partitions need to be changed, or subdivided at a finer level of granularity, it is very tedious to modify them. At this point we can rewrite the table DDL using auto partitioning.
+The table stores a large amount of historical business data, partitioned by the date on which each transaction occurred. As the DDL shows, the partitions have to be created manually in advance. If the data range of the partition column changes, for example when data for 2022 is added to the table above, a new partition must be created with [ALTER-TABLE-PARTITION](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION). Changing such partitions, or subdividing them at a finer granularity, is tedious. In this case, we can rewrite the table DDL using Auto Partition.

 ## Syntax

-When creating a table, use the following syntax to populate the `partition_info` section in the `CREATE-TABLE`statement:
+When creating a table, use the following syntax to populate the `partition_info` section in the `CREATE-TABLE` statement:

-- For RANGE partitioning:
+- For RANGE Partition:

-  ```sql
-  AUTO PARTITION BY RANGE (FUNC_CALL_EXPR)
-  (
-  )
-  ```
-
-  Where,
+```sql
+   AUTO PARTITION BY RANGE (FUNC_CALL_EXPR)
+   ()
+```
-
-  ```sql
+Where:
+```sql
    FUNC_CALL_EXPR ::= date_trunc ( <partition_column>, '<interval>' )
-  ```
-
-:::info Note
-
-In Apache Doris 2.1.0 version, `FUNC_CALL_EXPR` needs not to be enclosed in parentheses.
- -::: - -- For LIST partitioning: +``` - ```sql - AUTO PARTITION BY LIST(`partition_col`) - ( - ) - ``` +- For LIST Partition: -**Sample** +```sql + AUTO PARTITION BY LIST(`partition_col1`[, `partition_col2`, ...]) + () +``` +### Sample -- For RANGE partitioning: +- For Range Partition: - ```sql - CREATE TABLE `date_table` ( - `TIME_STAMP` datev2 NOT NULL COMMENT '采集日期' - ) ENGINE=OLAP - DUPLICATE KEY(`TIME_STAMP`) - AUTO PARTITION BY RANGE (date_trunc(`TIME_STAMP`, 'month')) - ( - ) - DISTRIBUTED BY HASH(`TIME_STAMP`) BUCKETS 10 - PROPERTIES ( - "replication_allocation" = "tag.location.default: 1" - ); - ``` +```sql + CREATE TABLE `date_table` ( + `TIME_STAMP` datev2 NOT NULL COMMENT '采集日期' + ) ENGINE=OLAP + DUPLICATE KEY(`TIME_STAMP`) + AUTO PARTITION BY RANGE (date_trunc(`TIME_STAMP`, 'month')) + ( + ) + DISTRIBUTED BY HASH(`TIME_STAMP`) BUCKETS 10 + PROPERTIES ( + "replication_allocation" = "tag.location.default: 1" + ); +``` -- For LIST partitioning: +- For List Partition: - ```sql +```sql CREATE TABLE `str_table` ( `str` varchar not null ) ENGINE=OLAP @@ -141,22 +131,22 @@ In Apache Doris 2.1.0 version, `FUNC_CALL_EXPR` needs not to be enclosed in pare PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); - ``` +``` -**Constraints** +List Auto Partition supports multiple partition columns, which are written in the same way as normal List Partition: ```AUTO PARTITION BY LIST (`col1`, `col2`, ...)``` -- In auto LIST partitioning, the partition name length **must** **not exceed 50 characters**. This length is derived from the concatenation and escape of contents of partition columns on corresponding data rows, so the actual allowed length may be shorter. -- In auto RANGE partitioning, the partition function only supports `date_trunc`, and the partition column supports only `DATE` or `DATETIME` types. 
-- In auto LIST partitioning, function calls are not supported, and the partition column supports `BOOLEAN`, `TINYINT`, `SMALLINT`, `INT`, `BIGINT`, `LARGEINT`, `DATE`, `DATETIME`, `CHAR`, `VARCHAR` data types, with partition values being enumeration values. -- In auto LIST partitioning, for every existing value in the partition column that does not correspond to a partition, a new independent partitioning will be created. +### Constraints -**NULL value partitioning** +- In auto List Partition, the partition name length **must not exceed 50 characters**. This length is derived from the concatenation and escape of contents of partition columns on corresponding data rows, so the actual allowed length may be shorter. +- In auto Range Partition, the partition function only supports `date_trunc`, and the partition column supports only `DATE` or `DATETIME` types. +- In auto List Partition, function calls are not supported, and the partition column supports `BOOLEAN`, `TINYINT`, `SMALLINT`, `INT`, `BIGINT`, `LARGEINT`, `DATE`, `DATETIME`, `CHAR`, `VARCHAR` data types, with partition values being enumeration values. +- In auto List Partition, for every existing value in the partition column that does not correspond to a partition, a new independent partition will be created. -When the session variable `allow_partition_column_nullable` is enabled, LIST and RANGE partitioning support null columns as partition columns. 
+### NULL value partition

-When an actual insertion encounters a null value in the partition column:
+When the session variable `allow_partition_column_nullable` is enabled:

-- For auto LIST partitioning, the corresponding NULL value partition will be created automatically:
+- For Auto List Partition, a partition for NULL values is created automatically when the partition column contains NULL:
 ```sql
 mysql> create table auto_null_list(
     -> k0 varchar null
@@ -188,7 +178,7 @@ mysql> select * from auto_null_list partition(pX);
 1 row in set (0.20 sec)
 ```

-- For auto LIST partitioning, **null columns are not supported to be partition columns**.
+- For Auto Range Partition, **nullable columns cannot be used as partition columns**:
 ```sql
 mysql> CREATE TABLE `range_table_nullable` (
     -> `k1` INT,
@@ -208,7 +198,7 @@ ERROR 1105 (HY000): errCode = 2, detailMessage = AUTO RANGE PARTITION doesn't su

 ## Example

-When using auto partitioning, the example in the Application scenarios section can be rewritten as:
+When using Auto Partition, the example in the Application scenario section can be rewritten as:

 ```sql
 CREATE TABLE `DAILY_TRADE_VALUE`
@@ -227,7 +217,7 @@ PROPERTIES (
 );
 ```

-At this point, the new table has no default partitions:
+Taking this two-column table as an example, the newly created table has no partitions by default:

 ```sql
 mysql> show partitions from `DAILY_TRADE_VALUE`;
@@ -251,23 +241,68 @@ mysql> show partitions from `DAILY_TRADE_VALUE`;
 3 rows in set (0.12 sec)
 ```

-It can be concluded that the partitions created by auto partitioning share the same functionality as partitions created by manual partitioning.
+As shown above, partitions created by Auto Partition behave exactly like manually created partitions.
+
+## Conjunct with Dynamic Partition

-## Conjunct with dynamic partitioning
+Auto Partition and Dynamic Partition can be enabled on the same table. In this case, both take effect:
+1. Auto Partition creates partitions on demand during data import;
+2. Dynamic Partition automatically creates and reclaims partitions, and can transfer them between storage media.

-In order to maintain a clear partitioning logic, Apache Doris prohibits the simultaneous use of auto partitioning and dynamic partitioning on a single table, as this usage can easily lead to misuse. It is recommended to replace this with the standalone Auto Partitioning feature.
+The two syntaxes do not conflict; simply specify the Auto Partition clause and the Dynamic Partition properties together.

-:::info Note
-In some early versions of Doris 2.1, this functionality was not prohibited but not recommended.
-:::
+### Best Practice
+
+When you need to limit the partition lifecycle, you can **disable partition creation in Dynamic Partition and leave partition creation entirely to Auto Partition**, while using Dynamic Partition's ability to reclaim partitions to manage their lifecycle:
+
+```sql
+create table auto_dynamic(
+    k0 datetime(6) NOT NULL
+)
+auto partition by range (date_trunc(k0, 'year'))
+(
+)
+DISTRIBUTED BY HASH(`k0`) BUCKETS 2
+properties(
+    "dynamic_partition.enable" = "true",
+    "dynamic_partition.prefix" = "p",
+    "dynamic_partition.start" = "-50",
+    "dynamic_partition.end" = "0", --- Dynamic Partition creates no future partitions
+    "dynamic_partition.time_unit" = "year",
+    "replication_num" = "1"
+);
+```
+
+This way, you get both the flexibility of Auto Partition and consistent partition names.
+
+## Partition Management
+
+When Auto Partition is enabled, the `auto_partition_name` function computes the name of the partition that a given value maps to, and the `partitions` table function returns detailed information about each partition, which can be filtered by partition name.
Let's take the `DAILY_TRADE_VALUE` table as an example to see its current partition after we insert data: + +```sql +mysql> select * from partitions("catalog"="internal","database"="optest","table"="DAILY_TRADE_VALUE") where PartitionName = auto_partition_name('range', 'year', '2008-02-03'); ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime | DataSize | IsInMemory | ReplicaAllocation | IsMutable | SyncWithBaseTables | UnsyncTables | ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +| 127095 | p20080101000000 | 2 | 2024-11-14 17:29:02 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2008-01-01]; ..types: [DATEV2]; keys: [2009-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | \N | 985.000 B | 0 | tag.location.default: 1 | 1 | 1 | \N | 
++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+
+1 row in set (0.18 sec)
+```
+
+This lets you precisely filter the ID and value range of each partition for subsequent partition-specific operations (e.g. `insert overwrite partition`).
+
+For detailed syntax, see [auto_partition_name](../../sql-manual/sql-functions/string-functions/auto-partition-name) and [partitions](../../sql-manual/sql-functions/table-valued-functions/partitions).

 ## Key points

-- Similar to regular partitioned tables, aoto LIST partitioning supports multi-column partitioning with no syntax differences.
+- As with regular partitioned tables, Auto List Partition supports multi-column partitioning with no syntax differences.
 - If partitions are created during data insertion or import processes, and the entire import process is not completed (fails or is canceled), the created partitions will not be automatically deleted.
-- Tables using auto partitioning only differ in the method of partition creation, switching from manual to automatic. The original usage of the table and its created partitions remains the same as non-auto partitioning tables or partitions.
-- To prevent the accidental creation of too many partitions, Apache Doris controls the maximum number of partitions an auto partitioning table can accommodate through the `max_auto_partition_num setting` in the FE configuration. This value can be adjusted if needed.
-- When importing data into a table with auto partitioning enabled, the coordinator sends data with a polling interval different from regular tables. Refer to `olap_table_sink_send_interval_auto_partition_factor` in [BE Configuration](../../admin-manual/config/be-config.md) for details. This setting does not have an impact after `enable_memtable_on_sink_node` is enabled.
-- During data insertion using `INSERT-OVERWRITE`, if a specific partition for override is specified, the auto partitioning table behaves like a regular table during this process and does not create new partitions.
+- Tables using Auto Partition differ only in how partitions are created: automatically instead of manually. Otherwise, the table and its partitions are used in the same way as non-Auto Partition tables and partitions.
+- To prevent the accidental creation of too many partitions, Apache Doris limits the maximum number of partitions an Auto Partition table can hold through the `max_auto_partition_num` setting in the FE configuration. This value can be adjusted if needed.
+- When importing data into a table with Auto Partition enabled, the coordinator sends data with a polling interval different from regular tables. Refer to `olap_table_sink_send_interval_auto_partition_factor` in [BE Configuration](../../admin-manual/config/be-config) for details. This setting has no effect once `enable_memtable_on_sink_node` is enabled.
+- For the behavior of [INSERT OVERWRITE](../../sql-manual/sql-statements/Data-Manipulation-Statements/Manipulation/INSERT-OVERWRITE) when loading data into an Auto Partition table, see the INSERT OVERWRITE documentation.
 - If metadata operations are involved when importing and creating partitions, the import process may fail.
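The import-time behavior described in this section can be sketched in Python. Missing partitions are created on demand, growth past a configured cap is refused, and partitions created before a failure are kept. This is a simulation, not Doris code. `route_rows`, `partition_of`, `TooManyPartitions` and the `max_partitions` argument are hypothetical; `max_partitions` stands in for the FE setting `max_auto_partition_num`. The sketch assumes the load fails as soon as the cap would be exceeded, and it assumes yearly partition names of the form `p` + `yyyyMMddHHmmss`, inferred from the sample name `p20080101000000`.

```python
from datetime import date


class TooManyPartitions(Exception):
    """Raised when creating another partition would exceed the cap."""


def partition_of(d: date) -> str:
    # Hypothetical yearly naming, mirroring date_trunc(col, 'year').
    return f"p{d.year:04d}0101000000"


def route_rows(existing: set[str], rows: list[date], max_partitions: int) -> dict[str, list[date]]:
    """Group rows by partition, creating missing partitions on demand."""
    routed: dict[str, list[date]] = {}
    for row in rows:
        name = partition_of(row)
        if name not in existing:
            if len(existing) >= max_partitions:
                # Partitions created earlier in this load stay in place; only the load fails.
                raise TooManyPartitions(f"cannot create {name}: limit {max_partitions} reached")
            existing.add(name)
        routed.setdefault(name, []).append(row)
    return routed


parts = {"p20080101000000"}
print(route_rows(parts, [date(2008, 2, 3), date(2012, 5, 6)], max_partitions=10))
print(sorted(parts))
```

With `max_partitions=2`, loading rows from 2001, 2002 and 2003 into an empty table raises `TooManyPartitions` on the third year. The two partitions created before the failure remain.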
+## Keywords + +AUTO, PARTITION, AUTO_PARTITION diff --git a/docs/table-design/data-partitioning/basic-concepts.md b/docs/table-design/data-partitioning/basic-concepts.md index 8f518dea10d3d..49d7e9720b20f 100644 --- a/docs/table-design/data-partitioning/basic-concepts.md +++ b/docs/table-design/data-partitioning/basic-concepts.md @@ -24,6 +24,8 @@ specific language governing permissions and limitations under the License. --> +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; This document mainly introduces table creation and data partitioning in Doris, as well as potential problems and solutions encountered during table creation operations. @@ -81,7 +83,7 @@ The following code sample introduces how to create tables in Apache Doris by RAN -- Range Partition CREATE TABLE IF NOT EXISTS example_range_tbl ( - `user_id` LARGEINT NOT NULL COMMENT "User ID", + `user_id` LARGEINT NOT NULL COMMENT "User ID", `date` DATE NOT NULL COMMENT "Date when the data are imported", `timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are imported", `city` VARCHAR(20) COMMENT "User location city", @@ -115,9 +117,174 @@ The default type of `ENGINE` is `OLAP`. Only OLAP is responsible for data manage `IF NOT EXISTS` indicates that if the table has not been created before, it will be created. Note that this only checks if the table name exists and does not check if the schema of the new table is the same as the schema of an existing table. Therefore, if there is a table with the same name but a different schema, this command will also return successfully, but it does not mean that a new table with a new schema has been created. +### Advanced Features and Examples + +Doris supports advanced data partitioning methods, including Dynamic Partition, Auto Partition, and Auto Bucket, which enable more flexible data management. The following are examples of implementations: + + + +

+
+[Auto Partition](./auto-partitioning) automatically creates the corresponding partitions during data import according to user-defined rules, which is more convenient than creating them in advance. The example above can be rewritten with Auto Range Partition as follows:
+
+```sql
+CREATE TABLE IF NOT EXISTS example_range_tbl
+(
+    `user_id` LARGEINT NOT NULL COMMENT "User ID",
+    `date` DATE NOT NULL COMMENT "Date when the data are imported",
+    `timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are imported",
+    `city` VARCHAR(20) COMMENT "User location city",
+    `age` SMALLINT COMMENT "User age",
+    `sex` TINYINT COMMENT "User gender",
+    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User last visit time",
+    `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user consumption",
+    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time",
+    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
+)
+ENGINE=OLAP
+AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
+AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- Using months as partition granularity
+()
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
+PROPERTIES
+(
+    "replication_num" = "1"
+);
+```
+
+With this DDL, Doris automatically creates one partition per month of `date` as data is imported. `2018-12-01` and `2018-12-31` fall into the same partition, while `2018-11-12` falls into the preceding (November 2018) partition. Auto Partition also supports List partitioning; see the Auto Partition documentation for more usage.
+
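To make the month granularity concrete, the following Python sketch mimics how a value is mapped to an Auto Range partition. It truncates the value to the start of its month, as `date_trunc(..., 'month')` does, and derives the partition range and a name. This is an illustration, not Doris code. The helper names are hypothetical. The naming scheme, `p` followed by the lower bound formatted as `yyyyMMddHHmmss`, is an assumption inferred from the sample partition name `p20080101000000` used in the partition retrieval examples.

```python
from datetime import date, datetime


def trunc_month(d: date) -> datetime:
    """Truncate a date to the first instant of its month, like date_trunc(col, 'month')."""
    return datetime(d.year, d.month, 1)


def month_partition(d: date) -> tuple[str, date, date]:
    """Return (assumed partition name, inclusive lower bound, exclusive upper bound)."""
    lower = trunc_month(d)
    # The upper bound is the first day of the following month.
    if lower.month == 12:
        upper = datetime(lower.year + 1, 1, 1)
    else:
        upper = datetime(lower.year, lower.month + 1, 1)
    # Assumed naming scheme: "p" + lower bound formatted as yyyyMMddHHmmss.
    name = "p" + lower.strftime("%Y%m%d%H%M%S")
    return name, lower.date(), upper.date()


for value in [date(2018, 12, 1), date(2018, 12, 31), date(2018, 11, 12)]:
    name, lo, hi = month_partition(value)
    print(value, "->", name, f"[{lo}, {hi})")
```

For the three dates mentioned above, the first two map to `p20181201000000` with range `[2018-12-01, 2019-01-01)`. The third maps to `p20181101000000`.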

+
+ + +

+
+[Dynamic Partition](./dynamic-partitioning) automatically creates and reclaims partitions based on the current time. The example above can be rewritten with Dynamic Partition as follows:
+
+```sql
+CREATE TABLE IF NOT EXISTS example_range_tbl
+(
+    `user_id` LARGEINT NOT NULL COMMENT "User ID",
+    `date` DATE NOT NULL COMMENT "Date when the data are imported",
+    `timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are imported",
+    `city` VARCHAR(20) COMMENT "User location city",
+    `age` SMALLINT COMMENT "User age",
+    `sex` TINYINT COMMENT "User gender",
+    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User last visit time",
+    `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user consumption",
+    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time",
+    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
+)
+ENGINE=OLAP
+AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
+PARTITION BY RANGE(`date`)
+()
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
+PROPERTIES
+(
+    "replication_num" = "1",
+    "dynamic_partition.enable" = "true",
+    "dynamic_partition.time_unit" = "WEEK", --- Partition granularity is one week
+    "dynamic_partition.start" = "-2", --- Keep partitions for the past two weeks
+    "dynamic_partition.end" = "2", --- Create partitions for the next two weeks in advance
+    "dynamic_partition.prefix" = "p",
+    "dynamic_partition.buckets" = "8"
+);
+```
+
+Dynamic Partition also supports tiered storage, custom replica numbers, and other features; see the Dynamic Partition documentation for details.
+
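As a rough illustration of the properties above, the following Python sketch lists the weekly ranges between offsets `start` = `-2` and `end` = `2` around a given day. Partitions older than the `-2` week are dropped, and weeks up to `+2` are created ahead of time. Whether the two past weeks are also back-filled depends on `dynamic_partition.create_history_partition`, which this sketch does not model. The sketch assumes weeks start on Monday, which we believe is the default of `dynamic_partition.start_day_of_week`. The function names are hypothetical and this is not Doris's scheduler.

```python
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Monday of the week containing d (assumed week start)."""
    return d - timedelta(days=d.weekday())


def dynamic_week_window(today: date, start: int, end: int) -> list[tuple[date, date]]:
    """List [lower, upper) ranges of weekly partitions from offset start to end, inclusive."""
    base = week_start(today)
    ranges = []
    for offset in range(start, end + 1):
        lower = base + timedelta(weeks=offset)
        ranges.append((lower, lower + timedelta(weeks=1)))
    return ranges


for lower, upper in dynamic_week_window(date(2024, 11, 14), start=-2, end=2):
    print(f"[{lower}, {upper})")
```

For Thursday 2024-11-14, the window runs from the week starting 2024-10-28 through the week ending 2024-12-02, five weekly ranges in total.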

+
+ + +

+
+Auto Partition and Dynamic Partition each have their own advantages. Combining the two enables flexible on-demand partition creation together with automatic reclamation:
+
+```sql
+CREATE TABLE IF NOT EXISTS example_range_tbl
+(
+    `user_id` LARGEINT NOT NULL COMMENT "User ID",
+    `date` DATE NOT NULL COMMENT "Date when the data are imported",
+    `timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are imported",
+    `city` VARCHAR(20) COMMENT "User location city",
+    `age` SMALLINT COMMENT "User age",
+    `sex` TINYINT COMMENT "User gender",
+    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User last visit time",
+    `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user consumption",
+    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time",
+    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
+)
+ENGINE=OLAP
+AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
+AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- Using months as partition granularity
+()
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
+PROPERTIES
+(
+    "replication_num" = "1",
+    "dynamic_partition.enable" = "true",
+    "dynamic_partition.time_unit" = "month", --- Must match the Auto Partition granularity
+    "dynamic_partition.start" = "-2", --- Dynamic Partition automatically drops partitions older than two months
+    "dynamic_partition.end" = "0", --- Dynamic Partition creates no future partitions; creation is left entirely to Auto Partition
+    "dynamic_partition.prefix" = "p",
+    "dynamic_partition.buckets" = "8"
+);
+```
+
+For detailed recommendations on this usage, see [Auto Partition Conjunct with Dynamic Partition](./auto-partitioning#conjunct-with-dynamic-partition).
+
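The division of labor in the combined example can be sketched in Python, assuming month granularity on both sides. Auto Partition adds a month partition whenever imported data needs one. Dynamic Partition, with `start` = `-2` and `end` = `0`, only drops partitions whose range lies entirely before the month two months back. The function names are hypothetical. Partitions are modeled as integer month indexes rather than real partition names, and the timing of the Dynamic Partition scheduler is not modeled.

```python
from datetime import date


def month_index(d: date) -> int:
    """Months since year 0, so month arithmetic is simple integer math."""
    return d.year * 12 + (d.month - 1)


def auto_create(partitions: set[int], rows: list[date]) -> None:
    """Auto Partition: create the month partition of every imported row on demand."""
    for row in rows:
        partitions.add(month_index(row))


def dynamic_reclaim(partitions: set[int], today: date, start: int) -> None:
    """Dynamic Partition: drop month partitions that lie before offset `start`."""
    oldest_kept = month_index(today) + start
    for p in sorted(partitions):  # iterate over a sorted copy while discarding
        if p < oldest_kept:
            partitions.discard(p)


def label(p: int) -> str:
    """Render a month index as YYYY-MM."""
    return f"{p // 12:04d}-{p % 12 + 1:02d}"


parts: set[int] = set()
auto_create(parts, [date(2024, 7, 3), date(2024, 9, 20), date(2024, 10, 1), date(2024, 11, 14)])
dynamic_reclaim(parts, today=date(2024, 11, 14), start=-2)
print(sorted(label(p) for p in parts))
```

Here the July 2024 partition is reclaimed, and the partitions for 2024-09, 2024-10 and 2024-11 remain. No future months are pre-created because `end` is `0`.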

+
+ + +

+
+If you are unsure what a reasonable number of buckets is, use [Auto Bucket](./auto-bucket) to let Doris estimate it. You only need to provide the estimated data size of each partition:
+
+```sql
+CREATE TABLE IF NOT EXISTS example_range_tbl
+(
+    `user_id` LARGEINT NOT NULL COMMENT "User ID",
+    `date` DATE NOT NULL COMMENT "Date when the data are imported",
+    `timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are imported",
+    `city` VARCHAR(20) COMMENT "User location city",
+    `age` SMALLINT COMMENT "User age",
+    `sex` TINYINT COMMENT "User gender",
+    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User last visit time",
+    `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user consumption",
+    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time",
+    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
+)
+ENGINE=OLAP
+AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
+PARTITION BY RANGE(`date`)
+(
+    PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
+    PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
+    PARTITION `p201703` VALUES LESS THAN ("2017-04-01"),
+    PARTITION `p2018` VALUES [("2018-01-01"), ("2019-01-01"))
+)
+DISTRIBUTED BY HASH(`user_id`) BUCKETS AUTO
+PROPERTIES
+(
+    "replication_num" = "1",
+    "estimate_partition_size" = "2G" --- Estimated data size of a single partition; defaults to 10G if not set
+);
+```
+
+Note that this approach is not suitable for tables with very large data volumes.
+

+
+ +
+ ## View partitions -View the partiton information of a table by running the `show create table` command. +View the partiton information of a table by running the `show create table` command. ```sql > show create table example_range_tbl @@ -154,7 +321,7 @@ View the partiton information of a table by running the `show create table` com +-------------------+---------------------------------------------------------------------------------------------------------+ ``` -Or run the `show partitions from your_table` command. +Or run the `show partitions from your_table` command. ```sql > show partitions from example_range_tbl @@ -179,8 +346,39 @@ Or run the `show partitions from your_table` command. You can add a new partition by running the `alter table add partition ` command. ```sql -ALTER TABLE example_range_tbl ADD PARTITION p201704 VALUES LESS THAN("2020-05-01") DISTRIBUTED BY HASH(`user_id`) BUCKETS 5; +ALTER TABLE example_range_tbl ADD PARTITION p201704 VALUES LESS THAN("2020-05-01") DISTRIBUTED BY HASH(`user_id`) BUCKETS 5; ``` For more information about how to alter partitions, refer to [ALTER-TABLE-PARTITION](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION.md). +## Partition Retrieval + +The `partitions` table function and the `information_schema.partitions` system table record partition information for the cluster. The partition information can be extracted from the corresponding table for use when automatically managing partitions: + +```sql +--- Find the partition with the corresponding value in the Auto Partition table. 
+mysql> select * from partitions("catalog"="internal", "database"="optest", "table"="DAILY_TRADE_VALUE") where PartitionName = auto_partition_name('range', 'year', '2008-02-03'); ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime | DataSize | IsInMemory | ReplicaAllocation | IsMutable | SyncWithBaseTables | UnsyncTables | ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +| 127095 | p20080101000000 | 2 | 2024-11-14 17:29:02 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2008-01-01]; ..types: [DATEV2]; keys: [2009-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | \N | 985.000 B | 0 | tag.location.default: 1 | 1 | 1 | \N | ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ 
+1 row in set (0.30 sec) + +mysql> select * from information_schema.partitions where TABLE_SCHEMA='optest' and TABLE_NAME='list_table1' and PARTITION_NAME=auto_partition_name('list', null); ++---------------+--------------+-------------+----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+-----------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | PARTITION_NAME | SUBPARTITION_NAME | PARTITION_ORDINAL_POSITION | SUBPARTITION_ORDINAL_POSITION | PARTITION_METHOD | SUBPARTITION_METHOD | PARTITION_EXPRESSION | SUBPARTITION_EXPRESSION | PARTITION_DESCRIPTION | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | CREATE_TIME | UPDATE_TIME | CHECK_TIME | CHECKSUM | PARTITION_COMMENT | NODEGROUP | TABLESPACE_NAME | ++---------------+--------------+-------------+----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+-----------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| internal | optest | list_table1 | pX | NULL | 0 | 0 | LIST | NULL | str | NULL | (NULL) | 1 | 1266 | 1266 | 0 | 0 | 0 | 0 | 2024-11-14 19:58:45 | 0000-00-00 00:00:00 | 0 | | | | 
++---------------+--------------+-------------+----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+-----------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +1 row in set (0.24 sec) + +--- Find the partition that corresponds to the starting point +mysql> select * from information_schema.partitions where TABLE_NAME='DAILY_TRADE_VALUE' and PARTITION_DESCRIPTION like "[('2012-01-01'),%"; ++---------------+--------------+-------------------+-----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+----------------------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | PARTITION_NAME | SUBPARTITION_NAME | PARTITION_ORDINAL_POSITION | SUBPARTITION_ORDINAL_POSITION | PARTITION_METHOD | SUBPARTITION_METHOD | PARTITION_EXPRESSION | SUBPARTITION_EXPRESSION | PARTITION_DESCRIPTION | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | CREATE_TIME | UPDATE_TIME | CHECK_TIME | CHECKSUM | PARTITION_COMMENT | NODEGROUP | TABLESPACE_NAME | 
++---------------+--------------+-------------------+-----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+----------------------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| internal | optest | DAILY_TRADE_VALUE | p20120101000000 | NULL | 0 | 0 | RANGE | NULL | TRADE_DATE | NULL | [('2012-01-01'), ('2013-01-01')) | 1 | 985 | 985 | 0 | 0 | 0 | 0 | 2024-11-14 17:29:02 | 0000-00-00 00:00:00 | 0 | | | | ++---------------+--------------+-------------------+-----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+----------------------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +1 row in set (0.65 sec) +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/string-functions/auto_partition_name.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/string-functions/auto-partition-name.md similarity index 99% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/string-functions/auto_partition_name.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/string-functions/auto-partition-name.md index b7128c3a29e89..aebe90b6e4d08 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/string-functions/auto_partition_name.md +++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/string-functions/auto-partition-name.md @@ -25,7 +25,7 @@ under the License. --> ## auto_partition_name -### description +### Description #### Syntax `VARCHAR AUTO_PARTITION_NAME('RANGE', 'VARCHAR unit', DATETIME datetime)` @@ -40,9 +40,9 @@ datetime 参数是合法的日期表达式。 unit 参数是您希望的时间间隔,可选的值如下:[`second`,`minute`,`hour`,`day`,`month`,`year`]。 如果 unit 不符合上述可选值,结果将返回语法错误。 -### example -``` +### Example +```sql mysql> select auto_partition_name('range', 'years', '123'); ERROR 1105 (HY000): errCode = 2, detailMessage = range auto_partition_name must accept year|month|day|hour|minute|second for 2nd argument @@ -108,7 +108,6 @@ mysql> select auto_partition_name('list', "你好"); +------------------------------------+ | p4f60597d2 | +------------------------------------+ - ``` ### keywords diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/table-valued-functions/partitions.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/table-valued-functions/partitions.md index ce25fc0240cd3..fdab378e5257a 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/table-valued-functions/partitions.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/table-valued-functions/partitions.md @@ -36,8 +36,6 @@ partitions 该函数用于 From 子句中。 -该函数自 2.1.5 版本开始支持。 - #### Syntax `partitions("catalog"="","database"="","table"="")` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/auto-partitioning.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/auto-partitioning.md index 861c1163215fe..1a34be82f2e3f 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/auto-partitioning.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/auto-partitioning.md @@ -72,49 
+72,36 @@ PROPERTIES ( ); ``` - - 该表内存储了大量业务历史数据,依据交易发生的日期进行分区。可以看到在建表时,我们需要预先手动创建分区。如果分区列的数据范围发生变化,例如上表中增加了 2022 年的数据,则我们需要通过[ALTER-TABLE-PARTITION](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION)对表的分区进行更改。如果这种分区需要变更,或者进行更细粒度的细分,修改起来非常繁琐。此时我们就可以使用 AUTO PARTITION 改写该表 DDL。 ## 语法 -建表时,使用以下语法填充[CREATE-TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE)时的`partition_info`部分: +建表时,使用以下语法填充[CREATE-TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE)时的 `partition_info` 部分: 1. AUTO RANGE PARTITION: - ```sql +```sql AUTO PARTITION BY RANGE (FUNC_CALL_EXPR) - ( - ) - ``` - - - - 其中 + () +``` - ```sql +其中 +```sql FUNC_CALL_EXPR ::= date_trunc ( , '' ) - ``` - - - -​ 注意:在 2.1.0 版本,`FUNC_CALL_EXPR` 外围不需要被括号包围。 +``` 2. AUTO LIST PARTITION: ```sql -AUTO PARTITION BY LIST(`partition_col`) -( -) + AUTO PARTITION BY LIST(`partition_col1`[, `partition_col2`, ...]) + () ``` - - ### 用法示例 1. AUTO RANGE PARTITION - ```sql +```sql CREATE TABLE `date_table` ( `TIME_STAMP` datev2 NOT NULL COMMENT '采集日期' ) ENGINE=OLAP @@ -126,13 +113,11 @@ AUTO PARTITION BY LIST(`partition_col`) PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); - ``` - - +``` 2. AUTO LIST PARTITION - ```sql +```sql CREATE TABLE `str_table` ( `str` varchar not null ) ENGINE=OLAP @@ -144,7 +129,9 @@ AUTO PARTITION BY LIST(`partition_col`) PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); - ``` +``` + + LIST 自动分区支持多个分区列,分区列写法同普通 LIST 分区一样: ```AUTO PARTITION BY LIST (`col1`, `col2`, ...)``` ### 约束 @@ -155,9 +142,9 @@ AUTO PARTITION BY LIST(`partition_col`) ### NULL 值分区 -当开启 session variable `allow_partition_column_nullable` 后,LIST 和 RANGE 分区都支持 NULL 列作为分区列。当分区列实际遇到 NULL 值的插入时: +当开启 session variable `allow_partition_column_nullable` 后: -1. 对于 AUTO LIST PARTITION,会自动创建对应的 NULL 值分区: +1. 
对于 AUTO LIST PARTITION,可以使用 NULLABLE 列作为分区列,会正常创建对应的 NULL 值分区: ```sql mysql> create table auto_null_list( @@ -190,8 +177,6 @@ mysql> select * from auto_null_list partition(pX); 1 row in set (0.20 sec) ``` - - 1. 对于 AUTO RANGE PARTITION,**不支持 NULLABLE 列作为分区列**。 ```sql @@ -211,8 +196,6 @@ mysql> CREATE TABLE `range_table_nullable` ( ERROR 1105 (HY000): errCode = 2, detailMessage = AUTO RANGE PARTITION doesn't support NULL column ``` - - ## 场景示例 在使用场景一节中的示例,在使用 AUTO PARTITION 后,该表 DDL 可以改写为: @@ -234,9 +217,7 @@ PROPERTIES ( ); ``` - - -此时新表没有默认分区: +以此表只有两列为例,此时新表没有默认分区: ```sql mysql> show partitions from `DAILY_TRADE_VALUE`; @@ -258,16 +239,59 @@ mysql> show partitions from `DAILY_TRADE_VALUE`; | 180018 | p20140101000000 | 2 | 2023-09-18 21:49:29 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2014-01-01]; ..types: [DATEV2]; keys: [2015-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | +-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+ 3 rows in set (0.12 sec) - ``` 经过自动分区功能所创建的 PARTITION,与手动创建的 PARTITION 具有完全一致的功能性质。 ## 与动态分区联用 -为使分区逻辑清晰,Doris 禁止自动分区(Auto Partition)和动态分区(Dynamic Partition)同时作用于一张表上,这种用法容易引发误用,应当以单独的自动分区功能代替。 +Doris 支持自动分区和动态分区同时使用。此时,二者的功能都生效: +1. 自动分区将会自动在数据导入过程中按需创建分区; +2. 
动态分区将会自动创建、回收、转储分区。 + +二者语法功能不存在冲突,同时设置对应的子句/属性即可。 + +### 最佳实践 -注意:在 Doris 2.1 的某些早期版本中,该功能未被禁止,但不推荐使用。 +需要对分区生命周期设限的场景,可以**将 Dynamic Partition 的创建功能关闭,创建分区完全交由 Auto Partition 完成**,通过 Dynamic Partition 动态回收分区的功能完成分区生命周期的管理: + +```sql +create table auto_dynamic( + k0 datetime(6) NOT NULL +) +auto partition by range (date_trunc(k0, 'year')) +( +) +DISTRIBUTED BY HASH(`k0`) BUCKETS 2 +properties( + "dynamic_partition.enable" = "true", + "dynamic_partition.prefix" = "p", + "dynamic_partition.start" = "-50", + "dynamic_partition.end" = "0", --- Dynamic Partition 不创建分区 + "dynamic_partition.time_unit" = "year", + "replication_num" = "1" +); +``` + +这样我们同时具有了 Auto Partition 的灵活性,且分区名上保持了一致性。 + +## 分区管理 + +当启用自动分区后,分区名可以通过 `auto_partition_name` 函数映射到分区。`partitions` 表函数可以通过分区名产生详细的分区信息。仍然以 `DAILY_TRADE_VALUE` 表为例,在我们插入数据后,查看其当前分区: + +```sql +mysql> select * from partitions("catalog"="internal","database"="optest","table"="DAILY_TRADE_VALUE") where PartitionName = auto_partition_name('range', 'year', '2008-02-03'); ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime | DataSize | IsInMemory | ReplicaAllocation | IsMutable | SyncWithBaseTables | UnsyncTables | 
++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +| 127095 | p20080101000000 | 2 | 2024-11-14 17:29:02 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2008-01-01]; ..types: [DATEV2]; keys: [2009-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | \N | 985.000 B | 0 | tag.location.default: 1 | 1 | 1 | \N | ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +1 row in set (0.18 sec) +``` + +这样每个分区的 ID 和取值就可以精准地被筛选出,用于后续针对分区的具体操作(例如 `insert overwrite partition`)。 + +详细语法说明请见:[auto_partition_name函数文档](../../sql-manual/sql-functions/string-functions/auto-partition-name),[partitions表函数文档](../../sql-manual/sql-functions/table-valued-functions/partitions)。 ## 注意事项 @@ -276,5 +300,9 @@ mysql> show partitions from `DAILY_TRADE_VALUE`; - 使用 AUTO PARTITION 的表,只是分区创建方式上由手动转为了自动。表及其所创建分区的原本使用方法都与非 AUTO PARTITION 的表或分区相同。 - 为防止意外创建过多分区,我们通过[FE 配置项](../../admin-manual/config/fe-config)中的`max_auto_partition_num`控制了一个 AUTO PARTITION 表最大容纳分区数。如有需要可以调整该值 - 向开启了 AUTO PARTITION 的表导入数据时,Coordinator 发送数据的轮询间隔与普通表有所不同。具体请见[BE 配置项](../../admin-manual/config/be-config)中的`olap_table_sink_send_interval_auto_partition_factor`。开启前移(`enable_memtable_on_sink_node = true`)后该变量不产生影响。 -- 
在使用[insert-overwrite](../../sql-manual/sql-statements/Data-Manipulation-Statements/Manipulation/INSERT-OVERWRITE)插入数据时,如果指定了覆写的 partition,则 AUTO PARTITION 表在此过程中表现得如同普通表,不创建新的分区。 +- 在使用[insert-overwrite](../../sql-manual/sql-statements/Data-Manipulation-Statements/Manipulation/INSERT-OVERWRITE)插入数据时,AUTO PARTITION 表的行为详见 INSERT OVERWRITE 文档。 - 如果导入创建分区时,该表涉及其他元数据操作(如 Schema Change、Rebalance),则导入可能失败。 + +## 关键词 + +AUTO, PARTITION, AUTO_PARTITION diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/basic-concepts.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/basic-concepts.md index 871cdfa49f670..952d2ee97ac19 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/basic-concepts.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/basic-concepts.md @@ -24,10 +24,11 @@ specific language governing permissions and limitations under the License. --> +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; 本文档主要介绍 Doris 的建表和数据划分,以及建表操作中可能遇到的问题和解决方法。 - ## Row & Column 在 Doris 中,数据都以表(Table)的形式进行逻辑上的描述。 @@ -115,7 +116,172 @@ PROPERTIES ENGINE 的类型是 OLAP,即默认的 ENGINE 类型。在 Doris 中,只有这个 ENGINE 类型是由 Doris 负责数据管理和存储的。其他 ENGINE 类型,如 MySQL、 Broker、ES 等等,本质上只是对外部其他数据库或系统中的表的映射,以保证 Doris 可以读取这些数据。而 Doris 本身并不创建、管理和存储任何非 OLAP ENGINE 类型的表和数据。 -`IF NOT EXISTS`表示如果没有创建过该表,则创建。注意这里只判断表名是否存在,而不会判断新建表 Schema 是否与已存在的表 Schema 相同。所以如果存在一个同名但不同 Schema 的表,该命令也会返回成功,但并不代表已经创建了新的表和新的 Schema。。 +`IF NOT EXISTS`表示如果没有创建过该表,则创建。注意这里只判断表名是否存在,而不会判断新建表 Schema 是否与已存在的表 Schema 相同。所以如果存在一个同名但不同 Schema 的表,该命令也会返回成功,但并不代表已经创建了新的表和新的 Schema。 + +### 高级特性与示例 + +Doris 支持包括动态分区、自动分区、自动分桶在内的高级数据划分方式,它们能够更灵活地实现数据管理。以下举例说明: + + + 

+ +[自动分区](./auto-partitioning) 支持根据用户定义的规则在数据导入时自动创建对应分区,更为便捷。将上例用自动 Range 分区改写如下: + +```sql +CREATE TABLE IF NOT EXISTS example_range_tbl +( + `user_id` LARGEINT NOT NULL COMMENT "用户id", + `date` DATE NOT NULL COMMENT "数据灌入日期时间", + `timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳", + `city` VARCHAR(20) COMMENT "用户所在城市", + `age` SMALLINT COMMENT "用户年龄", + `sex` TINYINT COMMENT "用户性别", + `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间", + `cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费", + `max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间", + `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间" +) +ENGINE=OLAP +AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`) +AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- 使用月作为分区粒度 +() +DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 +PROPERTIES +( + "replication_num" = "1" +); +``` + +如上建表,当数据导入时,Doris 将会自动创建对应分区,分区列为 `date`,粒度为月级别。`2018-12-01` 和 `2018-12-31` 将会落入同一个分区,而 `2018-11-12` 将会落入另一个分区。自动分区还支持 List 分区,更多用法请查看自动分区文档。 + 

+
+ + +

+ +[动态分区](./dynamic-partitioning)是根据当前系统时间自动创建与回收分区的管理方式,将上例用动态分区改写如下: + +```sql +CREATE TABLE IF NOT EXISTS example_range_tbl +( + `user_id` LARGEINT NOT NULL COMMENT "用户id", + `date` DATE NOT NULL COMMENT "数据灌入日期时间", + `timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳", + `city` VARCHAR(20) COMMENT "用户所在城市", + `age` SMALLINT COMMENT "用户年龄", + `sex` TINYINT COMMENT "用户性别", + `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间", + `cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费", + `max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间", + `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间" +) +ENGINE=OLAP +AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`) +PARTITION BY RANGE(`date`) +() +DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 +PROPERTIES +( + "replication_num" = "1", + "dynamic_partition.enable" = "true", + "dynamic_partition.time_unit" = "WEEK", --- 分区粒度为周 + "dynamic_partition.start" = "-2", --- 向前保留两周 + "dynamic_partition.end" = "2", --- 提前创建后两周 + "dynamic_partition.prefix" = "p", + "dynamic_partition.buckets" = "8" +); +``` + +动态分区支持分层存储、自定义副本数等功能,详见动态分区文档。 + 

+
+ + +

+ +自动分区与动态分区各有其优点,将二者结合可以实现分区的灵活按需创建和自动回收: + +```sql +CREATE TABLE IF NOT EXISTS example_range_tbl +( + `user_id` LARGEINT NOT NULL COMMENT "用户id", + `date` DATE NOT NULL COMMENT "数据灌入日期时间", + `timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳", + `city` VARCHAR(20) COMMENT "用户所在城市", + `age` SMALLINT COMMENT "用户年龄", + `sex` TINYINT COMMENT "用户性别", + `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间", + `cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费", + `max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间", + `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间" +) +ENGINE=OLAP +AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`) +AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- 使用月作为分区粒度 +() +DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 +PROPERTIES +( + "replication_num" = "1", + "dynamic_partition.enable" = "true", + "dynamic_partition.time_unit" = "month", --- 二者粒度必须相同 + "dynamic_partition.start" = "-2", --- 动态分区自动清理超过两个月的历史分区 + "dynamic_partition.end" = "0", --- 动态分区不创建未来分区,完全交给自动分区 + "dynamic_partition.prefix" = "p", + "dynamic_partition.buckets" = "8" +); +``` + +关于该功能的细节建议,详见[自动分区与动态分区联用](./auto-partitioning#与动态分区联用)。 + 

+
+ + +

+ +当用户不确定合理的分桶数时,可以使用[自动分桶](./auto-bucket)由 Doris 完成估计,用户仅需提供单个分区的预估数据量: + +```sql +CREATE TABLE IF NOT EXISTS example_range_tbl +( + `user_id` LARGEINT NOT NULL COMMENT "用户id", + `date` DATE NOT NULL COMMENT "数据灌入日期时间", + `timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳", + `city` VARCHAR(20) COMMENT "用户所在城市", + `age` SMALLINT COMMENT "用户年龄", + `sex` TINYINT COMMENT "用户性别", + `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间", + `cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费", + `max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间", + `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间" +) +ENGINE=OLAP +AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`) +PARTITION BY RANGE(`date`) +( + PARTITION `p201701` VALUES LESS THAN ("2017-02-01"), + PARTITION `p201702` VALUES LESS THAN ("2017-03-01"), + PARTITION `p201703` VALUES LESS THAN ("2017-04-01"), + PARTITION `p2018` VALUES [("2018-01-01"), ("2019-01-01")) +) +DISTRIBUTED BY HASH(`user_id`) BUCKETS AUTO +PROPERTIES +( + "replication_num" = "1", + "estimate_partition_size" = "2G" --- 用户估计一个分区将有的数据量,不提供则默认为 10G +); +``` + +需要注意的是,该方式不适用于表数据量特别大的情况。 + 

+
+ +
## 查看分区信息 @@ -185,3 +351,35 @@ ALTER TABLE example_range_tbl ADD PARTITION p201704 VALUES LESS THAN("2020-05-0 ``` 其它更多分区修改操作,参见 SQL 手册 [ALTER-TABLE-PARTITION](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION)。 + +## 分区检索 + +`partitions` 表函数和 `information_schema.partitions` 系统表记录了集群的分区信息。在自动管理分区时,可以通过对应表提取分区信息使用: + +```sql +--- 在 Auto Partition 表中找对应值所属的分区 +mysql> select * from partitions("catalog"="internal", "database"="optest", "table"="DAILY_TRADE_VALUE") where PartitionName = auto_partition_name('range', 'year', '2008-02-03'); ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime | DataSize | IsInMemory | ReplicaAllocation | IsMutable | SyncWithBaseTables | UnsyncTables | ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +| 127095 | p20080101000000 | 2 | 2024-11-14 17:29:02 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2008-01-01]; ..types: [DATEV2]; keys: [2009-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | \N | 985.000 B | 0 | tag.location.default: 1 | 1 | 1 | \N | 
++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +1 row in set (0.30 sec) + +mysql> select * from information_schema.partitions where TABLE_SCHEMA='optest' and TABLE_NAME='list_table1' and PARTITION_NAME=auto_partition_name('list', null); ++---------------+--------------+-------------+----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+-----------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | PARTITION_NAME | SUBPARTITION_NAME | PARTITION_ORDINAL_POSITION | SUBPARTITION_ORDINAL_POSITION | PARTITION_METHOD | SUBPARTITION_METHOD | PARTITION_EXPRESSION | SUBPARTITION_EXPRESSION | PARTITION_DESCRIPTION | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | CREATE_TIME | UPDATE_TIME | CHECK_TIME | CHECKSUM | PARTITION_COMMENT | NODEGROUP | TABLESPACE_NAME | ++---------------+--------------+-------------+----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+-----------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| internal | 
optest | list_table1 | pX | NULL | 0 | 0 | LIST | NULL | str | NULL | (NULL) | 1 | 1266 | 1266 | 0 | 0 | 0 | 0 | 2024-11-14 19:58:45 | 0000-00-00 00:00:00 | 0 | | | | ++---------------+--------------+-------------+----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+-----------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +1 row in set (0.24 sec) + +--- 找对应起始点的分区 +mysql> select * from information_schema.partitions where TABLE_NAME='DAILY_TRADE_VALUE' and PARTITION_DESCRIPTION like "[('2012-01-01'),%"; ++---------------+--------------+-------------------+-----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+----------------------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | PARTITION_NAME | SUBPARTITION_NAME | PARTITION_ORDINAL_POSITION | SUBPARTITION_ORDINAL_POSITION | PARTITION_METHOD | SUBPARTITION_METHOD | PARTITION_EXPRESSION | SUBPARTITION_EXPRESSION | PARTITION_DESCRIPTION | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | CREATE_TIME | UPDATE_TIME | CHECK_TIME | CHECKSUM | PARTITION_COMMENT | NODEGROUP | TABLESPACE_NAME | 
++---------------+--------------+-------------------+-----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+----------------------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| internal | optest | DAILY_TRADE_VALUE | p20120101000000 | NULL | 0 | 0 | RANGE | NULL | TRADE_DATE | NULL | [('2012-01-01'), ('2013-01-01')) | 1 | 985 | 985 | 0 | 0 | 0 | 0 | 2024-11-14 17:29:02 | 0000-00-00 00:00:00 | 0 | | | | ++---------------+--------------+-------------------+-----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+----------------------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +1 row in set (0.65 sec) +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-functions/table-valued-functions/partitions.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-functions/table-valued-functions/partitions.md deleted file mode 100644 index ce25fc0240cd3..0000000000000 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-functions/table-valued-functions/partitions.md +++ /dev/null @@ -1,130 +0,0 @@ ---- -{ - "title": "PARTITIONS", - "language": "zh-CN" -} ---- - - - -## `partitions` - -### Name - -partitions - -### Description - -表函数,生成分区临时表,可以查看某个 TABLE 的分区列表。 - -该函数用于 From 子句中。 - -该函数自 2.1.5 版本开始支持。 - -#### Syntax - -`partitions("catalog"="","database"="","table"="")` - -partitions()表结构: -```sql 
-mysql> desc function partitions("catalog"="internal","database"="zd","table"="user"); -+--------------------------+---------+------+-------+---------+-------+ -| Field | Type | Null | Key | Default | Extra | -+--------------------------+---------+------+-------+---------+-------+ -| PartitionId | BIGINT | No | false | NULL | NONE | -| PartitionName | TEXT | No | false | NULL | NONE | -| VisibleVersion | BIGINT | No | false | NULL | NONE | -| VisibleVersionTime | TEXT | No | false | NULL | NONE | -| State | TEXT | No | false | NULL | NONE | -| PartitionKey | TEXT | No | false | NULL | NONE | -| Range | TEXT | No | false | NULL | NONE | -| DistributionKey | TEXT | No | false | NULL | NONE | -| Buckets | INT | No | false | NULL | NONE | -| ReplicationNum | INT | No | false | NULL | NONE | -| StorageMedium | TEXT | No | false | NULL | NONE | -| CooldownTime | TEXT | No | false | NULL | NONE | -| RemoteStoragePolicy | TEXT | No | false | NULL | NONE | -| LastConsistencyCheckTime | TEXT | No | false | NULL | NONE | -| DataSize | TEXT | No | false | NULL | NONE | -| IsInMemory | BOOLEAN | No | false | NULL | NONE | -| ReplicaAllocation | TEXT | No | false | NULL | NONE | -| IsMutable | BOOLEAN | No | false | NULL | NONE | -| SyncWithBaseTables | BOOLEAN | No | false | NULL | NONE | -| UnsyncTables | TEXT | No | false | NULL | NONE | -+--------------------------+---------+------+-------+---------+-------+ -20 rows in set (0.02 sec) -``` - -* PartitionId:分区id -* PartitionName:分区名字 -* VisibleVersion:分区版本 -* VisibleVersionTime:分区版本提交时间 -* State:分区状态 -* PartitionKey:分区key -* Range:分区范围 -* DistributionKey:分布key -* Buckets:分桶数量 -* ReplicationNum:副本数 -* StorageMedium:存储介质 -* CooldownTime:cooldown时间 -* RemoteStoragePolicy:远程存储策略 -* LastConsistencyCheckTime:上次一致性检查时间 -* DataSize:数据大小 -* IsInMemory:是否存在内存 -* ReplicaAllocation:分布策略 -* IsMutable:是否可变 -* SyncWithBaseTables:是否和基表数据同步(针对异步物化视图的分区) -* UnsyncTables:和哪个基表数据不同步(针对异步物化视图的分区) - -```sql -mysql> desc function 
partitions("catalog"="hive","database"="zdtest","table"="com2"); -+-----------+------+------+-------+---------+-------+ -| Field | Type | Null | Key | Default | Extra | -+-----------+------+------+-------+---------+-------+ -| Partition | TEXT | No | false | NULL | NONE | -+-----------+------+------+-------+---------+-------+ -1 row in set (0.11 sec) -``` - -* Partition:分区名字 - -### Example - -1. 查看 internal CATALOG 下 db1 的 table1 的分区列表 - -```sql -mysql> select * from partitions("catalog"="internal","database"="db1","table"="table1"); -``` - -2. 查看 table1 下的分区名称为 partition1 的分区信息 - -```sql -mysql> select * from partitions("catalog"="internal","database"="db1","table"="table1") where PartitionName = "partition1"; -``` - -3. 查看 table1 下的分区名称为 partition1 的分区 id - -```sql -mysql> select PartitionId from partitions("catalog"="internal","database"="db1","table"="table1") where PartitionName = "partition1"; -``` - -### Keywords - - partitions diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/string-functions/auto-partition-name.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/string-functions/auto-partition-name.md index e84a60b5489ab..0e20924dc7167 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/string-functions/auto-partition-name.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/string-functions/auto-partition-name.md @@ -27,19 +27,12 @@ under the License. 
## auto_partition_name :::tip 提示 -该功能自 Apache Doris 2.1.6 版本起支持 +该功能自 Apache Doris 2.1.6 版本起支持 ::: auto_partition_name - - -### description - -:::info 备注 -自 2.1.6 开始支持 auto_partition_name 用法 -::: - +### Description #### Syntax `VARCHAR AUTO_PARTITION_NAME('RANGE', 'VARCHAR unit', DATETIME datetime)` @@ -54,9 +47,10 @@ datetime 参数是合法的日期表达式。 unit 参数是您希望的时间间隔,可选的值如下:[`second`,`minute`,`hour`,`day`,`month`,`year`]。 如果 unit 不符合上述可选值,结果将返回语法错误。 -### example -``` +### Example + +```sql mysql> select auto_partition_name('range', 'years', '123'); ERROR 1105 (HY000): errCode = 2, detailMessage = range auto_partition_name must accept year|month|day|hour|minute|second for 2nd argument @@ -122,7 +116,6 @@ mysql> select auto_partition_name('list', "你好"); +------------------------------------+ | p4f60597d2 | +------------------------------------+ - ``` ### keywords diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/table-valued-functions/partitions.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/table-valued-functions/partitions.md index ce25fc0240cd3..3c85703263417 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/table-valued-functions/partitions.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/table-valued-functions/partitions.md @@ -36,7 +36,7 @@ partitions 该函数用于 From 子句中。 -该函数自 2.1.5 版本开始支持。 +**该函数自 2.1.5 版本开始支持。** #### Syntax diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/data-partitioning/auto-partitioning.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/data-partitioning/auto-partitioning.md index 4119b70dcdaa7..52a05eef9a8c5 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/data-partitioning/auto-partitioning.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/data-partitioning/auto-partitioning.md @@ -72,49 
+72,36 @@ PROPERTIES ( ); ``` - - This table stores a large amount of historical business data, partitioned by the date on which each trade occurred. As shown, the partitions must be created manually in advance when the table is created. If the data range of the partition column changes, for example when 2022 data is added to the table above, the table's partitions have to be changed with [ALTER-TABLE-PARTITION](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION). Changing such partitions, or subdividing them at a finer granularity, is very tedious. In this case, the table DDL can be rewritten with AUTO PARTITION. ## Syntax -When creating a table, use the following syntax to fill in the`partition_info`part of [CREATE-TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE): +When creating a table, use the following syntax to fill in the `partition_info` part of [CREATE-TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE): 1. AUTO RANGE PARTITION: - ```sql +```sql AUTO PARTITION BY RANGE (FUNC_CALL_EXPR) - ( - ) - ``` - - - - where + () +``` - ```sql +where +```sql FUNC_CALL_EXPR ::= date_trunc ( <partition_column>, '<interval>' ) - ``` - - - - Note: in version 2.1.0, `FUNC_CALL_EXPR` does not need to be wrapped in parentheses. +``` 2. AUTO LIST PARTITION: ```sql -AUTO PARTITION BY LIST(`partition_col`) -( -) + AUTO PARTITION BY LIST(`partition_col1`[, `partition_col2`, ...]) + () ``` - - ### Usage examples 1. AUTO RANGE PARTITION - ```sql +```sql CREATE TABLE `date_table` ( `TIME_STAMP` datev2 NOT NULL COMMENT 'collection date' ) ENGINE=OLAP @@ -126,13 +113,11 @@ AUTO PARTITION BY LIST(`partition_col`) PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); - ``` - - +``` 2. 
AUTO LIST PARTITION - ```sql +```sql CREATE TABLE `str_table` ( `str` varchar not null ) ENGINE=OLAP @@ -144,7 +129,9 @@ AUTO PARTITION BY LIST(`partition_col`) PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); - ``` +``` + + AUTO LIST PARTITION supports multiple partition columns, written the same way as in a regular LIST partition: ```AUTO PARTITION BY LIST (`col1`, `col2`, ...)``` ### Constraints @@ -155,9 +142,9 @@ AUTO PARTITION BY LIST(`partition_col`) ### NULL-value partitions -When the session variable `allow_partition_column_nullable` is enabled, both LIST and RANGE partitions support NULL columns as partition columns. When a NULL value is actually inserted into the partition column: +When the session variable `allow_partition_column_nullable` is enabled: -1. For AUTO LIST PARTITION, the corresponding NULL-value partition is created automatically: +1.
For AUTO LIST PARTITION, a NULLABLE column can be used as the partition column, and the corresponding NULL-value partition is created as usual: ```sql mysql> create table auto_null_list( @@ -190,8 +177,6 @@ mysql> select * from auto_null_list partition(pX); 1 row in set (0.20 sec) ``` - - 1. For AUTO RANGE PARTITION, **NULLABLE columns are not supported as partition columns**. ```sql @@ -211,9 +196,7 @@ mysql> CREATE TABLE `range_table_nullable` ( ERROR 1105 (HY000): errCode = 2, detailMessage = AUTO RANGE PARTITION doesn't support NULL column ``` - - -### Scenario example +## Scenario example Using the example from the Application scenario section, after applying AUTO PARTITION the table DDL can be rewritten as: @@ -234,9 +217,7 @@ PROPERTIES ( ); ``` - - -The new table has no default partitions at this point: +Assuming this table has only these two columns, the new table has no default partitions at this point: ```sql mysql> show partitions from `DAILY_TRADE_VALUE`; @@ -258,16 +239,67 @@ mysql> show partitions from `DAILY_TRADE_VALUE`; | 180018 | p20140101000000 | 2 | 2023-09-18 21:49:29 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2014-01-01]; ..types: [DATEV2]; keys: [2015-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | +-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+ 3 rows in set (0.12 sec) - ``` Partitions created by auto partitioning behave exactly like manually created partitions. ## Using with dynamic partitioning -To keep the partitioning logic clear, Doris forbids Auto Partition and Dynamic Partition from acting on the same table at the same time; this combination is prone to misuse and should be replaced by Auto Partition alone. +Since 2.1.7, Doris supports using auto partitioning and dynamic partitioning together. In this case, both features take effect: +1. Auto partitioning creates partitions on demand during data load; +2.
Dynamic partitioning automatically creates, drops, and migrates partitions. + +The two features do not conflict syntactically; simply set the corresponding clauses/properties together. + +### Best practice + +For scenarios that require a bounded partition lifecycle, you can **turn off partition creation in Dynamic Partition and leave partition creation entirely to Auto Partition**, using Dynamic Partition's automatic partition reclamation to manage the partition lifecycle: + +```sql +create table auto_dynamic( + k0 datetime(6) NOT NULL +) +auto partition by range (date_trunc(k0, 'year')) +( +) +DISTRIBUTED BY HASH(`k0`) BUCKETS 2 +properties( + "dynamic_partition.enable" = "true", + "dynamic_partition.prefix" = "p", + "dynamic_partition.start" = "-50", + "dynamic_partition.end" = "0", --- Dynamic Partition does not create partitions + "dynamic_partition.time_unit" = "year", + "replication_num" = "1" +); +``` + +This keeps the flexibility of Auto Partition while keeping partition names consistent. + +:::note +In some early versions before 2.1.7, this combination was not forbidden but is not recommended. +::: + +## Partition management + +:::tip +Since 2.1.6, Doris supports the `partitions` table function and the `auto_partition_name` function, which make it easy to find the partition that data belongs to and manage it. +::: + +With auto partitioning enabled, the `auto_partition_name` function maps a value to the name of the partition it belongs to, and the `partitions` table function returns detailed information for a partition by name. Again using the `DAILY_TRADE_VALUE` table as an example, after inserting data, look up the partition that a given date falls into: -Note: in some early versions of Doris 2.1, this combination was not forbidden, but it is not recommended. 
+```sql +mysql> select * from partitions("catalog"="internal","database"="optest","table"="DAILY_TRADE_VALUE") where PartitionName = auto_partition_name('range', 'year', '2008-02-03'); ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime | DataSize | IsInMemory | ReplicaAllocation | IsMutable | SyncWithBaseTables | UnsyncTables |
++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +| 127095 | p20080101000000 | 2 | 2024-11-14 17:29:02 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2008-01-01]; ..types: [DATEV2]; keys: [2009-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | \N | 985.000 B | 0 | tag.location.default: 1 | 1 | 1 | \N | ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +1 row in set (0.18 sec) +``` + +This way, each partition's ID and value range can be filtered out precisely for subsequent partition-specific operations (for example, `insert overwrite partition`). + +For detailed syntax, see the [auto_partition_name function documentation](../../sql-manual/sql-functions/string-functions/auto-partition-name) and the [partitions table function documentation](../../sql-manual/sql-functions/table-valued-functions/partitions). ## Notes @@ -276,5 +308,9 @@ mysql> show partitions from `DAILY_TRADE_VALUE`; - A table using AUTO PARTITION only switches partition creation from manual to automatic; the table and the partitions it creates are otherwise used exactly like non-AUTO PARTITION tables and partitions. - To prevent accidentally creating too many partitions, the `max_auto_partition_num` item in [FE configuration](../../admin-manual/config/fe-config) limits the maximum number of partitions an AUTO PARTITION table can hold. Adjust this value if needed. - When loading data into a table with AUTO PARTITION enabled, the Coordinator's polling interval for sending data differs from that of regular tables. For details, see `olap_table_sink_send_interval_auto_partition_factor` in [BE configuration](../../admin-manual/config/be-config). This variable has no effect once memtable on sink node (`enable_memtable_on_sink_node = true`) is enabled. -
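When inserting data with [insert-overwrite](../../sql-manual/sql-statements/Data-Manipulation-Statements/Manipulation/INSERT-OVERWRITE), if the partitions to overwrite are specified, the AUTO PARTITION table behaves like a regular table during the process and does not create new partitions. +- For the behavior of AUTO PARTITION tables when inserting data with [insert-overwrite](../../sql-manual/sql-statements/Data-Manipulation-Statements/Manipulation/INSERT-OVERWRITE), see the INSERT OVERWRITE documentation. - If the table is undergoing other metadata operations (such as Schema Change or Rebalance) while a load creates partitions, the load may fail. + +## Keywords + +AUTO, PARTITION, AUTO_PARTITION diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/data-partitioning/basic-concepts.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/data-partitioning/basic-concepts.md index 6b16126f31b61..588b3acb56fd1 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/data-partitioning/basic-concepts.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/data-partitioning/basic-concepts.md @@ -24,10 +24,11 @@ specific language governing permissions and limitations under the License. --> +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; This document mainly introduces table creation and data partitioning in Doris, as well as problems that may arise during table creation and their solutions. - ## Row & Column In Doris, data is logically described in the form of tables. @@ -115,7 +116,176 @@ PROPERTIES The ENGINE type is OLAP, the default ENGINE type. In Doris, only this ENGINE type has its data managed and stored by Doris. Other ENGINE types, such as MySQL, Broker, and ES, are essentially mappings to tables in other external databases or systems so that Doris can read their data. Doris itself does not create, manage, or store any tables or data of non-OLAP ENGINE types. -`IF NOT EXISTS` means the table is created only if it has not been created yet. Note that only the table name is checked; whether the new table's schema matches the existing table's schema is not. So if a table with the same name but a different schema exists, the command still returns success, but that does not mean a new table with the new schema has been created.. +`IF NOT EXISTS` means the table is created only if it has not been created yet. Note that only the table name is checked; whether the new table's schema matches the existing table's schema is not. So if a table with the same name but a different schema exists, the command still returns success, but that does not mean a new table with the new schema has been created. + +### Advanced features and examples + +Doris supports advanced data partitioning methods including dynamic partitioning, auto partitioning, and auto bucketing, which enable more flexible data management. Examples follow: + + +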

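+ +[Auto partitioning](./auto-partitioning) automatically creates the corresponding partitions during data load according to user-defined rules, which is more convenient. The example above can be rewritten with auto RANGE partitioning as follows: + +```sql +CREATE TABLE IF NOT EXISTS example_range_tbl +( + `user_id` LARGEINT NOT NULL COMMENT "user id", + `date` DATE NOT NULL COMMENT "date of data load", + `timestamp` DATETIME NOT NULL COMMENT "timestamp of data load", + `city` VARCHAR(20) COMMENT "user's city", + `age` SMALLINT COMMENT "user age", + `sex` TINYINT COMMENT "user gender", + `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "user's last visit time", + `cost` BIGINT SUM DEFAULT "0" COMMENT "user's total spending", + `max_dwell_time` INT MAX DEFAULT "0" COMMENT "user's maximum dwell time", + `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "user's minimum dwell time" +) +ENGINE=OLAP +AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`) +AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- use month as the partition granularity +() +DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 +PROPERTIES +( + "replication_num" = "1" +); +``` + +With the table created as above, Doris automatically creates the corresponding partitions when data is loaded, using `date` as the partition column at monthly granularity. `2018-12-01` and `2018-12-31` fall into the same partition, while `2018-11-12` falls into another partition. Auto partitioning also supports LIST partitions; see the auto partitioning document for more usage. + +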
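To check the behavior described above, you could load a few rows and list the partitions that get created. The following is a sketch that assumes `example_range_tbl` was created with the monthly AUTO PARTITION DDL above in the current database; the row values are hypothetical sample data, not taken from the original document.

```sql
-- Assumes example_range_tbl uses AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')).
-- The row values below are hypothetical sample data.
INSERT INTO example_range_tbl
    (`user_id`, `date`, `timestamp`, `city`, `age`, `sex`,
     `last_visit_date`, `cost`, `max_dwell_time`, `min_dwell_time`)
VALUES
    (10000, '2018-12-01', '2018-12-01 08:00:00', 'Beijing', 20, 0, '2018-12-01 08:00:00', 20, 10, 10),
    (10001, '2018-12-31', '2018-12-31 09:00:00', 'Beijing', 30, 1, '2018-12-31 09:00:00', 15, 5, 5),
    (10002, '2018-11-12', '2018-11-12 10:00:00', 'Shanghai', 25, 0, '2018-11-12 10:00:00', 30, 8, 8);

-- Two partitions are expected: one covering 2018-11 and one covering 2018-12.
SHOW PARTITIONS FROM example_range_tbl;

-- Since 2.1.6: returns the name of the auto partition that this date maps to.
SELECT auto_partition_name('range', 'month', '2018-12-31');
```

The name returned by the last query should match the partition that also holds the `2018-12-01` row, which is a quick way to confirm that both dates share one monthly partition.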

+
+ + +

+ +[Dynamic partitioning](./dynamic-partitioning) automatically creates and reclaims partitions based on the current real-world time. The example above can be rewritten with dynamic partitioning as follows: + +```sql +CREATE TABLE IF NOT EXISTS example_range_tbl +( + `user_id` LARGEINT NOT NULL COMMENT "user id", + `date` DATE NOT NULL COMMENT "date of data load", + `timestamp` DATETIME NOT NULL COMMENT "timestamp of data load", + `city` VARCHAR(20) COMMENT "user's city", + `age` SMALLINT COMMENT "user age", + `sex` TINYINT COMMENT "user gender", + `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "user's last visit time", + `cost` BIGINT SUM DEFAULT "0" COMMENT "user's total spending", + `max_dwell_time` INT MAX DEFAULT "0" COMMENT "user's maximum dwell time", + `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "user's minimum dwell time" +) +ENGINE=OLAP +AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`) +PARTITION BY RANGE(`date`) +() +DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 +PROPERTIES +( + "replication_num" = "1", + "dynamic_partition.enable" = "true", + "dynamic_partition.time_unit" = "WEEK", --- partition granularity is one week + "dynamic_partition.start" = "-2", --- keep the previous two weeks + "dynamic_partition.end" = "2", --- pre-create partitions for the next two weeks + "dynamic_partition.prefix" = "p", + "dynamic_partition.buckets" = "8" +); +``` + +Dynamic partitioning also supports tiered storage, custom replica counts, and other features; see the dynamic partitioning document for details. + +
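Because dynamic partitions are created and dropped by a background scheduler rather than at load time, it can help to check the scheduler's state for the table. A minimal sketch, assuming the dynamic-partition table above lives in the current database:

```sql
-- Lists tables with dynamic partitioning enabled in the current database,
-- including their time unit, start/end window, and the latest scheduling status.
SHOW DYNAMIC PARTITION TABLES;

-- Shows the weekly partitions that the scheduler has created so far.
SHOW PARTITIONS FROM example_range_tbl;
```

If the expected weekly partitions are missing, the scheduling-status columns of the first statement are a reasonable place to start looking.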

+
+ + +

+ +:::tip +This feature is supported since 2.1.7 +::: + +Auto partitioning and dynamic partitioning each have their own strengths. Combining them enables flexible on-demand partition creation together with automatic partition reclamation: + +```sql +CREATE TABLE IF NOT EXISTS example_range_tbl +( + `user_id` LARGEINT NOT NULL COMMENT "user id", + `date` DATE NOT NULL COMMENT "date of data load", + `timestamp` DATETIME NOT NULL COMMENT "timestamp of data load", + `city` VARCHAR(20) COMMENT "user's city", + `age` SMALLINT COMMENT "user age", + `sex` TINYINT COMMENT "user gender", + `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "user's last visit time", + `cost` BIGINT SUM DEFAULT "0" COMMENT "user's total spending", + `max_dwell_time` INT MAX DEFAULT "0" COMMENT "user's maximum dwell time", + `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "user's minimum dwell time" +) +ENGINE=OLAP +AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`) +AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- use month as the partition granularity +() +DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 +PROPERTIES +( + "replication_num" = "1", + "dynamic_partition.enable" = "true", + "dynamic_partition.time_unit" = "month", --- both features must use the same granularity + "dynamic_partition.start" = "-2", --- Dynamic Partition automatically drops historical partitions older than two months + "dynamic_partition.end" = "0", --- Dynamic Partition creates no future partitions; creation is left entirely to Auto Partition + "dynamic_partition.prefix" = "p", + "dynamic_partition.buckets" = "8" +); +``` + +For detailed recommendations on this feature, see [Using auto partitioning with dynamic partitioning](./auto-partitioning#与动态分区联用). + +
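If the retention window needs to change later, the dynamic partition properties of an existing table can be modified in place with `ALTER TABLE ... SET`. A sketch assuming the combined table above; the value `-6` (six months of history) is only an illustrative choice:

```sql
-- Widen the retention window from two to six months of history.
-- "dynamic_partition.end" stays at 0, so partition creation remains with Auto Partition.
ALTER TABLE example_range_tbl SET ("dynamic_partition.start" = "-6");

-- Check that the new property value is in effect.
SHOW CREATE TABLE example_range_tbl;
```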

+
+ + +

+ +When you are unsure of a reasonable number of buckets, you can use [auto bucketing](./auto-bucket) and let Doris estimate it; you only need to provide the estimated data volume: + +```sql +CREATE TABLE IF NOT EXISTS example_range_tbl +( + `user_id` LARGEINT NOT NULL COMMENT "user id", + `date` DATE NOT NULL COMMENT "date of data load", + `timestamp` DATETIME NOT NULL COMMENT "timestamp of data load", + `city` VARCHAR(20) COMMENT "user's city", + `age` SMALLINT COMMENT "user age", + `sex` TINYINT COMMENT "user gender", + `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "user's last visit time", + `cost` BIGINT SUM DEFAULT "0" COMMENT "user's total spending", + `max_dwell_time` INT MAX DEFAULT "0" COMMENT "user's maximum dwell time", + `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "user's minimum dwell time" +) +ENGINE=OLAP +AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`) +PARTITION BY RANGE(`date`) +( + PARTITION `p201701` VALUES LESS THAN ("2017-02-01"), + PARTITION `p201702` VALUES LESS THAN ("2017-03-01"), + PARTITION `p201703` VALUES LESS THAN ("2017-04-01"), + PARTITION `p2018` VALUES [("2018-01-01"), ("2019-01-01")) +) +DISTRIBUTED BY HASH(`user_id`) BUCKETS AUTO +PROPERTIES +( + "replication_num" = "1", + "estimate_partition_size" = "2G" --- the user's estimate of the data volume of one partition; defaults to 10G if not provided +); +``` + +Note that this approach is not suitable for tables with very large data volumes. + +
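To see how many buckets Doris actually chose, you can inspect the `Buckets` column of the partition list. A sketch assuming the auto-bucket table above was created in the current database:

```sql
-- The Buckets column should show the bucket count that Doris derived for each
-- partition from "estimate_partition_size" (BUCKETS AUTO).
SHOW PARTITIONS FROM example_range_tbl;
```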

+
+ +
## View partition information @@ -184,4 +354,36 @@ The ENGINE type is OLAP, the default ENGINE type. In Doris, only ALTER TABLE example_range_tbl ADD PARTITION p201704 VALUES LESS THAN("2020-05-01") DISTRIBUTED BY HASH(`user_id`) BUCKETS 5; ``` -For more partition modification operations, see [ALTER-TABLE-PARTITION](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION) in the SQL manual. \ No newline at end of file +For more partition modification operations, see [ALTER-TABLE-PARTITION](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION) in the SQL manual. + +## Partition lookup + +The `partitions` table function (supported since 2.1.5) and the `information_schema.partitions` system table (supported since 2.1.7) record the cluster's partition information. When partitions are managed automatically, you can extract partition information from them: + +```sql +--- Find the partition that a given value belongs to in an Auto Partition table +mysql> select * from partitions("catalog"="internal", "database"="optest", "table"="DAILY_TRADE_VALUE") where PartitionName = auto_partition_name('range', 'year', '2008-02-03'); ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime | DataSize | IsInMemory | ReplicaAllocation | IsMutable | SyncWithBaseTables | UnsyncTables | ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +|
127095 | p20080101000000 | 2 | 2024-11-14 17:29:02 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2008-01-01]; ..types: [DATEV2]; keys: [2009-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | \N | 985.000 B | 0 | tag.location.default: 1 | 1 | 1 | \N | ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +1 row in set (0.30 sec) + +mysql> select * from information_schema.partitions where TABLE_SCHEMA='optest' and TABLE_NAME='list_table1' and PARTITION_NAME=auto_partition_name('list', null); ++---------------+--------------+-------------+----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+-----------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | PARTITION_NAME | SUBPARTITION_NAME | PARTITION_ORDINAL_POSITION | SUBPARTITION_ORDINAL_POSITION | PARTITION_METHOD | SUBPARTITION_METHOD | PARTITION_EXPRESSION | SUBPARTITION_EXPRESSION | PARTITION_DESCRIPTION | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | CREATE_TIME | UPDATE_TIME | CHECK_TIME | CHECKSUM | PARTITION_COMMENT | NODEGROUP | TABLESPACE_NAME | 
++---------------+--------------+-------------+----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+-----------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| internal | optest | list_table1 | pX | NULL | 0 | 0 | LIST | NULL | str | NULL | (NULL) | 1 | 1266 | 1266 | 0 | 0 | 0 | 0 | 2024-11-14 19:58:45 | 0000-00-00 00:00:00 | 0 | | | | ++---------------+--------------+-------------+----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+-----------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +1 row in set (0.24 sec) + +--- Find the partition whose range starts at a given value +mysql> select * from information_schema.partitions where TABLE_NAME='DAILY_TRADE_VALUE' and PARTITION_DESCRIPTION like "[('2012-01-01'),%"; ++---------------+--------------+-------------------+-----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+----------------------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | PARTITION_NAME | SUBPARTITION_NAME | PARTITION_ORDINAL_POSITION | SUBPARTITION_ORDINAL_POSITION | PARTITION_METHOD | SUBPARTITION_METHOD | PARTITION_EXPRESSION |
SUBPARTITION_EXPRESSION | PARTITION_DESCRIPTION | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | CREATE_TIME | UPDATE_TIME | CHECK_TIME | CHECKSUM | PARTITION_COMMENT | NODEGROUP | TABLESPACE_NAME | ++---------------+--------------+-------------------+-----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+----------------------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| internal | optest | DAILY_TRADE_VALUE | p20120101000000 | NULL | 0 | 0 | RANGE | NULL | TRADE_DATE | NULL | [('2012-01-01'), ('2013-01-01')) | 1 | 985 | 985 | 0 | 0 | 0 | 0 | 2024-11-14 17:29:02 | 0000-00-00 00:00:00 | 0 | | | | ++---------------+--------------+-------------------+-----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+----------------------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +1 row in set (0.65 sec) +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/sql-functions/string-functions/auto_partition_name.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/sql-functions/string-functions/auto-partition-name.md similarity index 96% rename from i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/sql-functions/string-functions/auto_partition_name.md rename to 
i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/sql-functions/string-functions/auto-partition-name.md index 54191eebbd372..c6c4ad7da3dde 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/sql-functions/string-functions/auto_partition_name.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/sql-functions/string-functions/auto-partition-name.md @@ -27,19 +27,12 @@ under the License. ## auto_partition_name :::tip Tip -This feature is supported since Apache Doris 3.0.1 +This feature is supported since Apache Doris 3.0.2 ::: auto_partition_name - - -### description - -:::info Note -The auto_partition_name function is supported since 3.0.1 -::: - +### Description #### Syntax `VARCHAR AUTO_PARTITION_NAME('RANGE', 'VARCHAR unit', DATETIME datetime)` @@ -54,9 +47,10 @@ The datetime parameter is a valid date expression. The unit parameter is the desired time interval; valid values are: [`second`, `minute`, `hour`, `day`, `month`, `year`]. If unit is not one of these values, a syntax error is returned. -### example -``` +### Example + +```sql mysql> select auto_partition_name('range', 'years', '123'); ERROR 1105 (HY000): errCode = 2, detailMessage = range auto_partition_name must accept year|month|day|hour|minute|second for 2nd argument @@ -122,7 +116,6 @@ mysql> select auto_partition_name('list', "你好"); +------------------------------------+ | p4f60597d2 | +------------------------------------+ - ``` ### keywords diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/sql-functions/table-valued-functions/partitions.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/sql-functions/table-valued-functions/partitions.md index ce25fc0240cd3..fdab378e5257a 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/sql-functions/table-valued-functions/partitions.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/sql-functions/table-valued-functions/partitions.md @@ -36,8 +36,6 @@ partitions This function is used in the FROM clause. -This function is supported since 2.1.5. - #### Syntax `partitions("catalog"="","database"="","table"="")` diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-partitioning/auto-partitioning.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-partitioning/auto-partitioning.md index 861c1163215fe..347c156d185d7 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-partitioning/auto-partitioning.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-partitioning/auto-partitioning.md @@ -72,49 +72,36 @@ PROPERTIES ( ); ``` - - This table stores a large amount of historical business data, partitioned by the date on which each trade occurred. As shown, the partitions must be created manually in advance when the table is created. If the data range of the partition column changes, for example when 2022 data is added to the table above, the table's partitions have to be changed with [ALTER-TABLE-PARTITION](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION). Changing such partitions, or subdividing them at a finer granularity, is very tedious. In this case, the table DDL can be rewritten with AUTO PARTITION. ## Syntax -When creating a table, use the following syntax to fill in the`partition_info`part of [CREATE-TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE): +When creating a table, use the following syntax to fill in the `partition_info` part of [CREATE-TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE): 1. AUTO RANGE PARTITION: - ```sql +```sql AUTO PARTITION BY RANGE (FUNC_CALL_EXPR) - ( - ) - ``` - - - - where + () +``` - ```sql +where +```sql FUNC_CALL_EXPR ::= date_trunc ( <partition_column>, '<interval>' ) - ``` - - - - Note: in version 2.1.0, `FUNC_CALL_EXPR` does not need to be wrapped in parentheses. +``` 2. AUTO LIST PARTITION: ```sql -AUTO PARTITION BY LIST(`partition_col`) -( -) + AUTO PARTITION BY LIST(`partition_col1`[, `partition_col2`, ...]) + () ``` - - ### Usage examples 1. AUTO RANGE PARTITION - ```sql +```sql CREATE TABLE `date_table` ( `TIME_STAMP` datev2 NOT NULL COMMENT 'collection date' ) ENGINE=OLAP @@ -126,13 +113,11 @@ AUTO PARTITION BY LIST(`partition_col`) PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); - ``` - - +``` 2.
AUTO LIST PARTITION - ```sql +```sql CREATE TABLE `str_table` ( `str` varchar not null ) ENGINE=OLAP @@ -144,7 +129,9 @@ AUTO PARTITION BY LIST(`partition_col`) PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); - ``` +``` + + AUTO LIST PARTITION supports multiple partition columns, written the same way as in a regular LIST partition: ```AUTO PARTITION BY LIST (`col1`, `col2`, ...)``` ### Constraints @@ -155,9 +142,9 @@ AUTO PARTITION BY LIST(`partition_col`) ### NULL-value partitions -When the session variable `allow_partition_column_nullable` is enabled, both LIST and RANGE partitions support NULL columns as partition columns. When a NULL value is actually inserted into the partition column: +When the session variable `allow_partition_column_nullable` is enabled: -1. For AUTO LIST PARTITION, the corresponding NULL-value partition is created automatically: +1. For AUTO LIST PARTITION, a NULLABLE column can be used as the partition column, and the corresponding NULL-value partition is created as usual: ```sql mysql> create table auto_null_list( @@ -190,8 +177,6 @@ mysql> select * from auto_null_list partition(pX); 1 row in set (0.20 sec) ``` - - 1. For AUTO RANGE PARTITION, **NULLABLE columns are not supported as partition columns**. ```sql @@ -211,8 +196,6 @@ mysql> CREATE TABLE `range_table_nullable` ( ERROR 1105 (HY000): errCode = 2, detailMessage = AUTO RANGE PARTITION doesn't support NULL column ``` - - ## Scenario example Using the example from the Application scenario section, after applying AUTO PARTITION the table DDL can be rewritten as: @@ -234,9 +217,7 @@ PROPERTIES ( ); ``` - - -The new table has no default partitions at this point: +Assuming this table has only these two columns, the new table has no default partitions at this point: ```sql mysql> show partitions from `DAILY_TRADE_VALUE`; @@ -258,16 +239,59 @@ mysql> show partitions from `DAILY_TRADE_VALUE`; | 180018 | p20140101000000 | 2 | 2023-09-18 21:49:29 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2014-01-01]; ..types: [DATEV2]; keys: [2015-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | +-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+ 3 rows in set (0.12 sec) -
``` Partitions created by auto partitioning behave exactly like manually created partitions. ## Using with dynamic partitioning -To keep the partitioning logic clear, Doris forbids Auto Partition and Dynamic Partition from acting on the same table at the same time; this combination is prone to misuse and should be replaced by Auto Partition alone. +Since 3.0.3, Doris supports using auto partitioning and dynamic partitioning together. In this case, both features take effect: +1. Auto partitioning creates partitions on demand during data load; +2. Dynamic partitioning automatically creates, drops, and migrates partitions. + +The two features do not conflict syntactically; simply set the corresponding clauses/properties together. + +### Best practice -Note: in some early versions of Doris 2.1, this combination was not forbidden, but it is not recommended. +For scenarios that require a bounded partition lifecycle, you can **turn off partition creation in Dynamic Partition and leave partition creation entirely to Auto Partition**, using Dynamic Partition's automatic partition reclamation to manage the partition lifecycle: + +```sql +create table auto_dynamic( + k0 datetime(6) NOT NULL +) +auto partition by range (date_trunc(k0, 'year')) +( +) +DISTRIBUTED BY HASH(`k0`) BUCKETS 2 +properties( + "dynamic_partition.enable" = "true", + "dynamic_partition.prefix" = "p", + "dynamic_partition.start" = "-50", + "dynamic_partition.end" = "0", --- Dynamic Partition does not create partitions + "dynamic_partition.time_unit" = "year", + "replication_num" = "1" +); +``` + +This keeps the flexibility of Auto Partition while keeping partition names consistent. + +## Partition management + +With auto partitioning enabled, the `auto_partition_name` function maps a value to the name of the partition it belongs to, and the `partitions` table function returns detailed information for a partition by name. Again using the `DAILY_TRADE_VALUE` table as an example, after inserting data, look up the partition that a given date falls into: + +```sql +mysql> select * from partitions("catalog"="internal","database"="optest","table"="DAILY_TRADE_VALUE") where PartitionName = auto_partition_name('range', 'year', '2008-02-03'); ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime | DataSize | IsInMemory | ReplicaAllocation | IsMutable | SyncWithBaseTables | UnsyncTables |
++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +| 127095 | p20080101000000 | 2 | 2024-11-14 17:29:02 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2008-01-01]; ..types: [DATEV2]; keys: [2009-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | \N | 985.000 B | 0 | tag.location.default: 1 | 1 | 1 | \N | ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +1 row in set (0.18 sec) +``` + +This way, each partition's ID and value range can be filtered out precisely for subsequent partition-specific operations (for example, `insert overwrite partition`). + +For detailed syntax, see the [auto_partition_name function documentation](../../sql-manual/sql-functions/string-functions/auto-partition-name) and the [partitions table function documentation](../../sql-manual/sql-functions/table-valued-functions/partitions). ## Notes @@ -276,5 +300,9 @@ mysql> show partitions from `DAILY_TRADE_VALUE`; - A table using AUTO PARTITION only switches partition creation from manual to automatic; the table and the partitions it creates are otherwise used exactly like non-AUTO PARTITION tables and partitions. - To prevent accidentally creating too many partitions, the `max_auto_partition_num` item in [FE configuration](../../admin-manual/config/fe-config) limits the maximum number of partitions an AUTO PARTITION table can hold. Adjust this value if needed. - When loading data into a table with AUTO PARTITION enabled, the Coordinator's polling interval for sending data differs from that of regular tables. For details, see `olap_table_sink_send_interval_auto_partition_factor` in [BE configuration](../../admin-manual/config/be-config). This variable has no effect once memtable on sink node (`enable_memtable_on_sink_node = true`) is enabled. -
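When inserting data with [insert-overwrite](../../sql-manual/sql-statements/Data-Manipulation-Statements/Manipulation/INSERT-OVERWRITE), if the partitions to overwrite are specified, the AUTO PARTITION table behaves like a regular table during the process and does not create new partitions. +- For the behavior of AUTO PARTITION tables when inserting data with [insert-overwrite](../../sql-manual/sql-statements/Data-Manipulation-Statements/Manipulation/INSERT-OVERWRITE), see the INSERT OVERWRITE documentation. - If the table is undergoing other metadata operations (such as Schema Change or Rebalance) while a load creates partitions, the load may fail. + +## Keywords + +AUTO, PARTITION, AUTO_PARTITION diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-partitioning/basic-concepts.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-partitioning/basic-concepts.md index c1f5bce7a4130..19fe042561274 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-partitioning/basic-concepts.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-partitioning/basic-concepts.md @@ -24,6 +24,8 @@ specific language governing permissions and limitations under the License. --> +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; This document mainly introduces table creation and data partitioning in Doris, as well as problems that may arise during table creation and their solutions. @@ -114,7 +116,172 @@ PROPERTIES The ENGINE type is OLAP, the default ENGINE type. In Doris, only this ENGINE type has its data managed and stored by Doris. Other ENGINE types, such as MySQL, Broker, and ES, are essentially mappings to tables in other external databases or systems so that Doris can read their data. Doris itself does not create, manage, or store any tables or data of non-OLAP ENGINE types. -`IF NOT EXISTS` means the table is created only if it has not been created yet. Note that only the table name is checked; whether the new table's schema matches the existing table's schema is not. So if a table with the same name but a different schema exists, the command still returns success, but that does not mean a new table with the new schema has been created.. +`IF NOT EXISTS` means the table is created only if it has not been created yet. Note that only the table name is checked; whether the new table's schema matches the existing table's schema is not. So if a table with the same name but a different schema exists, the command still returns success, but that does not mean a new table with the new schema has been created. + +### Advanced features and examples + +Doris supports advanced data partitioning methods including dynamic partitioning, auto partitioning, and auto bucketing, which enable more flexible data management. Examples follow: + + +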

+
+[自动分区](./auto-partitioning) 支持根据用户定义的规则在数据导入时自动创建对应分区,更为便捷。将上例用自动 Range 分区改写如下:
+
+```sql
+CREATE TABLE IF NOT EXISTS example_range_tbl
+(
+    `user_id` LARGEINT NOT NULL COMMENT "用户id",
+    `date` DATE NOT NULL COMMENT "数据灌入日期时间",
+    `timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳",
+    `city` VARCHAR(20) COMMENT "用户所在城市",
+    `age` SMALLINT COMMENT "用户年龄",
+    `sex` TINYINT COMMENT "用户性别",
+    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间",
+    `cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费",
+    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间",
+    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间"
+)
+ENGINE=OLAP
+AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
+AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- 使用月作为分区粒度
+()
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
+PROPERTIES
+(
+    "replication_num" = "1"
+);
+```
+
+如上建表,当数据导入时,Doris 将会自动创建对应分区,分区列为 `date`,粒度为月级别。`2018-12-01` 和 `2018-12-31` 将会落入同一个分区,而 `2018-11-12` 将会落入另一个分区。自动分区还支持 List 分区,更多用法请查看自动分区文档。
+

+
+ + +

+
+[动态分区](./dynamic-partitioning)是一种根据当前时间自动创建和回收分区的管理方式。将上例用动态分区改写如下:
+
+```sql
+CREATE TABLE IF NOT EXISTS example_range_tbl
+(
+    `user_id` LARGEINT NOT NULL COMMENT "用户id",
+    `date` DATE NOT NULL COMMENT "数据灌入日期时间",
+    `timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳",
+    `city` VARCHAR(20) COMMENT "用户所在城市",
+    `age` SMALLINT COMMENT "用户年龄",
+    `sex` TINYINT COMMENT "用户性别",
+    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间",
+    `cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费",
+    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间",
+    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间"
+)
+ENGINE=OLAP
+AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
+PARTITION BY RANGE(`date`)
+()
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
+PROPERTIES
+(
+    "replication_num" = "1",
+    "dynamic_partition.enable" = "true",
+    "dynamic_partition.time_unit" = "WEEK", --- 分区粒度为周
+    "dynamic_partition.start" = "-2", --- 保留过去两周的分区
+    "dynamic_partition.end" = "2", --- 提前创建未来两周的分区
+    "dynamic_partition.prefix" = "p",
+    "dynamic_partition.buckets" = "8"
+);
+```
+
+动态分区还支持分层存储、自定义副本数等功能,详见动态分区文档。
+

+
+ + +

+
+自动分区与动态分区各有其优点,将二者结合可以实现分区的灵活按需创建和自动回收:
+
+```sql
+CREATE TABLE IF NOT EXISTS example_range_tbl
+(
+    `user_id` LARGEINT NOT NULL COMMENT "用户id",
+    `date` DATE NOT NULL COMMENT "数据灌入日期时间",
+    `timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳",
+    `city` VARCHAR(20) COMMENT "用户所在城市",
+    `age` SMALLINT COMMENT "用户年龄",
+    `sex` TINYINT COMMENT "用户性别",
+    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间",
+    `cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费",
+    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间",
+    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间"
+)
+ENGINE=OLAP
+AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
+AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- 使用月作为分区粒度
+()
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
+PROPERTIES
+(
+    "replication_num" = "1",
+    "dynamic_partition.enable" = "true",
+    "dynamic_partition.time_unit" = "month", --- 二者粒度必须相同
+    "dynamic_partition.start" = "-2", --- 动态分区自动清理超过两个月的历史分区
+    "dynamic_partition.end" = "0", --- 动态分区不创建未来分区,完全交给自动分区
+    "dynamic_partition.prefix" = "p",
+    "dynamic_partition.buckets" = "8"
+);
+```
+
+该功能的使用建议详见[自动分区与动态分区联用](./auto-partitioning#与动态分区联用)。
+

+
+ + +

+
+当用户不确定合理的分桶数时,可以使用[自动分桶](./auto-bucket)由 Doris 完成估计,用户仅需提供单个分区的估计数据量:
+
+```sql
+CREATE TABLE IF NOT EXISTS example_range_tbl
+(
+    `user_id` LARGEINT NOT NULL COMMENT "用户id",
+    `date` DATE NOT NULL COMMENT "数据灌入日期时间",
+    `timestamp` DATETIME NOT NULL COMMENT "数据灌入的时间戳",
+    `city` VARCHAR(20) COMMENT "用户所在城市",
+    `age` SMALLINT COMMENT "用户年龄",
+    `sex` TINYINT COMMENT "用户性别",
+    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "用户最后一次访问时间",
+    `cost` BIGINT SUM DEFAULT "0" COMMENT "用户总消费",
+    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "用户最大停留时间",
+    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "用户最小停留时间"
+)
+ENGINE=OLAP
+AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
+PARTITION BY RANGE(`date`)
+(
+    PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
+    PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
+    PARTITION `p201703` VALUES LESS THAN ("2017-04-01"),
+    PARTITION `p2018` VALUES [("2018-01-01"), ("2019-01-01"))
+)
+DISTRIBUTED BY HASH(`user_id`) BUCKETS AUTO
+PROPERTIES
+(
+    "replication_num" = "1",
+    "estimate_partition_size" = "2G" --- 用户估计的单个分区数据量,不提供则默认为 10G
+);
+```
+
+需要注意的是,该方式不适用于表数据量特别大的情况。
+

+
+ +
## 查看分区信息 @@ -183,4 +350,36 @@ ENGINE 的类型是 OLAP,即默认的 ENGINE 类型。在 Doris 中,只有 ALTER TABLE example_range_tbl ADD PARTITION p201704 VALUES LESS THAN("2020-05-01") DISTRIBUTED BY HASH(`user_id`) BUCKETS 5; ``` -其它更多分区修改操作,参见 SQL 手册 [ALTER-TABLE-PARTITION](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION)。 \ No newline at end of file +其它更多分区修改操作,参见 SQL 手册 [ALTER-TABLE-PARTITION](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION)。 + +## 分区检索 + +`partitions` 表函数和 `information_schema.partitions` 系统表(自 3.0.2 起支持)记录了集群的分区信息。在自动管理分区时,可以通过对应表提取分区信息使用: + +```sql +--- 在 Auto Partition 表中找对应值所属的分区 +mysql> select * from partitions("catalog"="internal", "database"="optest", "table"="DAILY_TRADE_VALUE") where PartitionName = auto_partition_name('range', 'year', '2008-02-03'); ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime | DataSize | IsInMemory | ReplicaAllocation | IsMutable | SyncWithBaseTables | UnsyncTables | ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +| 127095 | 
p20080101000000 | 2 | 2024-11-14 17:29:02 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2008-01-01]; ..types: [DATEV2]; keys: [2009-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | \N | 985.000 B | 0 | tag.location.default: 1 | 1 | 1 | \N | ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +1 row in set (0.30 sec) + +mysql> select * from information_schema.partitions where TABLE_SCHEMA='optest' and TABLE_NAME='list_table1' and PARTITION_NAME=auto_partition_name('list', null); ++---------------+--------------+-------------+----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+-----------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | PARTITION_NAME | SUBPARTITION_NAME | PARTITION_ORDINAL_POSITION | SUBPARTITION_ORDINAL_POSITION | PARTITION_METHOD | SUBPARTITION_METHOD | PARTITION_EXPRESSION | SUBPARTITION_EXPRESSION | PARTITION_DESCRIPTION | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | CREATE_TIME | UPDATE_TIME | CHECK_TIME | CHECKSUM | PARTITION_COMMENT | NODEGROUP | TABLESPACE_NAME | 
++---------------+--------------+-------------+----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+-----------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| internal | optest | list_table1 | pX | NULL | 0 | 0 | LIST | NULL | str | NULL | (NULL) | 1 | 1266 | 1266 | 0 | 0 | 0 | 0 | 2024-11-14 19:58:45 | 0000-00-00 00:00:00 | 0 | | | | ++---------------+--------------+-------------+----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+-----------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +1 row in set (0.24 sec) + +--- 找对应起始点的分区 +mysql> select * from information_schema.partitions where TABLE_NAME='DAILY_TRADE_VALUE' and PARTITION_DESCRIPTION like "[('2012-01-01'),%"; ++---------------+--------------+-------------------+-----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+----------------------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | PARTITION_NAME | SUBPARTITION_NAME | PARTITION_ORDINAL_POSITION | SUBPARTITION_ORDINAL_POSITION | PARTITION_METHOD | SUBPARTITION_METHOD | PARTITION_EXPRESSION | 
SUBPARTITION_EXPRESSION | PARTITION_DESCRIPTION | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | CREATE_TIME | UPDATE_TIME | CHECK_TIME | CHECKSUM | PARTITION_COMMENT | NODEGROUP | TABLESPACE_NAME | ++---------------+--------------+-------------------+-----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+----------------------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| internal | optest | DAILY_TRADE_VALUE | p20120101000000 | NULL | 0 | 0 | RANGE | NULL | TRADE_DATE | NULL | [('2012-01-01'), ('2013-01-01')) | 1 | 985 | 985 | 0 | 0 | 0 | 0 | 2024-11-14 17:29:02 | 0000-00-00 00:00:00 | 0 | | | | ++---------------+--------------+-------------------+-----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+----------------------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +1 row in set (0.65 sec) +``` diff --git a/sidebars.json b/sidebars.json index e4e37838164bc..87a534cf8d330 100644 --- a/sidebars.json +++ b/sidebars.json @@ -914,7 +914,7 @@ "sql-manual/sql-functions/string-functions/to-base64", "sql-manual/sql-functions/string-functions/from-base64", "sql-manual/sql-functions/string-functions/ascii", - "sql-manual/sql-functions/string-functions/auto_partition_name", + "sql-manual/sql-functions/string-functions/auto-partition-name", "sql-manual/sql-functions/string-functions/crc32", 
"sql-manual/sql-functions/string-functions/length", "sql-manual/sql-functions/string-functions/bit-length", diff --git a/versioned_docs/version-2.0/sql-manual/sql-functions/table-valued-functions/partitions.md b/versioned_docs/version-2.0/sql-manual/sql-functions/table-valued-functions/partitions.md deleted file mode 100644 index 7bda80d77e298..0000000000000 --- a/versioned_docs/version-2.0/sql-manual/sql-functions/table-valued-functions/partitions.md +++ /dev/null @@ -1,130 +0,0 @@ ---- -{ - "title": "PARTITIONS", - "language": "en" -} ---- - - - -## `partitions` - -### Name - -partitions - -### Description - -The table function generates a temporary partition TABLE, which allows you to view the PARTITION list of a certain TABLE. - -This function is used in the from clause. - -This function is supported since 2.1.5 - -#### Syntax - -`partitions("catalog"="","database"="","table"="")` - -partitions() Table structure: -```sql -mysql> desc function partitions("catalog"="internal","database"="zd","table"="user"); -+--------------------------+---------+------+-------+---------+-------+ -| Field | Type | Null | Key | Default | Extra | -+--------------------------+---------+------+-------+---------+-------+ -| PartitionId | BIGINT | No | false | NULL | NONE | -| PartitionName | TEXT | No | false | NULL | NONE | -| VisibleVersion | BIGINT | No | false | NULL | NONE | -| VisibleVersionTime | TEXT | No | false | NULL | NONE | -| State | TEXT | No | false | NULL | NONE | -| PartitionKey | TEXT | No | false | NULL | NONE | -| Range | TEXT | No | false | NULL | NONE | -| DistributionKey | TEXT | No | false | NULL | NONE | -| Buckets | INT | No | false | NULL | NONE | -| ReplicationNum | INT | No | false | NULL | NONE | -| StorageMedium | TEXT | No | false | NULL | NONE | -| CooldownTime | TEXT | No | false | NULL | NONE | -| RemoteStoragePolicy | TEXT | No | false | NULL | NONE | -| LastConsistencyCheckTime | TEXT | No | false | NULL | NONE | -| DataSize | TEXT | No | false | 
NULL | NONE | -| IsInMemory | BOOLEAN | No | false | NULL | NONE | -| ReplicaAllocation | TEXT | No | false | NULL | NONE | -| IsMutable | BOOLEAN | No | false | NULL | NONE | -| SyncWithBaseTables | BOOLEAN | No | false | NULL | NONE | -| UnsyncTables | TEXT | No | false | NULL | NONE | -+--------------------------+---------+------+-------+---------+-------+ -20 rows in set (0.02 sec) -``` - -* PartitionId:partition id -* PartitionName:partition name -* VisibleVersion:visible version -* VisibleVersionTime:visible version time -* State:state -* PartitionKey:partition key -* Range:range -* DistributionKey:distribution key -* Buckets:bucket num -* ReplicationNum:replication num -* StorageMedium:storage medium -* CooldownTime:cooldown time -* RemoteStoragePolicy:remote storage policy -* LastConsistencyCheckTime:last consistency check time -* DataSize:data size -* IsInMemory:is in memory -* ReplicaAllocation:replica allocation -* IsMutable:is mutable -* SyncWithBaseTables:Is it synchronized with the base table data (for partitioning asynchronous materialized views) -* UnsyncTables:Which base table data is not synchronized with (for partitions of asynchronous materialized views) - -```sql -mysql> desc function partitions("catalog"="hive","database"="zdtest","table"="com2"); -+-----------+------+------+-------+---------+-------+ -| Field | Type | Null | Key | Default | Extra | -+-----------+------+------+-------+---------+-------+ -| Partition | TEXT | No | false | NULL | NONE | -+-----------+------+------+-------+---------+-------+ -1 row in set (0.11 sec) -``` - -* Partition:partition name - -### Example - -1. View the partition list of table1 under db1 in the internal catalog - -```sql -mysql> select * from partitions("catalog"="internal","database"="db1","table"="table1"); -``` - -2. 
View the partition information with partition name partition1 under table1 - -```sql -mysql> select * from partitions("catalog"="internal","database"="db1","table"="table1") where PartitionName = "partition1"; -``` - -3. View the partition ID with the partition name 'partition1' under Table 1 - -```sql -mysql> select PartitionId from partitions("catalog"="internal","database"="db1","table"="table1") where PartitionName = "partition1"; -``` - -### Keywords - - partitions diff --git a/versioned_docs/version-2.1/sql-manual/sql-functions/string-functions/auto-partition-name.md b/versioned_docs/version-2.1/sql-manual/sql-functions/string-functions/auto-partition-name.md index c9b1dd0421dc5..e30d43fce4a70 100644 --- a/versioned_docs/version-2.1/sql-manual/sql-functions/string-functions/auto-partition-name.md +++ b/versioned_docs/version-2.1/sql-manual/sql-functions/string-functions/auto-partition-name.md @@ -25,7 +25,7 @@ under the License. --> ## auto_partition_name -### description +### Description #### Syntax `VARCHAR AUTO_PARTITION_NAME('RANGE', 'VARCHAR unit', DATETIME datetime)` @@ -40,9 +40,12 @@ The datetime parameter is a legal date expression. The unit parameter is the time interval you want, the available values are: [`second`, `minute`, `hour`, `day`, `month`, `year`]. If unit does not match one of these options, a syntax error will be returned. 
-### example -``` +**Supported since Doris 2.1.6** + +### Example + +```sql mysql> select auto_partition_name('range', 'years', '123'); ERROR 1105 (HY000): errCode = 2, detailMessage = range auto_partition_name must accept year|month|day|hour|minute|second for 2nd argument @@ -108,7 +111,6 @@ mysql> select auto_partition_name('list', "你好"); +------------------------------------+ | p4f60597d2 | +------------------------------------+ - ``` ### keywords diff --git a/versioned_docs/version-2.1/sql-manual/sql-functions/table-valued-functions/partitions.md b/versioned_docs/version-2.1/sql-manual/sql-functions/table-valued-functions/partitions.md index 7bda80d77e298..21ab97e908d07 100644 --- a/versioned_docs/version-2.1/sql-manual/sql-functions/table-valued-functions/partitions.md +++ b/versioned_docs/version-2.1/sql-manual/sql-functions/table-valued-functions/partitions.md @@ -36,7 +36,7 @@ The table function generates a temporary partition TABLE, which allows you to vi This function is used in the from clause. -This function is supported since 2.1.5 +**This function is supported since 2.1.5** #### Syntax diff --git a/versioned_docs/version-2.1/table-design/data-partitioning/auto-partitioning.md b/versioned_docs/version-2.1/table-design/data-partitioning/auto-partitioning.md index 7146aac17cbc9..4c358a7157cf8 100644 --- a/versioned_docs/version-2.1/table-design/data-partitioning/auto-partitioning.md +++ b/versioned_docs/version-2.1/table-design/data-partitioning/auto-partitioning.md @@ -1,11 +1,11 @@ --- { - "title": "Auto partitioning", + "title": "Auto Partition", "language": "en" } --- - +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; This document mainly introduces table creation and data partitioning in Doris, as well as potential problems and solutions encountered during table creation operations. 
@@ -81,7 +83,7 @@ The following code sample introduces how to create tables in Apache Doris by RAN -- Range Partition CREATE TABLE IF NOT EXISTS example_range_tbl ( - `user_id` LARGEINT NOT NULL COMMENT "User ID", + `user_id` LARGEINT NOT NULL COMMENT "User ID", `date` DATE NOT NULL COMMENT "Date when the data are imported", `timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are imported", `city` VARCHAR(20) COMMENT "User location city", @@ -115,9 +117,178 @@ The default type of `ENGINE` is `OLAP`. Only OLAP is responsible for data manage `IF NOT EXISTS` indicates that if the table has not been created before, it will be created. Note that this only checks if the table name exists and does not check if the schema of the new table is the same as the schema of an existing table. Therefore, if there is a table with the same name but a different schema, this command will also return successfully, but it does not mean that a new table with a new schema has been created. +### Advanced Features and Examples + +Doris supports advanced data partitioning methods, including Dynamic Partition, Auto Partition, and Auto Bucket, which enable more flexible data management. The following are examples of implementations: + + + +

+
+[Auto Partition](./auto-partitioning) automatically creates the corresponding partitions during data import according to user-defined rules, which is more convenient. The example above can be rewritten with Auto Range Partition as follows:
+
+```sql
+CREATE TABLE IF NOT EXISTS example_range_tbl
+(
+    `user_id` LARGEINT NOT NULL COMMENT "User ID",
+    `date` DATE NOT NULL COMMENT "Date when the data are imported",
+    `timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are imported",
+    `city` VARCHAR(20) COMMENT "User location city",
+    `age` SMALLINT COMMENT "User age",
+    `sex` TINYINT COMMENT "User gender",
+    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User last visit time",
+    `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user consumption",
+    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time",
+    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
+)
+ENGINE=OLAP
+AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
+AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- Use month as the partition granularity
+()
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
+PROPERTIES
+(
+    "replication_num" = "1"
+);
+```
+
+With this table definition, Doris automatically creates partitions on the `date` column at month granularity when data is imported. `2018-12-01` and `2018-12-31` fall into the same partition, while `2018-11-12` falls into a different one. Auto Partition also supports List partitioning; see the Auto Partition documentation for more usage.
+
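You can preview which partition a value will be routed to before loading any data. Evaluate the partition expression and the `auto_partition_name` function (supported since Doris 2.1.6) directly. The following is a minimal sketch that assumes a Doris 2.1.6+ session and needs no table. The dates are the ones from the paragraph above:

```sql
-- The partition expression of the table above. Both December dates truncate
-- to the same month start; the November date truncates to a different one.
SELECT
    date_trunc(CAST('2018-12-01' AS DATE), 'month') AS dec_01,
    date_trunc(CAST('2018-12-31' AS DATE), 'month') AS dec_31,
    date_trunc(CAST('2018-11-12' AS DATE), 'month') AS nov_12;

-- The partition names Auto Partition generates for the same values.
-- dec_01 and dec_31 are expected to be equal; nov_12 is expected to differ.
SELECT
    auto_partition_name('range', 'month', '2018-12-01') AS dec_01,
    auto_partition_name('range', 'month', '2018-12-31') AS dec_31,
    auto_partition_name('range', 'month', '2018-11-12') AS nov_12;
```

The name depends only on the truncated value. The same call can therefore be used as a `WHERE PartitionName = ...` filter on the `partitions` table function to locate an existing partition.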

+
+ + +

+
+[Dynamic Partition](./dynamic-partitioning) automatically creates and reclaims partitions based on the current time. The example above can be rewritten with Dynamic Partition as follows:
+
+```sql
+CREATE TABLE IF NOT EXISTS example_range_tbl
+(
+    `user_id` LARGEINT NOT NULL COMMENT "User ID",
+    `date` DATE NOT NULL COMMENT "Date when the data are imported",
+    `timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are imported",
+    `city` VARCHAR(20) COMMENT "User location city",
+    `age` SMALLINT COMMENT "User age",
+    `sex` TINYINT COMMENT "User gender",
+    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User last visit time",
+    `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user consumption",
+    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time",
+    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
+)
+ENGINE=OLAP
+AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
+PARTITION BY RANGE(`date`)
+()
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
+PROPERTIES
+(
+    "replication_num" = "1",
+    "dynamic_partition.enable" = "true",
+    "dynamic_partition.time_unit" = "WEEK", --- Partition granularity is one week
+    "dynamic_partition.start" = "-2", --- Keep partitions for the past two weeks
+    "dynamic_partition.end" = "2", --- Create partitions for the next two weeks in advance
+    "dynamic_partition.prefix" = "p",
+    "dynamic_partition.buckets" = "8"
+);
+```
+
+Dynamic Partition also supports tiered storage, custom replica numbers, and other features; see the Dynamic Partition documentation for details.
+
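Dynamic partitions are maintained by a periodic scheduler in the FE rather than by data loads, so the set of partitions changes over time. The following is a sketch for inspecting the current state. It assumes you run it in the database that contains `example_range_tbl`:

```sql
-- Dynamic partition status of the tables in the current database,
-- including the configured time unit, start/end window and last scheduling result.
SHOW DYNAMIC PARTITION TABLES;

-- Partitions that currently exist. With time_unit WEEK and end = 2, expect the
-- current week plus the next two weeks; with start = -2, partitions older than
-- two weeks are dropped by later scheduling rounds.
SHOW PARTITIONS FROM example_range_tbl;
```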

+
+ + +

+
+:::tip
+Supported since Doris 2.1.7
+:::
+
+Auto Partition and Dynamic Partition each have their own advantages. Combining the two enables on-demand partition creation together with automatic reclamation of old partitions:
+
+```sql
+CREATE TABLE IF NOT EXISTS example_range_tbl
+(
+    `user_id` LARGEINT NOT NULL COMMENT "User ID",
+    `date` DATE NOT NULL COMMENT "Date when the data are imported",
+    `timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are imported",
+    `city` VARCHAR(20) COMMENT "User location city",
+    `age` SMALLINT COMMENT "User age",
+    `sex` TINYINT COMMENT "User gender",
+    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User last visit time",
+    `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user consumption",
+    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time",
+    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
+)
+ENGINE=OLAP
+AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
+AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- Use month as the partition granularity
+()
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
+PROPERTIES
+(
+    "replication_num" = "1",
+    "dynamic_partition.enable" = "true",
+    "dynamic_partition.time_unit" = "month", --- Must match the Auto Partition granularity
+    "dynamic_partition.start" = "-2", --- Dynamic Partition automatically drops partitions older than two months
+    "dynamic_partition.end" = "0", --- Dynamic Partition creates no future partitions; creation is left entirely to Auto Partition
+    "dynamic_partition.prefix" = "p",
+    "dynamic_partition.buckets" = "8"
+);
+```
+
+For detailed recommendations on this usage, see [Using Auto Partition with Dynamic Partition](./auto-partitioning#conjunct-with-dynamic-partition).
+
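You can watch the two mechanisms cooperate on a live table by listing its partitions with the `partitions` table function (supported since 2.1.5). In this sketch, `example_db` is a hypothetical database name; replace it with the database that holds `example_range_tbl`:

```sql
-- example_db is a placeholder (hypothetical) database name.
-- Partitions created by Auto Partition are named after the month start of the
-- loaded values. With a monthly time unit and "dynamic_partition.start" = "-2",
-- partitions more than two months older than the current month are expected to
-- be dropped by the dynamic partition scheduler.
SELECT PartitionName, `Range`, VisibleVersionTime
FROM partitions("catalog"="internal", "database"="example_db", "table"="example_range_tbl")
ORDER BY PartitionName;
```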

+
+ + +

+
+When you are not sure of a reasonable number of buckets, you can use [Auto Bucket](./auto-bucket) to let Doris estimate it. You only need to provide an estimate of the data size per partition:
+
+```sql
+CREATE TABLE IF NOT EXISTS example_range_tbl
+(
+    `user_id` LARGEINT NOT NULL COMMENT "User ID",
+    `date` DATE NOT NULL COMMENT "Date when the data are imported",
+    `timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are imported",
+    `city` VARCHAR(20) COMMENT "User location city",
+    `age` SMALLINT COMMENT "User age",
+    `sex` TINYINT COMMENT "User gender",
+    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User last visit time",
+    `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user consumption",
+    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time",
+    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
+)
+ENGINE=OLAP
+AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
+PARTITION BY RANGE(`date`)
+(
+    PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
+    PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
+    PARTITION `p201703` VALUES LESS THAN ("2017-04-01"),
+    PARTITION `p2018` VALUES [("2018-01-01"), ("2019-01-01"))
+)
+DISTRIBUTED BY HASH(`user_id`) BUCKETS AUTO
+PROPERTIES
+(
+    "replication_num" = "1",
+    "estimate_partition_size" = "2G" --- Estimated data size of a single partition; defaults to 10G if not provided
+);
+```
+
+Note that this approach is not suitable for tables with a particularly large amount of data.
+
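With `BUCKETS AUTO`, the bucket count Doris actually chose for each partition appears in the `Buckets` column of the partition metadata. The following sketch uses the same hypothetical `example_db` database name as a placeholder:

```sql
-- example_db is a placeholder (hypothetical) database name.
-- Compare the bucket count chosen for each partition with its current data size.
SELECT PartitionName, Buckets, DataSize
FROM partitions("catalog"="internal", "database"="example_db", "table"="example_range_tbl")
ORDER BY PartitionName;
```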

+
+ +
+ ## View partitions -View the partiton information of a table by running the `show create table` command. +View the partition information of a table by running the `show create table` command. ```sql > show create table example_range_tbl @@ -154,7 +325,7 @@ View the partiton information of a table by running the `show create table` com +-------------------+---------------------------------------------------------------------------------------------------------+ ``` -Or run the `show partitions from your_table` command. +Or run the `show partitions from your_table` command. ```sql > show partitions from example_range_tbl @@ -179,7 +350,39 @@ Or run the `show partitions from your_table` command. You can add a new partition by running the `alter table add partition ` command. ```sql -ALTER TABLE example_range_tbl ADD PARTITION p201704 VALUES LESS THAN("2020-05-01") DISTRIBUTED BY HASH(`user_id`) BUCKETS 5; +ALTER TABLE example_range_tbl ADD PARTITION p201704 VALUES LESS THAN("2020-05-01") DISTRIBUTED BY HASH(`user_id`) BUCKETS 5; ``` For more information about how to alter partitions, refer to [ALTER-TABLE-PARTITION](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION.md). + +## Partition Retrieval + +The `partitions` table function (supported since 2.1.5) and the `information_schema.partitions` system table (supported since 2.1.7) record the partition information of the cluster. When partitions are managed automatically, you can query them to extract the partition information you need: + +```sql +--- Find the partition with the corresponding value in the Auto Partition table.
+mysql> select * from partitions("catalog"="internal", "database"="optest", "table"="DAILY_TRADE_VALUE") where PartitionName = auto_partition_name('range', 'year', '2008-02-03'); ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime | DataSize | IsInMemory | ReplicaAllocation | IsMutable | SyncWithBaseTables | UnsyncTables | ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +| 127095 | p20080101000000 | 2 | 2024-11-14 17:29:02 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2008-01-01]; ..types: [DATEV2]; keys: [2009-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | \N | 985.000 B | 0 | tag.location.default: 1 | 1 | 1 | \N | ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ 
+1 row in set (0.30 sec) + +mysql> select * from information_schema.partitions where TABLE_SCHEMA='optest' and TABLE_NAME='list_table1' and PARTITION_NAME=auto_partition_name('list', null); ++---------------+--------------+-------------+----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+-----------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | PARTITION_NAME | SUBPARTITION_NAME | PARTITION_ORDINAL_POSITION | SUBPARTITION_ORDINAL_POSITION | PARTITION_METHOD | SUBPARTITION_METHOD | PARTITION_EXPRESSION | SUBPARTITION_EXPRESSION | PARTITION_DESCRIPTION | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | CREATE_TIME | UPDATE_TIME | CHECK_TIME | CHECKSUM | PARTITION_COMMENT | NODEGROUP | TABLESPACE_NAME | ++---------------+--------------+-------------+----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+-----------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| internal | optest | list_table1 | pX | NULL | 0 | 0 | LIST | NULL | str | NULL | (NULL) | 1 | 1266 | 1266 | 0 | 0 | 0 | 0 | 2024-11-14 19:58:45 | 0000-00-00 00:00:00 | 0 | | | | 
++---------------+--------------+-------------+----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+-----------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +1 row in set (0.24 sec) + +--- Find the partition that corresponds to the starting point +mysql> select * from information_schema.partitions where TABLE_NAME='DAILY_TRADE_VALUE' and PARTITION_DESCRIPTION like "[('2012-01-01'),%"; ++---------------+--------------+-------------------+-----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+----------------------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | PARTITION_NAME | SUBPARTITION_NAME | PARTITION_ORDINAL_POSITION | SUBPARTITION_ORDINAL_POSITION | PARTITION_METHOD | SUBPARTITION_METHOD | PARTITION_EXPRESSION | SUBPARTITION_EXPRESSION | PARTITION_DESCRIPTION | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | CREATE_TIME | UPDATE_TIME | CHECK_TIME | CHECKSUM | PARTITION_COMMENT | NODEGROUP | TABLESPACE_NAME | 
++---------------+--------------+-------------------+-----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+----------------------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| internal | optest | DAILY_TRADE_VALUE | p20120101000000 | NULL | 0 | 0 | RANGE | NULL | TRADE_DATE | NULL | [('2012-01-01'), ('2013-01-01')) | 1 | 985 | 985 | 0 | 0 | 0 | 0 | 2024-11-14 17:29:02 | 0000-00-00 00:00:00 | 0 | | | | ++---------------+--------------+-------------------+-----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+----------------------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +1 row in set (0.65 sec) +``` diff --git a/versioned_docs/version-3.0/sql-manual/sql-functions/string-functions/auto-partition-name.md b/versioned_docs/version-3.0/sql-manual/sql-functions/string-functions/auto-partition-name.md index c9b1dd0421dc5..75f2a9cbb423b 100644 --- a/versioned_docs/version-3.0/sql-manual/sql-functions/string-functions/auto-partition-name.md +++ b/versioned_docs/version-3.0/sql-manual/sql-functions/string-functions/auto-partition-name.md @@ -25,7 +25,7 @@ under the License. --> ## auto_partition_name -### description +### Description #### Syntax `VARCHAR AUTO_PARTITION_NAME('RANGE', 'VARCHAR unit', DATETIME datetime)` @@ -40,9 +40,12 @@ The datetime parameter is a legal date expression. 
The unit parameter is the time interval you want, the available values are: [`second`, `minute`, `hour`, `day`, `month`, `year`]. If unit does not match one of these options, a syntax error will be returned. -### example -``` +**Supported since Doris 3.0.2** + +### Example + +```sql mysql> select auto_partition_name('range', 'years', '123'); ERROR 1105 (HY000): errCode = 2, detailMessage = range auto_partition_name must accept year|month|day|hour|minute|second for 2nd argument @@ -108,7 +111,6 @@ mysql> select auto_partition_name('list', "你好"); +------------------------------------+ | p4f60597d2 | +------------------------------------+ - ``` ### keywords diff --git a/versioned_docs/version-3.0/sql-manual/sql-functions/table-valued-functions/partitions.md b/versioned_docs/version-3.0/sql-manual/sql-functions/table-valued-functions/partitions.md index 7bda80d77e298..b5ffa054b6c0d 100644 --- a/versioned_docs/version-3.0/sql-manual/sql-functions/table-valued-functions/partitions.md +++ b/versioned_docs/version-3.0/sql-manual/sql-functions/table-valued-functions/partitions.md @@ -36,8 +36,6 @@ The table function generates a temporary partition TABLE, which allows you to vi This function is used in the from clause. 
-This function is supported since 2.1.5 - #### Syntax `partitions("catalog"="","database"="","table"="")` diff --git a/versioned_docs/version-3.0/table-design/data-partitioning/auto-partitioning.md b/versioned_docs/version-3.0/table-design/data-partitioning/auto-partitioning.md index 7146aac17cbc9..890e5436da9de 100644 --- a/versioned_docs/version-3.0/table-design/data-partitioning/auto-partitioning.md +++ b/versioned_docs/version-3.0/table-design/data-partitioning/auto-partitioning.md @@ -1,11 +1,11 @@ --- { - "title": "Auto partitioning", + "title": "Auto Partition", "language": "en" } --- - +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; This document mainly introduces table creation and data partitioning in Doris, as well as potential problems and solutions encountered during table creation operations. @@ -81,7 +83,7 @@ The following code sample introduces how to create tables in Apache Doris by RAN -- Range Partition CREATE TABLE IF NOT EXISTS example_range_tbl ( - `user_id` LARGEINT NOT NULL COMMENT "User ID", + `user_id` LARGEINT NOT NULL COMMENT "User ID", `date` DATE NOT NULL COMMENT "Date when the data are imported", `timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are imported", `city` VARCHAR(20) COMMENT "User location city", @@ -115,9 +117,174 @@ The default type of `ENGINE` is `OLAP`. Only OLAP is responsible for data manage `IF NOT EXISTS` indicates that if the table has not been created before, it will be created. Note that this only checks if the table name exists and does not check if the schema of the new table is the same as the schema of an existing table. Therefore, if there is a table with the same name but a different schema, this command will also return successfully, but it does not mean that a new table with a new schema has been created. 
+### Advanced Features and Examples + +Doris supports advanced partitioning and bucketing methods, including Dynamic Partition, Auto Partition, and Auto Bucket, which make data management more flexible. The following examples show how to use them: + + 

+ +[Auto Partition](./auto-partitioning) automatically creates the required partitions during data import according to user-defined rules, so they do not need to be created in advance. The example above can be rewritten with Auto Range Partition as follows: + +```sql +CREATE TABLE IF NOT EXISTS example_range_tbl +( + `user_id` LARGEINT NOT NULL COMMENT "User ID", + `date` DATE NOT NULL COMMENT "Date when the data are imported", + `timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are imported", + `city` VARCHAR(20) COMMENT "User location city", + `age` SMALLINT COMMENT "User age", + `sex` TINYINT COMMENT "User gender", + `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User last visit time", + `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user consumption", + `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time", + `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time" +) +ENGINE=OLAP +AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`) +AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- Using months as partition granularity +() +DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 +PROPERTIES +( + "replication_num" = "1" +); +``` + +With this definition, Doris automatically creates partitions on the `date` column at month granularity as data is imported: `2018-12-01` and `2018-12-31` fall into the same partition, while `2018-11-12` falls into the partition for the preceding month. Auto Partition also supports List partitioning; see the Auto Partition documentation for more usage. + 

+
+ + +

+ +[Dynamic Partition](./dynamic-partitioning) automatically creates and reclaims partitions based on the current system time. The example above can be rewritten with Dynamic Partition as follows: + +```sql +CREATE TABLE IF NOT EXISTS example_range_tbl +( + `user_id` LARGEINT NOT NULL COMMENT "User ID", + `date` DATE NOT NULL COMMENT "Date when the data are imported", + `timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are imported", + `city` VARCHAR(20) COMMENT "User location city", + `age` SMALLINT COMMENT "User age", + `sex` TINYINT COMMENT "User gender", + `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User last visit time", + `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user consumption", + `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time", + `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time" +) +ENGINE=OLAP +AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`) +PARTITION BY RANGE(`date`) +() +DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 +PROPERTIES +( + "replication_num" = "1", + "dynamic_partition.enable" = "true", + "dynamic_partition.time_unit" = "WEEK", --- Partition granularity is week + "dynamic_partition.start" = "-2", --- Keep partitions for the past two weeks + "dynamic_partition.end" = "2", --- Create partitions for the next two weeks in advance + "dynamic_partition.prefix" = "p", + "dynamic_partition.buckets" = "8" +); +``` + +Dynamic Partition also supports tiered storage, custom replica numbers, and other features; see the Dynamic Partition documentation for details. + 

+
+ + +

+ +Auto Partition and Dynamic Partition each have their own advantages. Combining the two gives on-demand partition creation together with automatic reclamation of old partitions: + +```sql +CREATE TABLE IF NOT EXISTS example_range_tbl +( + `user_id` LARGEINT NOT NULL COMMENT "User ID", + `date` DATE NOT NULL COMMENT "Date when the data are imported", + `timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are imported", + `city` VARCHAR(20) COMMENT "User location city", + `age` SMALLINT COMMENT "User age", + `sex` TINYINT COMMENT "User gender", + `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User last visit time", + `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user consumption", + `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time", + `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time" +) +ENGINE=OLAP +AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`) +AUTO PARTITION BY RANGE(date_trunc(`date`, 'month')) --- Using months as partition granularity +() +DISTRIBUTED BY HASH(`user_id`) BUCKETS 16 +PROPERTIES +( + "replication_num" = "1", + "dynamic_partition.enable" = "true", + "dynamic_partition.time_unit" = "month", --- Must match the Auto Partition granularity + "dynamic_partition.start" = "-2", --- Dynamic Partition automatically drops partitions older than two months + "dynamic_partition.end" = "0", --- Dynamic Partition does not create future partitions; they are left entirely to Auto Partition + "dynamic_partition.prefix" = "p", + "dynamic_partition.buckets" = "8" +); +``` + +For detailed recommendations on combining the two, see [Auto Partition Conjunct with Dynamic Partition](./auto-partitioning#conjunct-with-dynamic-partition). + 

+
+ + +

+ +When you are not sure how many buckets are appropriate, use [Auto Bucket](./auto-bucket) to let Doris estimate the number. You only need to provide the estimated data size of a partition: + +```sql +CREATE TABLE IF NOT EXISTS example_range_tbl +( + `user_id` LARGEINT NOT NULL COMMENT "User ID", + `date` DATE NOT NULL COMMENT "Date when the data are imported", + `timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are imported", + `city` VARCHAR(20) COMMENT "User location city", + `age` SMALLINT COMMENT "User age", + `sex` TINYINT COMMENT "User gender", + `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User last visit time", + `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user consumption", + `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time", + `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time" +) +ENGINE=OLAP +AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`) +PARTITION BY RANGE(`date`) +( + PARTITION `p201701` VALUES LESS THAN ("2017-02-01"), + PARTITION `p201702` VALUES LESS THAN ("2017-03-01"), + PARTITION `p201703` VALUES LESS THAN ("2017-04-01"), + PARTITION `p2018` VALUES [("2018-01-01"), ("2019-01-01")) +) +DISTRIBUTED BY HASH(`user_id`) BUCKETS AUTO +PROPERTIES +( + "replication_num" = "1", + "estimate_partition_size" = "2G" --- Estimated data size of a single partition; defaults to 10G if not set +); +``` + +Note that this approach is not suitable for tables with a particularly large data volume. + 

+
+ +
+ ## View partitions -View the partiton information of a table by running the `show create table` command. +View the partition information of a table by running the `show create table` command. ```sql > show create table example_range_tbl @@ -154,7 +321,7 @@ View the partiton information of a table by running the `show create table` com +-------------------+---------------------------------------------------------------------------------------------------------+ ``` -Or run the `show partitions from your_table` command. +Or run the `show partitions from your_table` command. ```sql > show partitions from example_range_tbl @@ -179,7 +346,39 @@ Or run the `show partitions from your_table` command. You can add a new partition by running the `alter table add partition ` command. ```sql -ALTER TABLE example_range_tbl ADD PARTITION p201704 VALUES LESS THAN("2020-05-01") DISTRIBUTED BY HASH(`user_id`) BUCKETS 5; +ALTER TABLE example_range_tbl ADD PARTITION p201704 VALUES LESS THAN("2020-05-01") DISTRIBUTED BY HASH(`user_id`) BUCKETS 5; ``` For more information about how to alter partitions, refer to [ALTER-TABLE-PARTITION](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION.md). + +## Partition Retrieval + +The `partitions` table function and the `information_schema.partitions` system table (supported since 3.0.2) record the partition information of the cluster. You can query them to extract partition information, for example when managing partitions automatically: + +```sql +--- Find the partition with the corresponding value in the Auto Partition table.
+mysql> select * from partitions("catalog"="internal", "database"="optest", "table"="DAILY_TRADE_VALUE") where PartitionName = auto_partition_name('range', 'year', '2008-02-03'); ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime | DataSize | IsInMemory | ReplicaAllocation | IsMutable | SyncWithBaseTables | UnsyncTables | ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ +| 127095 | p20080101000000 | 2 | 2024-11-14 17:29:02 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2008-01-01]; ..types: [DATEV2]; keys: [2009-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | \N | 985.000 B | 0 | tag.location.default: 1 | 1 | 1 | \N | ++-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+-----------+------------+-------------------------+-----------+--------------------+--------------+ 
+1 row in set (0.30 sec) + +mysql> select * from information_schema.partitions where TABLE_SCHEMA='optest' and TABLE_NAME='list_table1' and PARTITION_NAME=auto_partition_name('list', null); ++---------------+--------------+-------------+----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+-----------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | PARTITION_NAME | SUBPARTITION_NAME | PARTITION_ORDINAL_POSITION | SUBPARTITION_ORDINAL_POSITION | PARTITION_METHOD | SUBPARTITION_METHOD | PARTITION_EXPRESSION | SUBPARTITION_EXPRESSION | PARTITION_DESCRIPTION | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | CREATE_TIME | UPDATE_TIME | CHECK_TIME | CHECKSUM | PARTITION_COMMENT | NODEGROUP | TABLESPACE_NAME | ++---------------+--------------+-------------+----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+-----------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| internal | optest | list_table1 | pX | NULL | 0 | 0 | LIST | NULL | str | NULL | (NULL) | 1 | 1266 | 1266 | 0 | 0 | 0 | 0 | 2024-11-14 19:58:45 | 0000-00-00 00:00:00 | 0 | | | | 
++---------------+--------------+-------------+----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+-----------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +1 row in set (0.24 sec) + +--- Find the partition that corresponds to the starting point +mysql> select * from information_schema.partitions where TABLE_NAME='DAILY_TRADE_VALUE' and PARTITION_DESCRIPTION like "[('2012-01-01'),%"; ++---------------+--------------+-------------------+-----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+----------------------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | PARTITION_NAME | SUBPARTITION_NAME | PARTITION_ORDINAL_POSITION | SUBPARTITION_ORDINAL_POSITION | PARTITION_METHOD | SUBPARTITION_METHOD | PARTITION_EXPRESSION | SUBPARTITION_EXPRESSION | PARTITION_DESCRIPTION | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | CREATE_TIME | UPDATE_TIME | CHECK_TIME | CHECKSUM | PARTITION_COMMENT | NODEGROUP | TABLESPACE_NAME | 
++---------------+--------------+-------------------+-----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+----------------------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +| internal | optest | DAILY_TRADE_VALUE | p20120101000000 | NULL | 0 | 0 | RANGE | NULL | TRADE_DATE | NULL | [('2012-01-01'), ('2013-01-01')) | 1 | 985 | 985 | 0 | 0 | 0 | 0 | 2024-11-14 17:29:02 | 0000-00-00 00:00:00 | 0 | | | | ++---------------+--------------+-------------------+-----------------+-------------------+----------------------------+-------------------------------+------------------+---------------------+----------------------+-------------------------+----------------------------------+------------+----------------+-------------+-----------------+--------------+-----------+-------------+---------------------+---------------------+----------+-------------------+-----------+-----------------+ +1 row in set (0.65 sec) +```