Utility tool which allows you to delete tables/files from the Hadoop ecosystem that are older than a given number of days.
Command to run: SPARK_MAJOR_VERSION=2 spark-submit --class purge --master yarn --deploy-mode client --num-executors 1 --executor-cores 1 --executor-memory 4g --driver-memory 4g --driver-cores 1 --files <path to hive-site.xml> --name DATA-PURGE datapurger_2.11-1.jar --keep-days <number of past days to keep data for> --hdfs-path <HDFS path of the database on your cluster> --database <database name>
Example: the command below deletes all tables in the sales database, located under /apps/hive/warehouse/sales.db/, that are older than 7 days.
SPARK_MAJOR_VERSION=2 spark-submit --class purge --master yarn --deploy-mode client --num-executors 1 --executor-cores 1 --executor-memory 4g --driver-memory 4g --driver-cores 1 --files /usr/hdp/current/spark2-client/conf/hive-site.xml --name DATA-PURGE datapurger_2.11-1.jar --keep-days 7 --hdfs-path /apps/hive/warehouse/sales.db/ --database sales
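The jar's internals are not shown here, but as a rough illustration of what the database mode might do, here is a minimal Scala sketch. It assumes the job drops tables via Spark SQL using each table's directory name, then deletes the directory; the object name PurgeSketch and the hard-coded values standing in for --keep-days, --hdfs-path and --database are illustrative assumptions, not the actual implementation.

import org.apache.spark.sql.SparkSession
import org.apache.hadoop.fs.{FileSystem, Path}

object PurgeSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical fixed values standing in for the CLI arguments
    val keepDays = 7
    val hdfsPath = "/apps/hive/warehouse/sales.db/"
    val database = "sales"

    val spark = SparkSession.builder()
      .appName("DATA-PURGE")
      .enableHiveSupport()
      .getOrCreate()

    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    val cutoffMillis = System.currentTimeMillis() - keepDays.toLong * 24 * 60 * 60 * 1000

    // Each subdirectory of the database path is assumed to be one table's data.
    // A table whose directory was last modified before the cutoff is dropped
    // from the metastore and its files are removed from HDFS.
    fs.listStatus(new Path(hdfsPath))
      .filter(s => s.isDirectory && s.getModificationTime < cutoffMillis)
      .foreach { status =>
        val tableName = status.getPath.getName
        spark.sql(s"DROP TABLE IF EXISTS $database.$tableName")
        fs.delete(status.getPath, true) // recursive delete of the table directory
      }

    spark.stop()
  }
}

Note that this sketch uses the directory's modification time as the age check; the real tool may use a different criterion (e.g. partition values or table properties).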
CAUTION: The tool also supports a second variation, to be used with care, in which the --database field is omitted: it simply deletes files from the --hdfs-path based on --keep-days. This does not clear the logical layer that may sit over those files; any table structure remains in the metastore as it is, pointing at data that no longer exists. Use this version of the command only if you are sure about what you are doing.
Example: the command below deletes all files under /apps/hive/warehouse/sales.db/ that are older than 7 days, without dropping any tables that may point to this HDFS location.
SPARK_MAJOR_VERSION=2 spark-submit --class purge --master yarn --deploy-mode client --num-executors 1 --executor-cores 1 --executor-memory 4g --driver-memory 4g --driver-cores 1 --files /usr/hdp/current/spark2-client/conf/hive-site.xml --name DATA-PURGE datapurger_2.11-1.jar --keep-days 7 --hdfs-path /apps/hive/warehouse/sales.db/
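For comparison, the file-only variation reduces to plain HDFS operations; the metastore is never consulted. A minimal sketch, again with assumed object and value names rather than the tool's actual code:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object FilePurgeSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical fixed values standing in for --keep-days and --hdfs-path
    val keepDays = 7
    val hdfsPath = "/apps/hive/warehouse/sales.db/"

    val fs = FileSystem.get(new Configuration())
    val cutoffMillis = System.currentTimeMillis() - keepDays.toLong * 24 * 60 * 60 * 1000

    // Delete any file or directory under the path older than the cutoff.
    // No DROP TABLE is issued, so table definitions in the metastore
    // are left behind, pointing at deleted data.
    fs.listStatus(new Path(hdfsPath))
      .filter(_.getModificationTime < cutoffMillis)
      .foreach(s => fs.delete(s.getPath, true))
  }
}

This illustrates why the --database form is the safer default: the file-only form can silently strand table definitions over empty or missing directories.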