Resume bulkload on abort? #27

earthquakesan · 2017-10-01T12:22:54Z

I just had a situation where the MapReduce job finished processing the RDF and saved it to the /tmp folder. However, my HBase server stopped working, so ./bulkload reported an error and quit.

This scenario is unlikely to happen in a production cluster (redundant ZooKeeper + redundant HBase); however, for local clusters it would be nice to have ./bulkload split into two phases:

1. MapReduce the data into HBase files (i.e. save them to the /tmp folder)
2. Load the /tmp folder into the HBase table "bla"

Right now, if I try to continue ./bulkload, it simply throws an error stating that the /tmp folder already exists.

asotona · 2017-10-01T12:59:23Z

Hi Ivan,

yes, making the two parts of the bulk load separable by a command-line switch or a separate command makes sense.

However, if you still have the HFiles in the temp HDFS folder, you can bulk load them with the following command:

hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles <hdfs://storefileoutput> <tablename>

Thanks,
Adam
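As a rough illustration of the workaround in the reply above, the load phase could be wrapped in a small shell function that first checks that the MapReduce output is still on HDFS and then runs the loader. This is only a sketch: the function name `resume_bulkload` and the `DRY_RUN` and `LOADER_CLASS` variables are made up for illustration and are not part of Halyard. The loader class is the one quoted in the reply; depending on the HBase version it may live in a different package, so it is left overridable.

```shell
#!/bin/sh
# Sketch: run only the load phase of an aborted bulk load.
# Hypothetical names (not part of Halyard): resume_bulkload, DRY_RUN, LOADER_CLASS.

# Loader class as quoted in the reply. Some HBase versions ship it under a
# different package (e.g. org.apache.hadoop.hbase.mapreduce); override
# LOADER_CLASS in the environment if that applies to your cluster.
LOADER_CLASS="${LOADER_CLASS:-org.apache.hadoop.hbase.tool.LoadIncrementalHFiles}"

resume_bulkload() {
  hfile_dir="$1"
  table="$2"

  if [ -z "$hfile_dir" ] || [ -z "$table" ]; then
    echo "usage: resume_bulkload <hfile-dir> <table>" >&2
    return 2
  fi

  if [ -n "$DRY_RUN" ]; then
    # Print the command instead of running it (no cluster access needed).
    echo "hbase $LOADER_CLASS $hfile_dir $table"
    return 0
  fi

  # Make sure the HFiles written by the MapReduce phase still exist.
  if ! hdfs dfs -test -d "$hfile_dir"; then
    echo "HFile directory not found: $hfile_dir" >&2
    return 1
  fi

  # Load the existing HFiles into the target table.
  hbase "$LOADER_CLASS" "$hfile_dir" "$table"
}
```

After sourcing the file, something like `resume_bulkload <hdfs://storefileoutput> <tablename>` would retry only the load step; setting `DRY_RUN=1` first prints the command so it can be checked before touching the cluster.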

asotona added the enhancement label Jan 11, 2018
