There are standard hashing processors, but they mostly work on a flowfile's attributes or on its whole content, while we are only interested in part of the content. This project builds a custom processor that hashes specific columns with a choice of algorithms: MD2, MD5, SHA224, SHA256 and SHA512. Since our particular use case requires the output in CSV format, CSV output support is included in this project as well.
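The core transformation, hashing only the selected columns of a record, can be sketched with the JDK's `java.security.MessageDigest`, whose default provider supports MD2, MD5, SHA-224, SHA-256 and SHA-512. This is a minimal sketch under stated assumptions, not the processor's actual code: the class name `ColumnHasher`, the `Map<String, String>` record shape, UTF-8 encoding of values, lowercase hex output and passing null values through unchanged are all illustrative choices. In the real processor the record would be read from the incoming flowfile (for example as an Avro generic record).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper illustrating column-level hashing; not part of the project.
public class ColumnHasher {

    // Translates the algorithm names used in this README into JDK MessageDigest names.
    public static String toJdkName(String algorithm) {
        switch (algorithm.toUpperCase(Locale.ROOT)) {
            case "MD2":    return "MD2";
            case "MD5":    return "MD5";
            case "SHA224": return "SHA-224";
            case "SHA256": return "SHA-256";
            case "SHA512": return "SHA-512";
            default: throw new IllegalArgumentException("Unsupported algorithm: " + algorithm);
        }
    }

    // Hashes one value (encoded as UTF-8, an assumption) and returns lowercase hex.
    public static String hash(String value, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(toJdkName(algorithm));
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Digest not available: " + algorithm, e);
        }
    }

    // Returns a copy of the record where only the listed columns are hashed;
    // all other columns (and null values) are copied unchanged.
    public static Map<String, String> hashColumns(Map<String, String> record,
                                                  Set<String> columns,
                                                  String algorithm) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : record.entrySet()) {
            String value = e.getValue();
            boolean hashIt = columns.contains(e.getKey()) && value != null;
            out.put(e.getKey(), hashIt ? hash(value, algorithm) : value);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("id", "42");
        record.put("email", "alice@example.com");
        // Only "email" is replaced by its SHA-256 digest; "id" stays as "42".
        System.out.println(hashColumns(record, Collections.singleton("email"), "SHA256"));
    }
}
```

Because only the listed columns pass through the digest, the remaining columns are copied verbatim, which is the difference from the standard processors that hash an attribute or the whole content.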
You can download the compiled NAR file HERE and test it directly in your data flow. Put the .nar file in the lib folder of your NiFi installation, then restart NiFi so that the imported processor shows up.
The following additional libraries are not included in the template generated by the NiFi processor Maven archetype:
- avro: from the Apache Avro library, to make working with generic records easier
- commons-csv: from Apache Commons, to work with the CSV format
Ensure the following are installed on your computer:
- Java 8 JDK
- Maven
From the terminal, run the following commands:
git clone https://github.com/vanducng/hashing-columns-nifi-processor.git
cd hashing-columns-nifi-processor
mvn clean install
The output is located at: .\hashing-columns-nifi-processor\nifi-HashColumn-nar\target\nifi-HashColumn-nar-1.0.nar
- The CSV format can be configured with numerous properties such as value separator, record separator, quote character, escape character, etc.
- Add JSON output format support, since adding another conversion processor would increase the processing time of the whole data flow.
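The CSV options listed above boil down to a few settings applied when each record is written. As a hedged illustration using only the JDK (a stand-in for, not a copy of, the commons-csv `CSVFormat` configuration the project relies on), the hypothetical `SimpleCsvWriter` below makes the value separator, quote character and record separator configurable. It escapes embedded quotes by doubling them, RFC 4180 style, instead of using a separate escape character; that choice is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical CSV writer illustrating configurable format properties.
public class SimpleCsvWriter {
    private final char delimiter;          // value separator, e.g. ',' or ';'
    private final char quote;              // quote character, e.g. '"'
    private final String recordSeparator;  // e.g. "\n" or "\r\n"

    public SimpleCsvWriter(char delimiter, char quote, String recordSeparator) {
        this.delimiter = delimiter;
        this.quote = quote;
        this.recordSeparator = recordSeparator;
    }

    // Quotes a value only if it contains the delimiter, the quote character or a
    // line break; embedded quote characters are doubled. Nulls become empty fields.
    public String formatValue(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.indexOf(delimiter) >= 0
                || value.indexOf(quote) >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        String q = String.valueOf(quote);
        return q + value.replace(q, q + q) + q;
    }

    // Joins the formatted values with the delimiter and appends the record separator.
    public String formatRecord(List<String> values) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                line.append(delimiter);
            }
            line.append(formatValue(values.get(i)));
        }
        return line.append(recordSeparator).toString();
    }

    public static void main(String[] args) {
        SimpleCsvWriter writer = new SimpleCsvWriter(';', '"', "\n");
        // Prints: 42;"a;b";"say ""hi"""
        System.out.print(writer.formatRecord(Arrays.asList("42", "a;b", "say \"hi\"")));
    }
}
```

Exposing each setting as a constructor parameter mirrors how such options would surface as processor properties: the hashing step stays unchanged and only the output formatting varies.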
- YOUTUBE LINK: Custom Processor Development with Apache NiFi, a very informative resource for getting a sense of custom processor development.
- APACHE.ORG: Official development documentation.
- MEDIUM LINK: Build a first simple custom processor.
- APACHE.ORG: Working with Apache Avro.
- BLOG LINK: Working with Apache Commons CSV.