This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Updated READ.me, tweak to make Beam Window act as a Dummy
mattcasters committed Jan 8, 2019
1 parent 117e1ee commit 319d60f
Showing 2 changed files with 26 additions and 27 deletions.
25 changes: 19 additions & 6 deletions README.md
```diff
@@ -18,9 +18,8 @@ Note you need the Pentaho settings.xml in your ~/.m2 : https://github.com/pentah
 
 * Create a new directory called kettle-beam in \<PDI-DIR\>plugins/
 * Copy target/kettle-beam-<version>.jar to \<PDI-DIR\>/plugins/kettle-beam/
-* Copy the other jar files in target/lib to \<PDI-DIR\>/lib
+* Copy the other jar files in target/lib to \<PDI-DIR\>/plugins/kettle-beam/lib/
 
-I know it's dirty, fixing it later
```
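The install layout in the hunk above can be sketched as a short shell script. This is only an illustrative sketch: `PDI_DIR`, `BUILD_DIR`, the version number, and the dependency jar name are hypothetical stand-ins, and the demo uses throwaway directories so it can run anywhere; in a real install `PDI_DIR` is your PDI directory and `target/` is the Maven build output.

```shell
# Demo of the plugin layout described above, using throwaway directories.
PDI_DIR="$(mktemp -d)"      # stand-in for <PDI-DIR>
BUILD_DIR="$(mktemp -d)"    # stand-in for the project's target/ dir
VERSION="0.1.0"             # hypothetical version number

# Fake build output so the copy steps below have something to work with.
mkdir -p "$BUILD_DIR/lib"
touch "$BUILD_DIR/kettle-beam-$VERSION.jar" "$BUILD_DIR/lib/some-dependency.jar"

# The actual install steps from the README:
mkdir -p "$PDI_DIR/plugins/kettle-beam/lib"
cp "$BUILD_DIR/kettle-beam-$VERSION.jar" "$PDI_DIR/plugins/kettle-beam/"
cp "$BUILD_DIR"/lib/*.jar "$PDI_DIR/plugins/kettle-beam/lib/"

ls "$PDI_DIR/plugins/kettle-beam" "$PDI_DIR/plugins/kettle-beam/lib"
```

The point of the commit's change is that the dependency jars now live under the plugin's own `lib/` directory instead of PDI's global `lib/`, keeping the installation self-contained.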
## Configure

```diff
@@ -44,10 +43,24 @@ You can use the variables to make your transformations completely generic. For
 
 ## Supported
 
-* Only straight line between Beam Input and Output is supported.
-* Sort and Group by are not yet supported.
-* Group By step : experimental, SUM (Integer, Number), COUNT
-* External plugins are not yet supported.
+* Input: Beam Input and GCP Pub/Sub Subscribe
+* Output: Beam Output and GCP Pub/Sub Publish
+* Windowing with the Beam Window step
+* Sort rows is not yet supported and will never be supported in a generic sense.
+* Group By step : experimental, SUM (Integer, Number), COUNT, MIN, MAX, FIRST (throws errors for not-supported stuff)
+* Merge Join
+* Stream Lookup (side loading data)
+* Filter rows (including targeting steps for true/false)
+* Switch/Case
+* Plugin support through the Beam Job Configuration: specify which plugins to include in the runtime
+
+## Runners
+* Beam Direct : working
+* Google Cloud DataFlow : working
+* Apache Spark : mostly untested, configurable (feedback welcome)
+* Apache Flink : not started yet, stubbed out code
+* Apache Apex : not started yet, stubbed out code
+* JStorm : not started yet
 
 ## More information
```
28 changes: 7 additions & 21 deletions src/main/java/org/kettle/beam/steps/window/BeamWindow.java
```diff
@@ -8,30 +8,16 @@
 import org.pentaho.di.trans.step.StepInterface;
 import org.pentaho.di.trans.step.StepMeta;
 import org.pentaho.di.trans.step.StepMetaInterface;
+import org.pentaho.di.trans.steps.dummytrans.DummyTrans;
 
-public class BeamWindow extends BaseStep implements StepInterface {
+/** Behaves like a Dummy step, simply passing rows in the regular runner.
+ * In Beam this code isn't used. So the data passing in Dummy only is useful for unit testing and development in Spoon.
+ *
+ */
+public class BeamWindow extends DummyTrans implements StepInterface {
 
-  /**
-   * This is the base step that forms that basis for all steps. You can derive from this class to implement your own
-   * steps.
-   *
-   * @param stepMeta The StepMeta object to run.
-   * @param stepDataInterface the data object to store temporary data, database connections, caches, result sets,
-   *   hashtables etc.
-   * @param copyNr The copynumber for this step.
-   * @param transMeta The TransInfo of which the step stepMeta is part of.
-   * @param trans The (running) transformation to obtain information shared among the steps.
-   */
-  public BeamWindow( StepMeta stepMeta, StepDataInterface stepDataInterface, int copyNr, TransMeta transMeta,
-    Trans trans ) {
+  public BeamWindow( StepMeta stepMeta, StepDataInterface stepDataInterface, int copyNr, TransMeta transMeta, Trans trans ) {
     super( stepMeta, stepDataInterface, copyNr, transMeta, trans );
   }
 
-  @Override public boolean processRow( StepMetaInterface smi, StepDataInterface sdi ) throws KettleException {
-
-    // Outside of a Beam Runner this step doesn't actually do anything, it's just metadata
-    // This step gets converted into Beam API calls in a pipeline
-    //
-    return false;
-  }
 }
```
