Investigate possibility of PipelineDP API for Beam SQL #289
Labels
Type: New Feature ➕
Type: Research 🔬
Context
PipelineDP supports anonymization with the Beam PCollection API (example). It would be useful to also support the Beam SQL API.
Goal
Investigate and design a Beam SQL API for PipelineDP.
A possible example of a PipelineDP Beam SQL API:
Note: This task consists of researching possible options (both the API and the implementation design) and proposing something that is useful for users and can be implemented reasonably simply.
Additional information
On PipelineDP Architecture
DPEngine
(code) A class that implements differential privacy (DP) logic independently of the pipeline framework (Apache Spark, Apache Beam, and running without a framework are currently supported). DPEngine.aggregate() is the main method and can perform any of the supported DP aggregations. It is essentially equivalent to running the SQL query
where the supported dp_aggregate_function values are from the metric list.
On implementation
The implementation will likely parse the SQL query and call DPEngine.aggregate().
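Assuming a first version only needs to accept a narrow query shape (one GROUP BY key plus a few aggregate functions), the parsing step could be prototyped with regular expressions before committing to a full SQL parser. The sketch below is illustrative only: `parse_dp_query`, `AggregationSpec`, and `SQL_TO_METRIC` are hypothetical names, and the mapping of SQL functions to metric names (for example `AVG` to `MEAN`) is an assumption to be checked against the actual metric list. It produces an intermediate spec and does not call PipelineDP itself.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical mapping from SQL aggregate names to PipelineDP metric names.
# The values are meant to mirror the metric list; the mapping is an assumption.
SQL_TO_METRIC = {
    "COUNT": "COUNT",
    "SUM": "SUM",
    "AVG": "MEAN",
    "VARIANCE": "VARIANCE",
}

# Restricted grammar: SELECT <key>, <agg>, ... FROM <table> GROUP BY <key>
_QUERY_RE = re.compile(
    r"^\s*SELECT\s+(?P<key>\w+)\s*,\s*(?P<aggs>.+?)\s+"
    r"FROM\s+(?P<table>\w+)\s+GROUP\s+BY\s+(?P=key)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)
# A single aggregate call such as COUNT(*) or SUM(spend).
_AGG_RE = re.compile(r"^\s*(?P<fn>\w+)\s*\(\s*(?P<arg>\*|\w+)\s*\)\s*$")


@dataclass
class AggregationSpec:
    """Hypothetical intermediate form that would drive a DPEngine.aggregate() call."""
    table: str
    partition_column: str
    value_column: Optional[str]
    metrics: List[str] = field(default_factory=list)


def parse_dp_query(sql: str) -> AggregationSpec:
    """Parses a restricted GROUP BY query; raises ValueError for anything else."""
    match = _QUERY_RE.match(sql)
    if match is None:
        raise ValueError("unsupported query shape")
    metrics: List[str] = []
    value_column: Optional[str] = None
    for item in match.group("aggs").split(","):
        agg = _AGG_RE.match(item)
        if agg is None:
            raise ValueError(f"unsupported aggregate: {item.strip()!r}")
        fn, arg = agg.group("fn").upper(), agg.group("arg")
        if fn not in SQL_TO_METRIC:
            raise ValueError(f"no DP metric for {fn}")
        if arg == "*":
            # Only COUNT(*) makes sense without a value column.
            if fn != "COUNT":
                raise ValueError(f"{fn}(*) is not supported")
        else:
            # Assumption: all value-based metrics read the same column,
            # so a single value extractor is enough on the PipelineDP side.
            if value_column not in (None, arg):
                raise ValueError("only one value column is supported")
            value_column = arg
        metrics.append(SQL_TO_METRIC[fn])
    return AggregationSpec(
        table=match.group("table"),
        partition_column=match.group("key"),
        value_column=value_column,
        metrics=metrics,
    )
```

For example, `parse_dp_query("SELECT country, COUNT(*), SUM(spend) FROM purchases GROUP BY country")` yields a spec with partition column `country`, value column `spend`, and metrics `["COUNT", "SUM"]`. Plain SQL does not carry everything DP aggregation needs (the privacy ID column, contribution bounds, the privacy budget), so how to supply those is one of the design questions here. Beyond a prototype, a real parser (for example the Calcite-based one Beam SQL already uses) would likely replace the regular expressions.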
Open questions from Beam