[Enhancement] Support for Data Products (in a Data Mesh) #6526

Open
2 tasks done
juergenhemelt opened this issue May 11, 2022 · 4 comments
Labels
enhancement New feature or request pinned Keep open (do not time out)

Comments

@juergenhemelt

Is there an existing issue for this?

  • I have searched the existing issues

Please describe the new behavior that will improve Egeria

The Egeria metamodel should include elements for describing Data Products in the context of a Data Mesh (https://martinfowler.com/articles/data-mesh-principles.html). There are existing proposals for how to do this; some ideas can be found here:

https://arnerossmann.github.io/post/2022-02-09_metadata-dataproduct/
https://github.com/agile-lab-dev/Data-Product-Specification

Alternatives

Using OpenMetadata (https://docs.open-metadata.org) instead of Egeria, as suggested in https://github.com/agile-lab-dev/Data-Product-Specification

Any Further Information?

No response

Would you be prepared to be assigned this issue to work on?

  • I can work on this
@juergenhemelt juergenhemelt added enhancement New feature or request triage New bug/issue which needs checking & assigning labels May 11, 2022
@davidradl
Member

@mandy-chessell @planetf1 fyi

@mandy-chessell
Contributor

I am not sure how this is progressing but here are some thoughts ...

There are many descriptions of data products from different vendors and thought leaders. Some focus on the technical implementation/deployment; others focus on organizational/governance aspects such as service level agreements, licensing and ownership.

Each of these perspectives may be a valid focus for an organization at a particular point in time. Therefore, I propose that a data product be represented by a DataProduct classification that can be attached to any Referenceable. It could then be attached to a data set or API asset, to a server/container deployment, or to a more architectural/business construct such as a solution component or digital service.

Over time, as an organization refines its definition of a data product, the classification could be moved to a higher-level concept that covers a more complete definition of the data product.

I have just updated the descriptions of digital services, information supply chains and solution components in the Area 7 types description, since they are relevant to the more complete view of a data product.

https://egeria-project.org/types/7/
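The classification-based proposal can be sketched with a toy in-memory model in Python. Everything here is a hypothetical illustration rather than Egeria's actual API: the `Referenceable` dataclass, the `classify_as_data_product` and `move_data_product` helpers, the `owner` property and the element names are all assumptions. What it shows is that the same classification can sit on an element of any type and later be moved to a higher-level element without changing either element's type.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Referenceable:
    """Hypothetical stand-in for any Egeria Referenceable element (not Egeria's API)."""
    type_name: str
    qualified_name: str
    # Classification name -> illustrative properties, attached directly to the element.
    classifications: dict[str, dict[str, str]] = field(default_factory=dict)


def classify_as_data_product(element: Referenceable, **properties: str) -> None:
    """Attach a DataProduct classification to an element of any type."""
    element.classifications["DataProduct"] = dict(properties)


def move_data_product(source: Referenceable, target: Referenceable) -> None:
    """Move the DataProduct classification to a higher-level element.

    Raises KeyError if the source is not classified as a data product.
    """
    target.classifications["DataProduct"] = source.classifications.pop("DataProduct")


# Start by classifying a data set asset as the data product...
data_set = Referenceable("DataSet", "sales-orders-dataset")
classify_as_data_product(data_set, owner="sales-domain")

# ...then, as the definition matures, move it to the digital service.
service = Referenceable("DigitalService", "sales-insights-service")
move_data_product(data_set, service)

print(sorted(service.classifications))  # ['DataProduct']
print(data_set.classifications)         # {}
```

Modelling the data product as a classification rather than as its own entity type is what allows it to be moved in this way as the organization's definition evolves.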

@github-actions

github-actions bot commented Aug 6, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the no-issue-activity Issues automatically marked as stale because they have not had recent activity. label Aug 6, 2022
@mandy-chessell mandy-chessell added pinned Keep open (do not time out) and removed no-issue-activity Issues automatically marked as stale because they have not had recent activity. labels Aug 6, 2022
@mandy-chessell
Contributor

mandy-chessell commented Sep 26, 2022

Here are suggested mappings from data product concepts to Egeria's open metadata types:

Data Product concept: Egeria open metadata types

Data Domain: Data domains are represented by SubjectAreaDefinition entities. The SubjectArea classification is used to tag elements from the subject area.

Data Product Manager: The data product manager role is represented by the DigitalServiceManager type. Data product managers have business ownership of a collection of related data products represented by a DigitalService. Data products are grouped under a single digital service when they make use of similar processing. For example, they may use the same data, but formatted, scoped or processed differently with different licenses.

Data Product: Each data product is identified by the DigitalProduct classification. The productType attribute can be used to identify the digital product as a data product.

Data Product Design: The design of the data products' manufacturing and maintenance pipelines, along with the data products' storage and delivery mechanisms, is represented by the digital service's SolutionBlueprint linked to SolutionComponents. The DigitalProduct classification is added to the solution components that represent the data product delivery capability.

Data Product Implementation: The manufacturing/maintenance solution components are linked to the appropriate data pipeline Processes using the ImplementedBy relationship. The data product's delivery solution components are also linked to the delivery data assets via the ImplementedBy relationship.

Data Product Specification: Many types of information make up the data product specification. Different organizations will make their own choices, but here are some options. They can be linked to the solution components or data assets, depending on how specific the information is:
  • The schema of the data product, RootSchemaType, is attached to the data asset via the AssetSchemaType relationship.
  • The solution components, assets and data fields can be tagged using glossary terms, search keywords, security tags, reference data tags, etc., to make them easy to find and to explain what they contain.
  • The data products can be linked to a LicenseType using the License relationship. Terms and Conditions can be added to the LicenseType using the AttachedTermsAndConditions relationship.
  • Data profiling information can be attached to the assets as a DiscoveryAnalysisReport using the AssetDiscoveryReport relationship.
  • ServiceLevelObjectives can be attached to the solution components or data assets using the GovernedBy relationship.
  • CertificationTypes can describe quality gates. They are attached to the data assets using the Certification relationship when the asset passes the quality tests.
  • DataProcessingPurposes can be attached to the solution components or data assets using the ApprovedDataPurposes relationship to show how the data in the data product may be processed.
  • A Connection is added to each data asset to identify the connector used to retrieve the data.

Data Product Subscription: A subscriber (person, organization, system, ...) can register with the marketplace using a DigitalSubscription. The different products selected by the subscriber are attached to the digital subscription via the AgreementItem relationship. Terms and Conditions can be added to the DigitalSubscription using the AttachedTermsAndConditions relationship. Overrides to the terms and conditions can be added to the AgreementItem relationship.
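The suggested mapping can be illustrated with a small in-memory Python model that follows a subscriber's selections through to the data assets behind them. This is a sketch under stated assumptions: the `Element` and `MetadataGraph` classes, the `subscribed_assets` helper, the example element names, the classification property values and the direction of each relationship are all hypothetical. Only the type, classification and relationship names (DigitalProduct, productType, SubjectArea, ImplementedBy, License, AgreementItem, DigitalSubscription, SolutionComponent, LicenseType) come from the mapping; this is not Egeria's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical in-memory stand-in for an open metadata entity (not Egeria's API)."""
    type_name: str
    name: str
    # Classification name -> illustrative properties.
    classifications: dict[str, dict[str, str]] = field(default_factory=dict)


@dataclass
class MetadataGraph:
    """Toy store of typed relationships held as (type, end1, end2) triples."""
    relationships: list[tuple[str, Element, Element]] = field(default_factory=list)

    def relate(self, rel_type: str, end1: Element, end2: Element) -> None:
        self.relationships.append((rel_type, end1, end2))

    def related(self, rel_type: str, end1: Element) -> list[Element]:
        # Follow relationships of one type from end1 to end2 (direction is an assumption).
        return [e2 for t, e1, e2 in self.relationships if t == rel_type and e1 is end1]


def subscribed_assets(graph: MetadataGraph, subscription: Element) -> list[Element]:
    """Find the data assets behind every data product selected in a subscription."""
    assets: list[Element] = []
    for product in graph.related("AgreementItem", subscription):
        product_info = product.classifications.get("DigitalProduct", {})
        if product_info.get("productType") == "data product":
            assets.extend(graph.related("ImplementedBy", product))
    return assets


graph = MetadataGraph()

# Data product delivery capability: a solution component classified as a digital product.
delivery = Element("SolutionComponent", "Orders delivery",
                   {"DigitalProduct": {"productType": "data product"}})
# Delivery data asset, tagged with the subject area (data domain) it belongs to.
orders = Element("DataSet", "orders", {"SubjectArea": {"name": "Sales"}})
license_type = Element("LicenseType", "Internal use only")
subscription = Element("DigitalSubscription", "Marketing team")

graph.relate("ImplementedBy", delivery, orders)        # component -> delivery asset
graph.relate("License", orders, license_type)          # asset -> license type
graph.relate("AgreementItem", subscription, delivery)  # subscription -> selected product

print([asset.name for asset in subscribed_assets(graph, subscription)])  # ['orders']
```

The query only relies on the DigitalProduct classification and its productType value, so the same traversal would work whether the classification sits on a solution component, as here, or on some other element.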

@mandy-chessell mandy-chessell self-assigned this Oct 1, 2022
mandy-chessell added a commit to mandy-chessell/egeria that referenced this issue Oct 1, 2022
mandy-chessell added a commit that referenced this issue Oct 1, 2022
@planetf1 planetf1 mentioned this issue Nov 2, 2022
31 tasks
@planetf1 planetf1 removed the triage New bug/issue which needs checking & assigning label Dec 5, 2022

4 participants