IPFS Ingress and Egress #172

Open
wants to merge 37 commits into
base: release
dc638eb
mrc
JEJodesty May 17, 2024
2febeba
cod node update complete
JEJodesty Jun 4, 2024
4671c33
ingress and egress accecable action plane w/out integration point
JEJodesty Jun 5, 2024
e1882d8
ingress and egress accecable action plane w/ integration point
JEJodesty Jun 6, 2024
7d5048c
added i/o writing
JEJodesty Jun 11, 2024
06acd09
change readme
JEJodesty Jun 11, 2024
08b260e
add data product team
JEJodesty Jun 11, 2024
6dc3d9c
add data product team example
JEJodesty Jun 11, 2024
8d7f92c
add data product team example
JEJodesty Jun 11, 2024
107efa2
repl in mesh client
JEJodesty Jun 12, 2024
34f1f6c
estabblished InfraStructure Sub Component
JEJodesty Jun 14, 2024
a808ead
estabblished InfraStructure Sub Component
JEJodesty Jun 14, 2024
d77a709
ipfs included as class
JEJodesty Jun 17, 2024
088bef4
infrafunction composes Processor & Plant and Infrstructure seperate
JEJodesty Jun 19, 2024
fc63711
infrafunction composes Processor & Plant and Infrstructure seperate
JEJodesty Jun 19, 2024
976a65f
partial cid filtering for bom
JEJodesty Jun 20, 2024
47f9481
v4
JEJodesty Jun 25, 2024
92a1c96
removed cod
JEJodesty Jun 26, 2024
6129bb0
added catlog
JEJodesty Jun 26, 2024
4bd6247
added commit history
JEJodesty Jun 26, 2024
077eade
updated weekly summary
JEJodesty Jun 26, 2024
0f1a9bd
updated catlog
JEJodesty Jun 28, 2024
309ae9f
ACG Monad & MAC CATs logs
JEJodesty Jul 1, 2024
8e09462
moved articles to wiki
JEJodesty Jul 2, 2024
339d411
reformated log and articles
JEJodesty Jul 2, 2024
0b16ccd
docker workd
JEJodesty Jul 6, 2024
bdd110f
car close
JEJodesty Sep 4, 2024
c2548d3
works on macos 12.2.1
JEJodesty Sep 5, 2024
0d0f63e
plant in order lifted to structure generalization
JEJodesty Sep 10, 2024
d526fcf
plant in order lifted to structure generalization
JEJodesty Sep 10, 2024
6abef03
refactor
JEJodesty Sep 10, 2024
1405b15
updated docs
JEJodesty Sep 11, 2024
0490aec
removed kind cluster deletion within main.tf & updated Docker to vers…
JEJodesty Sep 12, 2024
fc14f72
removed kind cluster deletion from main.tf & updated test and docker …
JEJodesty Sep 12, 2024
fe44c60
updated infrastructure with docker resource
JEJodesty Sep 16, 2024
8e0c1af
Adding empty directory with .gitkeep
JEJodesty Oct 21, 2024
b90d496
Adding empty directory with .gitkeep
JEJodesty Oct 21, 2024
10 changes: 8 additions & 2 deletions .gitignore
@@ -22,5 +22,11 @@ invoice.json
order.json
cat-action-plane-config
old/experiments/catMesh
online
requirements.txt
data/online
data/cache/
data/jobs/
cats/*/new.py
offline
data/input/.terraform/
data/input/plant/.terraform/
data/input/structure/.terraform
2 changes: 2 additions & 0 deletions .gitkeep
@@ -0,0 +1,2 @@
data/input/structure/outputs/data_egress
data/testing
72 changes: 53 additions & 19 deletions README.md
@@ -2,18 +2,26 @@
![alt_text](images/CATs_chaordic_kernel.jpeg)

## Description:
**Content-Addressable Transformers** (**CATs**) is a unified Data Service Collaboration framework for organizations.
CATs connect collaborators between organizations on a Data Mesh with interoperable parallelized and distributed
computing at horizontal & vertical scale. CATs' establish a scalable and self-serviced Data Platform as a Data Mesh
network of scalable and interoperable distributed computing workloads with Data Provenance deployable on Kubernetes.
These workloads [CAT(s)] enable for Big Data processing with Scientific Computing capabilities. CATs are integration
**Content-Addressable Transformers** (**CATs**) is a unified Data Service Collaboration framework for organizations
implemented as an edge-computing service that establishes a Data Mesh as a scalable self-serviced Data Platform of
Data Products with Data Provenance. CATs connect collaborators between organizations on a Data Mesh via the
Content-Addressed Storage (CAS) of interoperable and scalable data processing to enable Data Provenance. CAT data
processing workloads (CATs) are deployable as parallelized and distributed processes at horizontal & vertical scale to
support scalable (big) data processing microservices with Scientific Computing capabilities. CATs are also integration
points which enable scaled data processing portability between client-server cloud platforms and mesh (p2p) networks
with minimal rework or modification.
with minimal rework or modification.

CATs are submitted as content-addressed Orders of data processes (transformers) which are Invoiced for verification and
logged as Bills-Of-Materials that serve as Data Provenance records. These records are content-addressed as unique
identifiers of CAT workloads and their content. CATs' content addresses are also used as URIs that provide a means of
data transportation. Therefore, the implementation of CATs as content-addressed data processes establishes and
self-services a scalable Data Platform as a Data Mesh network of interoperable distributed computing workloads
deployable on Kubernetes as CATs' execution paradigm.
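
The Order, Invoice, and Bill-Of-Materials flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the CATs implementation: the field names (`function`, `input`, `output`) and the `content_address` helper are hypothetical, and a SHA-256 hex digest of canonical JSON stands in for an IPFS CID.

```python
import hashlib
import json


def content_address(record: dict) -> str:
    """Return a SHA-256 hex digest of the record's canonical JSON form.

    Assumption: a plain digest stands in for an IPFS CID here.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical Order: names a transformer and addresses its input data.
order = {
    "function": "transform.py",
    "input": content_address({"rows": [1, 2, 3]}),
}
order_address = content_address(order)

# Hypothetical Invoice: records what executing the Order produced.
invoice = {
    "order": order_address,
    "output": content_address({"rows": [2, 4, 6]}),
}
invoice_address = content_address(invoice)

# Hypothetical BOM: ties the Order and Invoice together as a provenance record.
bom = {"order": order_address, "invoice": invoice_address}
bom_address = content_address(bom)

# Identical content always yields the same address; any change yields a new one.
assert content_address(dict(order)) == order_address
assert content_address({**order, "function": "other.py"}) != order_address
```

Because each record embeds the addresses of the records it depends on, the BOM address changes whenever the Order, its input, or its output changes, which is the property that lets an address double as a provenance identifier.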

CATs enables the
[continuous reification of **Data Initiatives**](https://github.com/BlockScience/cats?tab=readme-ov-file#continuous-data-initiative-reification)
by cataloging discoverable, accessable, and re-executable workloads as
[**Data Service Collaboration**](https://github.com/BlockScience/cats?tab=readme-ov-file#continuous-data-initiative-reification)
by cataloging discoverable, accessible, and re-executable workloads as
[**Data Service Collaboration**](https://github.com/BlockScience/cats?tab=readme-ov-file#continuous-data-initiative-reification)
composable records between organizations. These records provide a reliable and efficient way to manage, share, and
reference data processes via [**Content-Addressing**](https://en.wikipedia.org/wiki/Content-addressable_storage) Data
Provenance records.
@@ -34,12 +42,12 @@ Machine Learning, and AI. Ray provides CATs with interoperable computing framewo
[ecosystem integrations](https://docs.ray.io/en/latest/ray-overview/ray-libraries.html) such as
[Apache Spark](https://spark.apache.org/), and [PyTorch](https://pytorch.org/).

Ray is deployed as an execution middleware on top of [Bacalhau’s](https://www.bacalhau.org/) [Compute Over Data (CoD)](https://github.com/bacalhau-project/bacalhau).
CoD enables IPFS to serve as CATs' Data Mesh's network layer to provide parallelized data ingress and egress for IPFS
data. This portability closes the gap between data analysis and business operations by connecting the network planes of
the cloud service model (SaaS, PaaS, IaaS) with IPFS. CATs connect these network planes by enabling the instantiation of
FaaS with cloud services in AWS, GCP, Azure, etc. on a **Data Mesh** network of CATs. CoD enables this connection as p2p
distributed-computing job submission in addition to the client-server job submission provided by Ray.
Ray is deployed as an execution middleware on Kubernetes. IPFS serves as CATs' Data Mesh's network layer to provide
parallelized data ingress and egress for IPFS data. This network portability closes the gap between data analysis and
business operations by connecting the network planes of the cloud service model (SaaS, PaaS, IaaS) with IPFS. CATs
connect these network planes by enabling the instantiation of FaaS with cloud services in AWS, GCP, Azure, etc. on a
**Data Mesh** network of CATs. IPFS enables this connection as p2p distributed-computing job submission in addition to
the client-server job submission provided by Ray.
![alt_text](images/simple_CAT2b.jpeg)

### Get Started!:
@@ -59,19 +67,45 @@ distributed-computing job submission in addition to the client-server job submis

### [Contribute!](docs/CONTRIBUTING.md)

### Continuous Data Initiative Reification:
**Data Initiatives** will be naturally reified as a result of **Data Service Collaboration** on CATs. CATs will be
compiled and executed as interconnecting services on a Data Mesh that grows naturally when organizations communicate
CATs provenance records within feedback loops of Data Initiatives.
![alt_text](images/CATs_bom_ag.jpeg)

### CATs' Architectural Quantum:
Organizations and collaborators participating will employ CATs for rapid ratification of service agreements within
collaborative feedback loops of [**Data Initiatives**](https://github.com/BlockScience/cats?tab=readme-ov-file#continuous-data-initiative).
CATs apply an **Architectural Quantum** Domain-Driven Design principle described in
[**Data Mesh of Data Products**](https://martinfowler.com/articles/data-mesh-principles.html) to reify Data Initiatives.
(* [**Design Description**](docs/DESIGN.md))

The Action Plane is the Analytical Data Processing interface. The Action Plane orchestrates and supervises
how virtual resources owned by the Data Product should be managed, routed, and processed and is stored “offmesh”
(“offline”). It supervises the exchange of data between sub-Process components within the Data sub-Plane (Process) in
adherence to Data Contracting Standards of organizations participating in a Data Mesh.
![alt_text](images/CATkernel.jpeg)

### Continuous Data Initiative Reification:
**Data Initiatives** will be naturally reified as a result of **Data Service Collaboration** on CATs. CATs will be
compiled and executed as interconnecting services on a Data Mesh that grows naturally when organizations communicate
CATs provenance records within feedback loops of Data Initiatives.
![alt_text](images/CATs_bom_ag.jpeg)
#### Quantum Architecture Description as a [Minimal Federated Operating Model](https://www.starburst.io/blog/data-mesh-book-bulletin-principle-of-federated-computational-governance/)
* **Function** is a FaaS for scalable Data Processing and analytics executed as CAT **Processes**. Functions (FaaS) are deployed
on Structure (PaaS) to execute Processes orchestrated by InfraFunctions (FaaS)
* **Processes** are **Functional Data Processors** executable by InfraFunctions (FaaS) deployed on Structure (PaaS), and
contextualized with pre and post processed data by InfraFunctions (FaaS). Processes (FaaS) are executed with and made
orchestratable by InfraFunctions (FaaS) to support the following use-cases
* The CAT Order is updated with the inclusion of resulting mutated Functions (FaaS) for execution processed by CATs
Factory Client.
* **InfraFunction (FaaS) is a Data Processing orchestrator** that employs a CAR for the configurable execution of scalable
**Process**ing operated by the Plant (SaaS)
* The CAT Order is updated in alignment with CATs' Architectural Quantum’s Functionality. This Order will include the
resulting update of Structure (PaaS) with respect to the updated Plant (SaaS) and an updated Function (FaaS) with
updated Ingress and Egress subProcesses (FaaS)
* **Structure** (**PaaS** as **IaC**) provisions and maintains the Plant (SaaS) as Function’s (FaaS) scalable execution environment.
* The **Plant (SaaS)** is **InfraStructure’s (IaaS)** dynamically scaled execution environment of **Function (FaaS)**
as an IaC plugin(s)
* The web application codebase is Content Addressed within CAT Orders as Data Contract metadata for Order registration.
* **InfraStructure (IaaS)** supports the provisioning of dynamically scaled infrastructure for maintaining a Plant (SaaS).
* The CAT Order is updated in alignment with event-driven functionality and operations with the resulting mutation
of Structure (PaaS).

### CATs' Data Provenance Record:
**BOM (Bill of Materials)** are CATs' Content-Addressed Data Provenance record for verifiable data processing and
147 changes: 147 additions & 0 deletions articles.md
@@ -0,0 +1,147 @@
# Articles

## February:

### Week 6 (2/12 - 2/16):

#### What does a CATs data contract do?

A Data Contract is a Service agreement between producer and consumer with attribute dependencies for downstream Data
Product evolution with dedicated lineage. A Data Contract can provide tools for collaboration on data requirements as
product promises within a shared context that inform policies for contract mutation alongside Data Product releases.

A Data Contract’s Product Promises are what Data Product owners expect from their data consumers up to the latest block
of information. These promises may include data quality, data usage terms and conditions, schema, service objectives,
billing, etc. Data Contract policy mutations cascade downstream as bilateral agreements that “fork” lineage as
a new Data Product version. For example, the consumer takes the risk of violating privacy. Data Producers create Data
Contracts on Organization and Business Terms. The consumer of the Data Contract enforces Governance policies. The
producer of the Data Contract owns the Data Product if the organization doesn't have a Governance body.

Governance policies are discussed between data producers and consumers to agree upon data producer requirements. These
discussions should culminate in an amenable data structure / dataset. Structured data is conducive to pre-existing
policies and requires less discussion. Less structured data will need more discussion and policy feedback loops. We
need a Minimal Viable Data Contract that includes what is necessary for an organization to govern, with the means of
supporting policy feedback loops that guide discussion and balance the prioritization of outcomes and methodologies.

Interdependent data domains have sub-domains with identifiers for generating Data Products. CAT Nodes will generate and
execute Virtual Data Products composed as Data Contracts that enforce Data Provenance using Bills of Materials (BOMs).
BOMs are CATs' Content-Addressed Data Provenance record for verifiable data processing and transport on a Mesh network
of CAT Nodes. Data Contracts will contain BOM lineages and act as block headers for Content-Addressed Transformers
(CATs) instances. Data Products are mutated during policy feedback loops informed by collaborators communicating their
understanding of knowledge domains. Collaborators will identify knowledge sub-domains with references and will access
sub-domains using Content-Addresses. Access is federated via knowledge domain hierarchies in abstractions that enable
collaborators to participate in governance cycles by leveraging their understanding of knowledge.
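
The idea of a Data Contract holding a BOM lineage, much like a block header pointing back through a chain, can be sketched as follows. This is an illustrative assumption rather than the CATs data model: the `parent` field, the `new_bom` and `verify_lineage` helpers, and the `lineage_head` key are hypothetical, and a SHA-256 digest stands in for a Content-Address.

```python
import hashlib
import json
from typing import List, Optional


def content_address(record: dict) -> str:
    """SHA-256 hex digest of canonical JSON (a stand-in for a Content-Address)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def new_bom(order: dict, invoice: dict, parent: Optional[str]) -> dict:
    """Build a hypothetical BOM that points at its predecessor's address."""
    return {
        "order": content_address(order),
        "invoice": content_address(invoice),
        "parent": parent,
    }


def verify_lineage(boms: List[dict]) -> bool:
    """Check that each BOM's parent equals the address of the BOM before it."""
    expected_parent = None
    for bom in boms:
        if bom["parent"] != expected_parent:
            return False
        expected_parent = content_address(bom)
    return True


first = new_bom({"step": "ingest"}, {"status": "ok"}, parent=None)
second = new_bom({"step": "clean"}, {"status": "ok"}, parent=content_address(first))
lineage = [first, second]

# Hypothetical Data Contract: references the head of the BOM lineage.
data_contract = {"lineage_head": content_address(second)}

assert verify_lineage(lineage)

# Altering an earlier BOM breaks every later parent link.
tampered = [{**first, "order": "0" * 64}, second]
assert not verify_lineage(tampered)
```

A "fork" of a Data Product version corresponds, in this sketch, to two different BOMs sharing the same `parent` address.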

### Week 7 (2/19 - 2/23):

#### What is a Content-Addressed Data Asset (CADA)?

CATs Data Products will consist of Data Contracts with provenance as executable BOM lineages and act as block headers
for Content-Addressed Transformers (CATs) instances that contain Data Assets. BOMs are CATs' Content-Addressed Data
Provenance record for verifiable data processing and transport on a Mesh network of CAT Nodes that can contain Data
Assets. A data asset may be a “system or application output” (dataset) that holds value for an organization or individual
that is accessible. Data Assets’ value can derive from the data's potential for generating insights, informing
decision-making, contributing to product development, enhancing operational efficiency, or creating economic benefits
through its sale or exchange.

CATs' Content-Addressed Data Assets are processed, sold / exchanged / published on CATs' Data Mesh via CAT Nodes
subsumed by downstream CATs’ Data Products. Data Assets consist of the following:
* **Data Domains** - "A predefined or user-defined Model repository object that represents the functional meaning of an"
attribute "based on column data or column name such as" account identification.
* **Data Objects** - Content-Addresses of data sources used to extract metadata for analysis.

## March:

### Week 8 (2/26 - 3/1):

#### What makes CATs Governable by including BOMs within Data Product’s Data Contracts?

CATs are governable and support multi-disciplinary collaboration of data processing because CATs' Architectural Quantum
is an abstract governance model enforced within CATs’ Bills-Of-Materials (BOMs), in which knowledge domains are
represented as metadata of data provenance records to support domain ownership.

BOMs are unique identifiers that provide the means of data production (assembly) and transportation as reproducible
lineage contextualised by knowledge domains for federated governance. BOMs consist of Data Product service Orders of
data processing that are Invoiced as fulfillments of service agreements specified by Data Products’ Data Contracts.

Federated Governance is enabled by BOMs for the following reason. Domain-specific data provenance BOMs establish the
legitimacy of network policy changes suggested by Fractional Stewards of Data Products by enabling them to identify data
quality issues at their source on a self-serviced Data Platform of many Data Products.

CATs enables Fractional Stewards to do this because historical data production is contextualised and reproducible within
the scope of their knowledge domains by design during development and production as a requirement of a service Order.
CATs data processes submitted by their service Orders are Invoiced to fulfil agreements within Data Products’ Data
Contracts.

A Data Contract is a Service agreement between producer and consumer with attribute dependencies for downstream Data
Product evolution with dedicated lineage. Governance policy discussions between data producers and consumers in policy
feedback loops about data production requirements should balance the prioritization of outcomes and methodologies and
should culminate in an amenable data structure / dataset.

### Week 9 (3/4 - 3/8):

#### “Data as an asset” enables the consumption, production, and prosumption of Data Assets on CATs Data Mesh

“Data as an asset” [0.](https://atlan.com/data-as-an-asset/) conceptually emphasizes recognizing and treating data as a
strategic investment organizations can leverage to deliver future economic benefits by enabling the consumption,
production, and [prosumption](https://en.wikipedia.org/wiki/Prosumer) of one's own data as an asset. Prosumption is the
consumption and production of value, "either for self-consumption or consumption by others, and can receive implicit or
explicit incentives from organizations involved in the exchange." [1.](https://doi.org/10.1108/JOSM-05-2020-0155)

The availability of high-quality and domain-specified Data Assets enables Data Products on inter-connected CAT Nodes on
CATs Data Mesh to facilitate cross-functional asset utilization within Data Initiatives in a way that supports Data
Sovereignty. "Data sovereignty refers to a group or individual’s right to control and maintain their own data, which
includes the collection, storage, and interpretation of data." [2.](https://www.nnlm.gov/guides/data-glossary/data-sovereignty#:~:text=Definition,storage%2C%20and%20interpretation%20of%20data.)

Registering and cataloging CATs can accelerate innovative Data Product creation and facilitate Data Sovereignty in Data
Initiatives that discover and utilize “Data as an asset”. Data Products use and operate CAT Nodes to produce, register,
and catalog “Data as an asset” as searchable and discoverable Data Assets by Data Products on CATs Data Mesh. CATs Data
Assets enhance strategic, operational, and analysis-informed decision-making by using BOMs as feedback loop mechanisms
across domains in a way that suits specific collaborative contexts across organizations.

### Week 11 (3/18 - 3/22):

#### Why should Data Contracts be included in CATs' BOMs for Data Product development on a Data Mesh?

Data Product(s) CATs are executed by Data Contract deployments with Data Provenance by Ordering CATs that are Invoiced
within Bills of Materials (BOMs). BOMs are CATs' Content-Addressed Data Provenance record for verifiable data processing
and transport on CAT Mesh. Data Contracts will contain BOM lineages and act as headers for Content-Addressed Transformer
instances (CATs). Their inclusion of BOMs is necessary for organizations to rapidly mutate Data Products alongside
discussions that affect product outcomes and development methodologies.

Data Products are mutated during stakeholder discussions about Data Contracts with respect to network policy / protocol.
These discussions continuously inform multi-lateral Data Product agreements between stakeholders and collaborators that
produce and consume data using BOMs as feedback loop mechanisms for (re)submitting CAT Orders. These discussions should
also culminate into a CAT Order of amenable data structures / datasets for which processing is Invoiced within BOMs.
Collaborators can participate in data provenance supported product development by Content-Addressing Data as an Asset.

## June:

### Week 24 (6/24 - 6/28):

#### What is the Architectural purpose of CATs as a function?

* **Governance Plane: z(t)**
* is for the Stewardship of a Data Product Supply Network of CATs represented as a Directed Acyclic Graph of Data Product Supply
* **Control Plane: y(t)**
* is for the Networking of what is Produced as a result of Science & Engineering CATs
* **Action Plane: x(t)**
* is for the Science & Engineering of Data Transformation as Computational Processing, a.k.a. CATs

#### Multi-Agent Collaboration (MAC) for CATs using Content-Addressable Router (CAR)

* _Design Description_
* CATs and LangGraph integration can enable a row-wise business function as a Chart Tool of Multi-Agent Collaboration
(MAC) if CAT Orders act as a Transfer (Network) Function implemented as an OOP Command Pattern for which CATs
Ingress and Egress sub-processes can be executed by CATs’ Content-Addressable Router (CAR).
* Architectural Considerations: CATs can inform business decisions given the following:
* Action Plane: x(t)
* CAT Functions can be defined as LangGraph Call Tools executed by LangGraph's Tool Node
* CAT Factory produces CAT Executors integrated with LangGraph's Tool Executor.
* Control Plane: y(t) \[aka Content-Addressable Router (CAR)\]
* CAR integrated with LangGraph's Router.
* cadCAD (Network) Policies aka “Algorithmic Suggestions” can be deployed on LangGraph's Agent Nodes with specified
Domain-Name references as Rule Asset RIDs
* Governance Plane: z(t)
* A GreyBox Model as a feature-parameterized Tensor Field with the process variable (PV) as label
* The business function is a CATs Control & Action Matrix - a 2 dimensional representation of 3 dimensional space
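
The Command Pattern mentioned in the Design Description can be sketched as below, with Ingress and Egress as commands dispatched by a router. Everything here is a hypothetical illustration: the `Command`, `Ingress`, `Egress`, and `Router` classes are not the CATs or CAR API, the in-memory dictionary stands in for IPFS, and a SHA-256 digest stands in for a Content-Address.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Command(ABC):
    """A unit of work the router can execute without knowing its details."""

    @abstractmethod
    def execute(self, store: Dict[str, bytes]) -> Any:
        ...


class Egress(Command):
    """Write data to the store and return its content address."""

    def __init__(self, data: bytes) -> None:
        self.data = data

    def execute(self, store: Dict[str, bytes]) -> str:
        address = hashlib.sha256(self.data).hexdigest()
        store[address] = self.data
        return address


class Ingress(Command):
    """Read data from the store by content address."""

    def __init__(self, address: str) -> None:
        self.address = address

    def execute(self, store: Dict[str, bytes]) -> bytes:
        return store[self.address]


class Router:
    """Hypothetical stand-in for a CAR: runs an Order's commands in sequence."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}  # in-memory stand-in for IPFS

    def run(self, order: List[Command]) -> List[Any]:
        return [command.execute(self.store) for command in order]


router = Router()
[address] = router.run([Egress(b"hello")])
[data] = router.run([Ingress(address)])
assert data == b"hello"
```

Treating the Order as a list of commands keeps the router independent of what each sub-process does, which is the main reason to reach for the Command Pattern in this setting.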