Improvements #6

Merged: 43 commits, Jul 12, 2024
2ca03a6
Adding fusion broadcast option
JLaborda Jun 12, 2023
d19e3d0
Added pair based fusion broadcasting
JLaborda Nov 27, 2023
a71281e
Merge remote-tracking branch 'origin/featurebroadcasting' into featur…
JLaborda Nov 27, 2023
7445dd1
Adding BES stage to the PairCombinedFusion
JLaborda Nov 27, 2023
075c23d
Deleting original fes and fs commented methods
JLaborda Nov 27, 2023
a7b48ff
Deleting original bs method
JLaborda Nov 27, 2023
6fc471b
Updating tests to increase coverage
JLaborda Nov 28, 2023
ce4634e
Adding PairCombinedFusionTest
JLaborda Nov 28, 2023
1b7d1d7
Cleaning code for code coverage
JLaborda Nov 28, 2023
7fb39a4
Cleaning project: grammar, typos, imports...
JLaborda Nov 30, 2023
e1bdb13
Adding SHD tests and more cleanup
JLaborda Nov 30, 2023
225b4d3
Adding pigs network and datasets
JLaborda Nov 30, 2023
babcc52
Adding experiment scripts
JLaborda Dec 2, 2023
0c9b5a4
Adding changes for experiments
JLaborda Dec 2, 2023
50bae28
Adding experiment launch changes
JLaborda Dec 4, 2023
10a7516
Adding changes for experiments
JLaborda Dec 5, 2023
07247c2
Saving time in seconds
JLaborda Dec 11, 2023
802f1ae
Moving scritps to folder
JLaborda Dec 11, 2023
6f17d06
Adding .idea folder and cov.xml to .gitignore
JLaborda Dec 11, 2023
6e68032
Analysis and exreport of broadcasting experiments
JLaborda Dec 11, 2023
99192be
Solving test issues
JLaborda Dec 11, 2023
e66d9f8
Analysis of broadcasting results
JLaborda Dec 12, 2023
fc58148
Adding Best and Random Broadcasting
JLaborda Dec 12, 2023
dcc9b63
Adding test for random and best searches
JLaborda Dec 12, 2023
d9c8fda
Adding shuffleTest
JLaborda Dec 12, 2023
c02630e
Adding verbose option to project
JLaborda Dec 13, 2023
124bef7
Solving saveing bug
JLaborda Dec 13, 2023
19334df
Analysis of broadcasting experiments
JLaborda Dec 19, 2023
b923395
Changing PairCombinedFusion to check complexity
JLaborda Dec 20, 2023
29b2aa8
Updating bestBroadcastSearch
JLaborda Jan 9, 2024
6ab0987
Adding setters and getters to HierarchicalClustering
JLaborda Feb 9, 2024
dd630d0
Updating CGES Best_Broadcasting to avoid retroalimentation
JLaborda Feb 9, 2024
a6c1e9a
Updating random and best broadcasting
JLaborda Feb 13, 2024
9bfac5e
Changing BNBuilder and CGES constructors
JLaborda Feb 13, 2024
599c503
Adding more networks and datasets
JLaborda Feb 15, 2024
079f7bf
Updating experiment scripts
JLaborda Feb 15, 2024
b7c3b66
Adding new score and time measurements
JLaborda Feb 15, 2024
4ab0f04
Adding changes for experiments
JLaborda Mar 28, 2024
6f8718c
Solving interleaving issue
JLaborda Mar 28, 2024
cafc581
Adding iterations limit and checkExperiment
JLaborda Apr 22, 2024
43f1ed4
Added auto-save, time limit and iterations save
JLaborda Jul 12, 2024
85d4ae3
Adding example-params.txt
JLaborda Jul 12, 2024
9e3bb57
Update README.md
JLaborda Jul 12, 2024
2 changes: 1 addition & 1 deletion .github/workflows/CI-CD-pipeline.yml
@@ -46,7 +46,7 @@ jobs:
cache: maven
- name: Build with Maven
run: mvn clean verify --batch-mode
# Codecoverage
# Code-coverage
- name: Install dependencies
run: mvn install -DskipTests=true -Dmaven.javadoc.skip=true -B -V
- name: Run tests and collect coverage
2 changes: 1 addition & 1 deletion .github/workflows/maven-publish.yml
@@ -1,7 +1,7 @@
# This workflow will build a package using Maven and then publish it to GitHub packages when a release is created
# For more information see: https://github.com/actions/setup-java/blob/main/docs/advanced-usage.md#apache-maven-with-a-settings-path

name: Maven Package
name: Publish Package

on:
release:
26 changes: 26 additions & 0 deletions .gitignore
@@ -28,3 +28,29 @@ replay_pid*
/res/networks/others/
/scripts/
/results/

# Experiment outputs and errors
*.e*
*.o*

# target folder
/target/*

# Code coverage
cov.xml

# .idea folder
.idea/

#.DS_Store
.DS_Store

# large_datasets
res/large_datasets/

# parameters folder
res/parameters/

#scripts
res/scripts/
!res/scripts/experiments/
79 changes: 57 additions & 22 deletions README.md
@@ -2,9 +2,9 @@
[![codecov](https://codecov.io/gh/JLaborda/cges/branch/main/graph/badge.svg?token=C9GeO49RsE)](https://codecov.io/gh/JLaborda/cges)

# CGES
Circular Greedy Equivalence Search (CGES) is a distributed structural learning algorithm for Bayesian Networks developed by Jorge Daniel Laborda, Pablo Torrijos, José M. Puerta and José A. Gámez.
Circular/Ring Greedy Equivalence Search (CGES or rGES) is a distributed structural learning algorithm for Bayesian Networks developed by Jorge Daniel Laborda, Pablo Torrijos, José M. Puerta and José A. Gámez.
This repository contains the code implementation of the algorithm described in the research article titled [A Ring-Based Distributed Algorithm for
Learning High-Dimensional Bayesian Networks]. The algorithm focuses on structural learning of Bayesian Networks in high-dimensional domains, aiming to reduce complexity and improve efficiency. The algorithm is limited to discrete problems.
Learning High-Dimensional Bayesian Networks](https://link.springer.com/chapter/10.1007/978-3-031-45608-4_10). This algorithm focuses on structural learning of Bayesian Networks in high-dimensional domains, aiming to reduce complexity and improve efficiency. It is limited to discrete problems.

## Table of Contents
- [Introduction](#introduction)
@@ -17,8 +17,18 @@ Learning High-Dimensional Bayesian Networks]. The algorithm focuses on structura

## Introduction
In this research project, we propose an algorithm, named cGES, for learning Bayesian Networks in high-dimensional domains. The algorithm utilizes a divide-and-conquer approach, parallelism, and fusion techniques to address the challenges associated with structural learning in high-dimensional datasets. The code in this repository implements the cGES algorithm and provides a practical tool for researchers and practitioners interested in Bayesian Network learning.

![Figura-cges-mejorado](https://github.com/JLaborda/cges/assets/15078416/5c16635d-3ef2-4f46-bb87-4c6863f24cc6)

We have also added algorithms that follow a star topology to the project. We call them Star Greedy Equivalence Search (sGES). The algorithms designed with this topology are the following:
* Random Broadcasting (srGES): The input connections between processes are determined randomly at the end of each iteration. In other words, each process receives as input the DAG of a randomly selected process.
* Best Broadcasting (sbGES): The best DAG of the iteration is passed as input to each process.

In both scenarios, we avoid self-feedback by prohibiting a process's output from being its own input in the next iteration. The following figure shows the star topology structure:

![cges-star](https://github.com/user-attachments/assets/c72f8a5f-4a16-47b4-9b78-38612f6568d3)

We have also tested other broadcasting strategies, but they are much less efficient.
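
The two star-topology rules described above can be sketched as follows. This is only an illustration and not the project's code: the class and method names are hypothetical, scores are assumed to be "higher is better", and leaving the producer of the best DAG without a broadcast input (marked as `-1`) is just one assumed way of honoring the no-self-feedback rule.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: names and conventions are assumptions, not the CGES API.
class BroadcastSketch {

    // Random broadcasting (srGES): process i receives the DAG of a random process j != i.
    static int[] randomSources(int processes, long seed) {
        if (processes < 2) {
            throw new IllegalArgumentException("need at least two processes");
        }
        Random random = new Random(seed);
        int[] source = new int[processes];
        for (int i = 0; i < processes; i++) {
            // Draw among the other (processes - 1) indices, skipping i itself.
            int j = random.nextInt(processes - 1);
            source[i] = (j >= i) ? j + 1 : j;
        }
        return source;
    }

    // Best broadcasting (sbGES): every process receives the best-scoring DAG,
    // except its own producer, which gets no input (-1) in this sketch.
    static int[] bestSources(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        int[] source = new int[scores.length];
        for (int i = 0; i < scores.length; i++) {
            source[i] = (i == best) ? -1 : best;
        }
        return source;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(randomSources(4, 11L)));
        System.out.println(Arrays.toString(bestSources(new double[] {-120.5, -98.2, -101.7})));
    }
}
```

With the made-up scores in `main`, process 1 holds the best DAG, so `bestSources` returns `[1, -1, 1]`. The random variant depends on the seed, which matches the note in the usage section that the seed only matters for random broadcasting.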

## Requirements
- [Java 8](https://www.oracle.com/java/technologies/java8.html)
- [Tetrad 7.1.2-2](https://github.com/cmu-phil/tetrad) (Provided in this repository)
@@ -37,38 +47,63 @@ docker build -t cges .
```

## Usage
1. [Instructions for how to use the code]
2. [Description of input/output formats]
The parameters you need to provide to either the jar file, or to the docker container are:
1. The path to the parameter file that describes the experiments you want to run.
2. The index (line number - 1) of the line in that file that defines the experiment to execute.
3. (Optional)
The parameter file needs to have the following information separated by a blank space in each line:
1. The path to the parameter file that describes the experiments you want to run.
2. The index (line number - 1) of the line in that file that defines the experiment to execute.
3. (Optional) The parameter file needs to have the following information separated by a blank space in each line:

```
algorithm_name net_name net_path dataset_path number_cges_threads edge_limitation random_seed
```
You have at your disposal a file of parameters for the networks andes, link and munin in the './res/parameters/' folder. Feel free to modify it as you wish to run any experiment you want.
A line in the parameter file will have this format:
```
algName 'value' netName 'value' clusteringName 'value' numberOfClusters 'value' broadcasting 'value' databasePath 'value' netPath 'value' seed 'value'
```
The seed is only used in the random broadcasting setup. There is no need to add blank values for parameters that are not used.

You can run any experiment with the following command:
```
java -jar [jar-file-with-dependencies] [parameters-file] [index-of-file] [result_path](optional)
```
If you wish to use the docker container, use the following:
```
docker run [cges_container_name] [parameters-file] [index-of-file] [result_path](optional)
```
Here is an example line from a valid params file that runs the sbGES algorithm:
```
algName cges netName alarm clusteringName HierarchicalClustering numberOfClusters 2 broadcasting BEST_BROADCASTING databasePath ./res/datasets/alarm/alarm1.csv netPath ./res/networks/alarm/alarm.xbif
```

Another example, to run the srGES algorithm:
```
algName cges netName alarm clusteringName HierarchicalClustering numberOfClusters 4 broadcasting RANDOM_BROADCASTING seed 11 databasePath ./res/datasets/alarm/alarm1.csv netPath ./res/networks/alarm/alarm.xbif
```

Here is an example of a params line to execute a control algorithm like GES:
```
algName ges netName alarm databasePath ./res/datasets/alarm/alarm2.csv netPath ./res/networks/alarm/alarm.xbif
```
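
The example lines above are flat sequences of `name value` pairs separated by blanks. A minimal sketch of reading one such line into a map is shown here. The class `ParamsLineSketch` is hypothetical and is not the parser used by the project, and it assumes that names and values never contain whitespace (so paths with spaces would not work).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a params-line reader; not the project's own parser.
class ParamsLineSketch {

    // Splits a line on whitespace and pairs tokens up as name -> value.
    static Map<String, String> parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException(
                    "expected 'name value' pairs, got " + tokens.length + " tokens");
        }
        // LinkedHashMap keeps the parameters in the order they were written.
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 0; i < tokens.length; i += 2) {
            params.put(tokens[i], tokens[i + 1]);
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = parse("algName ges netName alarm "
                + "databasePath ./res/datasets/alarm/alarm2.csv "
                + "netPath ./res/networks/alarm/alarm.xbif");
        System.out.println(params.get("algName") + " on " + params.get("netName"));
    }
}
```

Because unused parameters are simply left out of the line, a lookup such as `params.get("seed")` returns `null` for the GES example, which a caller can treat as "not set".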

The allowed values of each parameter are:
* algName: [cges, ges, fges, fges-faithfulness]. Use cges for all the new algorithms in this project.
* netName: [The name of the network].
* clusteringName: [HierarchicalClustering, RandomClustering]. We recommend using HierarchicalClustering.
* numberOfClusters: [Any number, preferably even]. We suggest sticking to the following numbers [2,4,8,16].
* broadcasting: [NO_BROADCASTING, RANDOM_BROADCASTING, BEST_BROADCASTING]. NO_BROADCASTING is for the rGES or cGES algorithm. RANDOM_BROADCASTING is for the srGES. BEST_BROADCASTING is for sbGES.
* seed: (optional) Any number. It's only used in RANDOM_BROADCASTING.
* databasePath: The local path of the data you are using.
* netPath: The local path of the original Bayesian network used to sample the data, in XBIF format.
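
The allowed values listed above can be checked before launching an experiment. The following is a rough sketch under stated assumptions: `ParamsValidationSketch` is a hypothetical helper, `algName` is assumed to be mandatory because every example line includes it, and `numberOfClusters` is assumed to be a positive integer. The project itself may validate differently or not at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical validator for the documented parameter values; not part of CGES.
class ParamsValidationSketch {
    static final List<String> ALGORITHMS =
            Arrays.asList("cges", "ges", "fges", "fges-faithfulness");
    static final List<String> CLUSTERINGS =
            Arrays.asList("HierarchicalClustering", "RandomClustering");
    static final List<String> BROADCASTINGS =
            Arrays.asList("NO_BROADCASTING", "RANDOM_BROADCASTING", "BEST_BROADCASTING");

    // Returns one message per problem found; an empty list means the line looks valid.
    static List<String> validate(Map<String, String> params) {
        List<String> problems = new ArrayList<>();
        String algName = params.get("algName");
        if (algName == null || !ALGORITHMS.contains(algName)) {
            problems.add("algName must be one of " + ALGORITHMS); // assumed mandatory
        }
        checkOptional(params, "clusteringName", CLUSTERINGS, problems);
        checkOptional(params, "broadcasting", BROADCASTINGS, problems);
        String clusters = params.get("numberOfClusters");
        if (clusters != null && !isInteger(clusters, 1)) {
            problems.add("numberOfClusters must be a positive integer"); // assumption
        }
        String seed = params.get("seed");
        if (seed != null && !isInteger(seed, Long.MIN_VALUE)) {
            problems.add("seed must be an integer");
        }
        return problems;
    }

    // An absent optional parameter is fine; a present one must be in the allowed list.
    private static void checkOptional(Map<String, String> params, String name,
                                      List<String> allowed, List<String> problems) {
        String value = params.get(name);
        if (value != null && !allowed.contains(value)) {
            problems.add(name + " must be one of " + allowed);
        }
    }

    private static boolean isInteger(String text, long minimum) {
        try {
            return Long.parseLong(text) >= minimum;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("algName", "cges");
        params.put("broadcasting", "BEST_BROADCAST"); // deliberately misspelled
        System.out.println(validate(params));
    }
}
```

Returning a list of messages instead of failing on the first error lets a user fix every problem in a parameter line in one pass.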

An example parameter file is provided in './example-params.txt'. Feel free to modify it to run any experiment you want.

You can run any experiment with the following command:
```
java -jar [jar-file-with-dependencies] [parameters-file] [index-of-file] [result_path](optional)
```
If you wish to use the docker container, use the following:
```
docker run [cges_container_name] [parameters-file] [index-of-file] [result_path](optional)
```

## Example
**Package**
```
mvn package
java -jar target/CGES-1.0-jar-with-dependencies.jar ./res/parameters/andes_parameters.txt 2 ./MyResults.txt
java -jar target/CGES-1.0-jar-with-dependencies.jar ./example-params.txt 2 ./MyResults.txt
```
**Docker Container**
```
docker build -t cges .
docker run -v $(pwd)/res:/res -v $(pwd)/results:/results --rm cges /res/parameters/andes_parameters.txt 2 results/myResults.csv
docker run -v $(pwd)/res:/res -v $(pwd)/results:/results --rm cges ./example-params.txt 2 results/myResults.csv
```
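
In the commands above, the index argument (`2`) is the line number minus one, so it refers to the third line of the parameter file. A minimal sketch of that lookup is given here, assuming the index counts every line of the file from zero. `ParamsFileIndexSketch` is a hypothetical class used for illustration, not the project's entry point.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical sketch of the "index = line number - 1" lookup; not the CGES main class.
class ParamsFileIndexSketch {

    // Returns the experiment line at the given zero-based index.
    static String lineAt(List<String> lines, int index) {
        if (index < 0 || index >= lines.size()) {
            throw new IllegalArgumentException(
                    "index " + index + " is outside 0.." + (lines.size() - 1));
        }
        return lines.get(index);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: <parameters-file> <index>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        System.out.println(lineAt(lines, Integer.parseInt(args[1])));
    }
}
```

Run against a three-line parameter file, index `0` selects the first line and index `2` the last; any larger index is rejected rather than silently ignored.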

## Contributing
3 changes: 3 additions & 0 deletions example-params.txt
@@ -0,0 +1,3 @@
algName cges netName alarm clusteringName HierarchicalClustering numberOfClusters 16 broadcasting RANDOM_BROADCASTING seed 11 databasePath ./res/datasets/alarm/alarm5.csv netPath ./res/networks/alarm/alarm.xbif
algName ges netName win95pts databasePath ./res/datasets/win95pts/win95ptsALL.csv netPath ./res/networks/win95pts/win95pts.xbif
algName cges netName alarm clusteringName HierarchicalClustering numberOfClusters 16 broadcasting RANDOM_BROADCASTING seed 19 databasePath ./res/datasets/alarm/alarm5.csv netPath ./res/networks/alarm/alarm.xbif