chore: check ecobalyse-data sync for PR (#915)
## 🔧 Problem

We have two repositories, `ecobalyse-data` and `ecobalyse`, so they can easily get out of sync.

I have run into this myself, and it's quite annoying.

We agreed on a workflow with the following rules:
- The `ecobalyse-data` PR should be merged first
- Only then can the corresponding syncing `ecobalyse` PR be merged

With this workflow, `ecobalyse-data/main` should always be in sync with
`ecobalyse/branch_a` whenever a PR is opened on `ecobalyse`.

## 🍰 Solution

In every `ecobalyse` PR, add a check that verifies whether `ecobalyse-data`
and `ecobalyse` are in sync for the following generated `ecobalyse-data` files:
- "public/data/food/ingredients.json"
- "public/data/food/processes.json"
- "public/data/textile/materials.json"
- "public/data/textile/processes.json"
- "public/data/object/processes.json"


## 🚨  Points to watch/comments

- As `ecobalyse/master` might temporarily lag behind
`ecobalyse-data/main`, this check should not run in CI on
`ecobalyse/master` itself, only on PRs targeting `ecobalyse/master`.

- If someone merges a JSON-changing PR in `ecobalyse-data` and doesn't
merge the corresponding PR on `ecobalyse`, this check will block all
`ecobalyse` PRs, because they will all be out of sync. But that's kind
of the point: it forces us to always keep `ecobalyse-data` and
`ecobalyse` in sync.

- I didn't add `public/data/textile/processes_impacts.json` to the check,
as the encryption made it more complicated. Any modification to
`processes_impacts.json` should normally also modify `processes.json`, so
leaving it out should not matter much.

- Unit tests for `check-ecobalyse-data-sync.sh` would be nice.
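A minimal sketch of what such a unit test could look like, assuming the comparison loop is first extracted from the script into a function so it can run against local directories without network access. `compare_files` and `test_compare_files` are hypothetical names, not part of this commit.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical refactor: the comparison loop of check-ecobalyse-data-sync.sh
# extracted into a function that can be tested offline.
# Usage: compare_files LOCAL_ROOT REFERENCE_ROOT FILE...
# Returns 0 if every FILE exists under both roots and is identical, 1 otherwise.
compare_files() {
  local local_root="$1" reference_root="$2"
  shift 2
  local status=0 file
  for file in "$@"; do
    if [ ! -f "$local_root/$file" ] || [ ! -f "$reference_root/$file" ]; then
      echo "--⚠️ $file is missing on one side"
      status=1
    elif ! diff -u "$local_root/$file" "$reference_root/$file"; then
      # diff -u has already printed the differences
      echo "--❌ $file is different"
      status=1
    fi
  done
  return "$status"
}

# Minimal test: identical trees pass, a one-line change fails.
test_compare_files() {
  local tmp
  tmp=$(mktemp -d)
  mkdir -p "$tmp/local/public/data/food" "$tmp/ref/public/data/food"
  echo '[]' > "$tmp/local/public/data/food/processes.json"
  cp "$tmp/local/public/data/food/processes.json" "$tmp/ref/public/data/food/processes.json"

  compare_files "$tmp/local" "$tmp/ref" public/data/food/processes.json > /dev/null \
    || { echo "FAIL: identical files reported as different"; return 1; }

  echo '[{}]' > "$tmp/ref/public/data/food/processes.json"
  if compare_files "$tmp/local" "$tmp/ref" public/data/food/processes.json > /dev/null; then
    echo "FAIL: difference not detected"
    return 1
  fi

  rm -rf "$tmp"
  echo "OK"
}

test_compare_files
```

Keeping the download step separate from the comparison means the network-dependent part stays small, and the logic that decides pass/fail can be exercised with plain temporary directories.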

## 🏝️ How to test


### Success
- Running `./bin/check-ecobalyse-data-sync.sh` from the repository root
should succeed (provided this branch is in sync with `ecobalyse-data`)

### Failure
- Add a difference to one of the generated JSON files listed above
- Running `./bin/check-ecobalyse-data-sync.sh` should then fail and
display the diff
paulboosz authored Feb 4, 2025
1 parent 658ee2a commit e45109e
Showing 2 changed files with 88 additions and 0 deletions.
15 changes: 15 additions & 0 deletions .github/workflows/ecobalyse-data-sync.yml
@@ -0,0 +1,15 @@
name: ecobalyse-data sync

on:
  pull_request:
    branches: [ master, staging ]
  workflow_dispatch:

jobs:
  check-ecobalyse-data-sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Check synchronization with ecobalyse-data for generated JSON
        run: ./bin/check-ecobalyse-data-sync.sh
73 changes: 73 additions & 0 deletions bin/check-ecobalyse-data-sync.sh
@@ -0,0 +1,73 @@
#!/bin/bash
set -euo pipefail

# Directory where files will be downloaded
TEMP_DIR=$(mktemp -d)
ECOBALYSE_DATA_BRANCH="${ECOBALYSE_DATA_BRANCH:-main}"
ECOBALYSE_DATA_REPO="${ECOBALYSE_DATA_REPO:-MTES-MCT/ecobalyse-data}"
RAW_GITHUB_URL="https://raw.githubusercontent.com/${ECOBALYSE_DATA_REPO}/${ECOBALYSE_DATA_BRANCH}"
DIFFERENCES_FOUND=0

# Cleanup function
cleanup() {
  rm -rf "$TEMP_DIR"
}
trap cleanup EXIT

# Files to check
FILES=(
  "public/data/food/ingredients.json"
  "public/data/food/processes.json"
  "public/data/textile/materials.json"
  "public/data/textile/processes.json"
  "public/data/object/processes.json"
)

echo "Downloading files from ecobalyse-data repository (branch: ${ECOBALYSE_DATA_BRANCH})..."

# Create necessary directories in TEMP_DIR
for file in "${FILES[@]}"; do
  mkdir -p "$TEMP_DIR/$(dirname "$file")"
done

# Download each file (--fail makes curl exit non-zero on HTTP errors such as 404)
for file in "${FILES[@]}"; do
  curl -fsS "$RAW_GITHUB_URL/$file" -o "$TEMP_DIR/$file" || {
    echo "--⚠️ Failed to download $file from ecobalyse-data repository"
    continue
  }
done

echo "Comparing JSON files between ecobalyse-data and ecobalyse repositories..."

for file in "${FILES[@]}"; do
  if [ -f "$file" ]; then
    other_file="$TEMP_DIR/$file"

    if [ -f "$other_file" ]; then
      # Compare files and store the diff output
      diff_output=$(diff -u "$file" "$other_file" || true)
      if [ -n "$diff_output" ]; then
        echo "--❌ $file is different:"
        echo "$diff_output"
        DIFFERENCES_FOUND=1
      else
        echo "--✅ $file is synchronized"
      fi
    else
      echo "--⚠️ $file does not exist in ecobalyse-data repository"
      DIFFERENCES_FOUND=1
    fi
  else
    echo "--⚠️ $file does not exist in ecobalyse repository"
    DIFFERENCES_FOUND=1
  fi
done

if [ $DIFFERENCES_FOUND -eq 1 ]; then
  echo "❌ Differences found between repositories ecobalyse-data and ecobalyse"
  exit 1
else
  echo "✅ All JSON files are synchronized"
  exit 0
fi
