Raspberry Pi Zero 2 W Support? #2108

Open
mcrosson opened this issue Nov 3, 2024 · 10 comments · May be fixed by #2111
@mcrosson

mcrosson commented Nov 3, 2024

Overview of the Issue

Are there any plans to officially support the Raspberry Pi Zero 2 W?

Reproduction Steps / Why I Ask

I ask about official support because I've been experimenting with running Anthias on the Pi Zero 2 W and I keep running into horrific disk io contention issues. An overview of my progress is below.

I was able to sort out the installer problems I found and get the install to run to completion within 45-60 minutes on average. However, during container startup the disk io contention makes the Pi Zero 2 W unusable. I noticed something similar on my pi4 deployment: whenever the containers start, they use up all the disk io, the load average spikes over 15-20 and the system becomes unusable. On the pi4 the heavy disk io eventually settles and the load average usually drops within 10 minutes, but both can be very problematic during startup on the pi4.

On the Pi Zero 2... I've yet to find a way to constrain the intense disk io the containers cause during startup to allow Anthias to actually run. I'm seeing load averages as high as 60 when starting the containers without tuning and with some significant tuning I'm seeing load averages over 30. In either scenario the system becomes unusable very quickly. Additionally, the system will eventually throw a kernel error saying a task is hung. Once this happens, the Pi Zero 2 W is effectively crashed and unusable. The server and viewer containers seem to cause the biggest problems for disk io contention in my experiments.

From what I can figure out, the containers would need to be started sequentially and not in parallel and/or tuned to be less aggressive on disk io during startup. Without the disk io contention addressed, I don't see how the Pi Zero 2 W could be supported.
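
As a rough illustration of the "start sequentially" idea, the sketch below starts compose services one at a time and waits for the 1-minute load average to fall under a threshold before starting the next. It is not part of Anthias: the function names, the threshold and the 10 second poll interval are hypothetical, and the service order is left to the caller.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, threshold and poll interval are assumptions.

# Print the 1-minute load average from a /proc/loadavg-style file.
load_1min() {
    awk '{ print $1 }' "${1:-/proc/loadavg}"
}

# Succeed when the 1-minute load average is below the given threshold.
load_below() {
    local threshold="$1" file="${2:-/proc/loadavg}"
    awk -v max="$threshold" '{ exit !($1 + 0 < max + 0) }' "$file"
}

# Start each named service in order, waiting while the load is high.
# --no-deps stops compose from pulling in dependencies in parallel.
start_sequentially() {
    local compose_file="$1" threshold="$2"
    shift 2
    local service
    for service in "$@"; do
        until load_below "$threshold"; do
            sleep 10
        done
        docker compose -f "$compose_file" up -d --no-deps "$service"
    done
}

# Example call (not run here):
#   start_sequentially ~/screenly/docker-compose.yml 4 \
#       redis anthias-server anthias-viewer
```

Because the load average lags behind the actual io, a fixed pause after each `up` may also be needed; that would have to be tuned on real hardware.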

I can provide my notes if desired but they are dense and very long with a pair of patches that make some significant changes to the setup process and docker-compose template. Please let me know if you'd like this additional detail.

Environment

  • Raspberry Pi Hardware Version: Raspberry Pi Zero 2 W
  • Raspberry Pi Network Setup: Built in WiFi
  • Anthias Version: GitHub sources, revision 8acb12d
@nicomiguelino
Contributor

@mcrosson, thank you for creating an issue. Yes, I'd be happy to see your notes. I don't mind if they're dense. I really appreciate it.

@nicomiguelino nicomiguelino moved this to Investigating in Anthias Nov 4, 2024
@mcrosson
Author

mcrosson commented Nov 4, 2024

reply to @nicomiguelino

the below is everything ive found related to the raspberry pi zero 2 w. a lot of the changes will likely need adjusting if incorporated into the main anthias project. you're welcome to borrow/use/etc anything from the below.

i hope the below has enough clarity to be useful but if not, i can provide any additional detail as desired.

overview of issue

the anthias installer hangs and sometimes crashes when run on a raspberry pi zero 2 w -- its essentially 'impossible' to install anthias on a pi zero 2 w without changes to the anthias install procedure.

i managed to find fixes that allow anthias to be installed on a pi zero 2 w. these fixes are fully documented below.

i also found additional items that should be addressed for better pi zero 2 w support. these items are also documented below. some of these may also be applicable to other raspberry pi setups.

i don't know how the maintainers may want to proceed and have placed my notes here for reference. hopefully they will allow for another person to put together patch(es) based on my notes that meet the requirements for inclusion in the project.

environment / hardware

  • Hardware Version: Raspberry Pi Zero 2 W
  • Networking: Built in WiFi on 2.4ghz with dhcp
  • Code: Latest github sources
  • OS: Raspberry Pi OS Lite (64bit)
  • Storage: multiple different sd cards, multiple usb-otg disks

related github issues (probably)

critical / must read before continuing

the below will allow anthias to be installed on a pi zero 2 w with an sd card.

HOWEVER: IT WILL NOT RUN PROPERLY POST INSTALL

post install, anthias will cause load average spikes over 40 and will generally Not Work at all.

anthias requires a storage setup thats fast enough to keep pace with the containers. the containers perform heavy disk io and will make the setup wholly unusable on the pi zero 2 w. i have tested multiple sd cards and usb-otg disks, all fail to run the anthias services post-install.

i cannot underscore this point enough: anthias will not run from an sd card or usb-otg properly on the pi zero 2 w based on my testing.

fundamental fixes for installer

swap

swap is required for the install to work correctly. the current installer removes dphys-swapfile and disables swap mid-way through the install process. disabling swap via these steps causes problems with disk io to the point the device becomes unusable and the install will eventually fail or lock up so badly (due to high load average) the installer may as well have failed to run.

keeping dphys-swapfile.service and swap enabled for the duration of the install avoids disk io stalls, keeps ssh over wifi interactive and greatly reduces time to install anthias. this also solves all pi zero 2 w install issues ive seen reported on github related to 'hangs' or related to apt failing (particularly on libc6-dev and docker install).

ive included a work around / fix below.
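
A preflight check along these lines could stop an install from starting without swap. This is a minimal sketch, assuming the usual `/proc/swaps` layout (a header line, then one line per swap area with the size in the third column); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: function names are assumptions, not part of the installer.

# Sum the Size column of a /proc/swaps-style file; prints 0 when no swap
# area is active. The header line is skipped.
swap_total_kb() {
    awk 'NR > 1 { total += $3 } END { print total + 0 }' "${1:-/proc/swaps}"
}

# Succeed only when at least one swap area is active.
swap_is_active() {
    [ "$(swap_total_kb "${1:-/proc/swaps}")" -gt 0 ]
}

# Example guard before running the installer (not run here):
#   swap_is_active || { echo "swap is off, start dphys-swapfile first" >&2; exit 1; }
```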

docker pull

docker tries to download and extract images in parallel which causes high load averages and lock ups. setting the max-concurrent-downloads option to 1 will generally avoid lock ups related to docker pull. this is required for the install to complete.

ive included a work around / fix below.
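
For readability, the same daemon settings can be written with a heredoc instead of nested escaped quotes. This is a sketch, not the installer's behaviour: the function name is hypothetical, and refusing to overwrite an existing file is a design choice made here so an existing Docker configuration is not clobbered.

```shell
#!/usr/bin/env bash
# Sketch only: write_docker_daemon_json is a hypothetical helper.

# Write a daemon.json that limits Docker to one concurrent layer download
# and upload. The target directory defaults to /etc/docker.
write_docker_daemon_json() {
    local dir="${1:-/etc/docker}"
    local file="$dir/daemon.json"
    mkdir -p "$dir"
    if [ -e "$file" ]; then
        echo "refusing to overwrite existing $file" >&2
        return 1
    fi
    cat > "$file" <<'EOF'
{
  "max-concurrent-downloads": 1,
  "max-concurrent-uploads": 1
}
EOF
}

# Writing to /etc/docker needs root, e.g. run the function from a root shell.
```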

additional concerns (and fixes) beyond the installer

these items are mainly related to running anthias. the below will make anthias seem to start but in reality it will ultimately fail to run due to heavy disk io which causes load average spikes. the below changes mitigate this to an extent but do not fully solve the problem.

note: ive left the system running overnight and it still had failed to start the anthias containers. the below mitigates problems running anthias but does not fully solve them.

  • the cgroup config that is needed to allow docker to limit container memory use is not setup by default
    • the cgroup memory config needs to be set to prevent the containers from eating all the boards ram
    • ive included a work around / fix for this below
  • during the cleanup phase of the install script...
    • the system becomes unusable during this phase of installation due to the containers being started prior to the install completing
    • ive included a work around / fix below
  • cpu spikes caused by containers can be an issue
    • cpu limits can be added to each container to address this
    • work around / fix included in the below
  • the container memory limits likely need to be adjusted down
    • limited system memory means the pi zero 2 w has less 'wiggle room' for background services / maintenance tasks
    • high swap use when swap is enabled
    • ive included a work around / tuning included in the below
      • this likely needs additional tuning and adjustments
  • disk iops / bps for each container should probably be constrained
    • containers are very heavy on disk iops when starting
    • quickly cause >40 load averages
    • the system is unusable when this happens
    • ive included a partial work around / fix below
      • the changes need additional tuning/adjustments as they do not solve the disk io problem, they merely delay it from happening for a few minutes
  • the installed version of docker is armhf and should probably be arm64 on a 64bit raspberry pi os deployed to pi zero 2 w boards
    • ive included a work around / fix for this below
  • the host-agent service can kick in during setup which is problematic for disk io, cpu and ram -- if it fires at the 'wrong time'... the system becomes unusable during install
    • ive included a change below to not start this service during install and a prompt to have users enable it post-install
    • this is required to get the installer to reliably run to completion
    • this is 100% related to the containers causing heavy disk io during startup and rendering the system unusable
  • the wifi-connect service/container is heavy on disk io, ram and cpu
    • this has been disabled and removed in the below to allow the system to remain usable for testing and development
    • this container likely needs to be excluded or heavily modified to work on the pi zero 2 w to avoid high disk io and cpu/ram contention issues
    • this container would likely benefit from having ram, cpu and disk io limits set
    • if this container starts while the install is running, the system will lock up due to high load average and the install effectively fails
  • the upgrade_containers.sh script will fall back to pi1
    • the /proc/device-tree/model value is: Raspberry Pi Zero 2 W Rev 1.0
    • this will likely need to be addressed long-term
    • it looks like the pi zero 2 w should use the pi3 docker images based on the raspberry pi official processor documentation and image tags for anthias-server
    • this also affects the installer as the install script calls the upgrade container script directly
    • ive included a work around below
  • the upgrade_containers script looks at the current amount of running ram to calculate some limits prior to any gpu memory use changes becoming active
    • this means the pi zero 2 w calculated ram limits will have an extra 128MB used in the calculations if the system hasnt been rebooted after the gpu ram use tweaks are applied
    • 128MB ram is 25% of the ram of the pi zero 2 w and is a lot of 'slop' on the calculation
    • this likely needs to take into account the fact systems may 'lose ram' due to gpu ram adjustments made during install when calculating memory limits
    • i have not included anything to address this item below
  • the main docker-compose.yml.tmpl and generated docker-compose.yml have build information for each container
    • if this causes the containers to build on start... itll cause even more disk io contention and greatly slow down container start and exacerbate existing disk io issues caused by the start of the containers
    • i have not looked into this closely or addressed any possible problems related to this item in the below
  • the run_upgrade.sh calls the standard install script directly from the web which will not include any fixes documented here
    • it is best to git pull && ./bin/install.sh to perform upgrades so any pi zero 2 w fixes / work arounds / tweaks can be applied / kept prior to update
    • the update script just calls the standard install script directly from the web so re-running install.sh to upgrade should be safe
  • leaving the dphys-swapfile service installed but disabled when not running install/upgrade/update containers seems to be a good work around / fix for the pi zero 2 w needing swap in certain circumstances
    • sudo systemctl disable --now dphys-swapfile
      • only do this as part of a 'fresh install'. systems with anthias installed should use the following 2 items instead
      • do this after install and before starting containers
      • disables service
      • removes swap from system
    • sudo systemctl start dphys-swapfile
      • do this before any updates / upgrades
      • adds swap to system
    • sudo systemctl stop dphys-swapfile
      • do this after any updates / upgrades and before starting containers
      • removes swap from system
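
Two of the items above (the pi1 fallback and the stale RAM figure) could be handled roughly as sketched below. Everything here is illustrative: the function names are hypothetical, the Zero 2 W to pi3 mapping is the suggestion from this thread rather than confirmed upstream behaviour, the pi4 branch is an assumption about the existing script, and the 500000 kB figure in the example is made up.

```shell
#!/usr/bin/env bash
# Sketch only: names and mappings are assumptions, see the text above.

# Map a /proc/device-tree/model string to an image suffix. The Zero 2 W
# check comes first because its model string does not contain
# "Raspberry Pi 2" or "Raspberry Pi 3".
device_type_for_model() {
    case "$1" in
        *"Raspberry Pi Zero 2 W"*) echo "pi3" ;;
        *"Raspberry Pi 4"*)        echo "pi4" ;;
        *"Raspberry Pi 3"*)        echo "pi3" ;;
        *"Raspberry Pi 2"*)        echo "pi2" ;;
        *)                         echo "pi1" ;;
    esac
}

# Viewer memory limit in kB: subtract GPU memory that is configured but
# not active until the next reboot (in MB), then take a percentage.
viewer_memory_limit_kb() {
    local total_kb="$1" pending_gpu_mb="$2" percent="$3"
    echo $(( (total_kb - pending_gpu_mb * 1024) * percent / 100 ))
}

# Example use (not run here):
#   model="$(tr -d '\0' < /proc/device-tree/model)"
#   export DEVICE_TYPE="$(device_type_for_model "$model")"
#   viewer_memory_limit_kb 500000 128 60
```

With integer arithmetic the result is truncated, which avoids the decimal point that the `bc` based calculation can produce.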

steps to reproduce / install procedure used

important considerations

  • it is wise to login to the local console and watch the install process via tmux even if you use ssh for the main command entry / console / command line
    • some operations can cause wifi to hiccup or fail
    • the processes continue inside tmux and youll be able to properly track progress via the local terminal as needed
  • the docker pull step can cause wifi to fail even with the config changes documented below
    • if you stay logged in at a console you can see this process continues to completion despite wifi failing for ssh
    • this process took over 10 minutes to complete during testing
    • if the wifi driver fails with a kernel error and networking stops working...
      • youll need/want to start the below procedure from the start (including a re-flash of the base os image)
      • the wifi chip may have gotten too warm which causes it to become disabled
      • there are system updates as of 2024-11-03 that can help with wifi stability, its best to apply them and reboot prior to running the install. this is included in the below procedure
  • wifi hiccups / stalls are very common -- do not panic, the below procedure uses tmux to ensure the process will continue even if wifi becomes non-usable for ssh
    • hiccups and stalls are most common during heavy io and long running download operations
    • particularly the docker pull step
  • ive disabled and removed the wifi-connect service as it causes major disk io problems and ram/cpu contention. its just too 'heavy' for the pi zero 2 w and causes no end of problems with disk io and load average during my tests
    • ensure you have wifi setup and working before running the anthias install
    • do not add this back to the system. it will exacerbate existing issues with the main anthias containers / services
  • apply both patches below to ensure core changes needed to install anthias are applied and to ensure you get the proper arch of docker installed
    • skipping the docker patch can cause additional 'slowness' and similar with the containers

procedure

the below procedure will allow anthias to be installed on a raspberry pi zero 2 w board. this process includes installer adjustments and general tweaks.

  • flash raspberry pi os lite (64bit) to sd card via Raspberry Pi Imager tool
    • add user / network / other settings when prompted
  • boot
  • let finish initial setup / first boot 'stuff'
  • once boot is stable / system ready to be used...
  • ssh / log in to console
  • install procedure (do not just copy/paste these commands, read the comments too!)
    # use `tmux attach -t working` to connect to the below tmux sessions
    #     from other consoles / command lines
    # ensure network hiccups wont be a problem
    sudo apt update && sudo apt install -y tmux
    tmux new -s working
    # system upgrade
    sudo apt upgrade -y
    # fix cgroups so docker can limit container memory use
    sudo sh -c "echo \" cgroup_enable=memory swapaccount=1 cgroup_memory=1 cgroup_enable=cpuset\" >> /boot/firmware/cmdline.txt"
    # reboot to activate cgroup changes
    sudo systemctl reboot
    # ensure network hiccups wont kill install process
    tmux new -s working
    # install packages needed for anthias install
    sudo apt install -y nano htop curl git
    # work around / fix for docker parallel pulls
    sudo mkdir -p /etc/docker
    sudo sh -c "echo \"{ \\\"max-concurrent-uploads\\\": 1, \\\"max-concurrent-downloads\\\": 1 }\" \
        | cat > /etc/docker/daemon.json"
    # manually install docker to get arm64 docker instead of armhf docker
    curl -fsSL https://get.docker.com -o ~/get-docker.sh
    sudo sh ~/get-docker.sh --dry-run # make sure this says arm64 and 'looks right'
    sudo sh ~/get-docker.sh # only if above step looks reasonable
    # checkout anthias source, prep install
    git clone https://github.com/Screenly/Anthias.git ~/screenly
    cd ~/screenly
    # apply install patch from below section
    nano -w ~/anthias_pi_zero_2_w-install.patch
    #     paste patch & save & exit
    git apply ~/anthias_pi_zero_2_w-install.patch
    # apply docker patch from below section
    #     *must* apply install patch *first* or this patch wont apply properly
    nano -w ~/anthias_pi_zero_2_w-docker.patch
    #     paste patch & save & exit
    git apply ~/anthias_pi_zero_2_w-docker.patch
    # install anthias
    time ./bin/install.sh # do NOT reboot at end of script run
      ## Manage Network:    No
      ## Branch/Tag:        master
      ## System Upgrade:    No
      ## Docker Tag Prefix: latest
      ## real    43m49.511s
      ## user    0m39.782s
      ## sys     0m10.608s
    # disable swap for day to day use
    sudo systemctl disable dphys-swapfile
    # enable anthias host agent
    sudo systemctl enable anthias-host-agent
    sudo systemctl reboot # because post install reboot needs to happen
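
One thing to watch in the cgroup step above: `echo ... >>` starts a new line when cmdline.txt already ends with a newline, while the Raspberry Pi documentation describes cmdline.txt as a single line of parameters. The sketch below appends to the first line instead, skips parameters that are already present, and adds a check that can be run after the reboot. The function names are hypothetical, and the check assumes `/proc/cgroups` lists a `memory` row with an `enabled` column, which may not hold on every kernel.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical, try them on a copy first.

# Append kernel parameters to the first line of a cmdline.txt-style file,
# skipping any parameter that is already present. Other lines are kept.
append_cmdline_params() {
    local file="$1"
    shift
    local line param
    line="$(head -n 1 "$file")"
    for param in "$@"; do
        case " $line " in
            *" $param "*) ;;                 # already present
            *) line="$line $param" ;;
        esac
    done
    { printf '%s\n' "$line"; tail -n +2 "$file"; } > "$file.new" \
        && mv "$file.new" "$file"
}

# Succeed when the memory controller is listed as enabled.
memory_cgroup_enabled() {
    awk '$1 == "memory" { found = ($4 == 1) } END { exit !found }' \
        "${1:-/proc/cgroups}"
}

# Example use as root (not run here):
#   append_cmdline_params /boot/firmware/cmdline.txt \
#       cgroup_enable=memory swapaccount=1 cgroup_memory=1 cgroup_enable=cpuset
#   # after the reboot:
#   memory_cgroup_enabled || echo "memory cgroup still disabled" >&2
```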

patches

install patch (for usb disk install)

filename: ~/anthias_pi_zero_2_w-install.patch

diff --git a/ansible/roles/network/tasks/main.yml b/ansible/roles/network/tasks/main.yml
index 4b50c71d..42e438cd 100644
--- a/ansible/roles/network/tasks/main.yml
+++ b/ansible/roles/network/tasks/main.yml
@@ -20,8 +20,8 @@
 - name: Enable network systemd services
   ansible.builtin.systemd:
     name: "{{ item }}"
-    state: started
-    enabled: true
+    state: stopped
+    enabled: false
   with_items: "{{ network_systemd_units }}"

 - name: Disable network manager
diff --git a/ansible/roles/screenly/tasks/main.yml b/ansible/roles/screenly/tasks/main.yml
index 8c87dc7e..396b7967 100644
--- a/ansible/roles/screenly/tasks/main.yml
+++ b/ansible/roles/screenly/tasks/main.yml
@@ -101,7 +101,7 @@
 - name: Enable screenly systemd services
   ansible.builtin.systemd:
     name: "{{ item }}"
-    state: started
+    state: stopped
     enabled: true
   with_items: "{{ screenly_systemd_units }}"

diff --git a/ansible/roles/system/tasks/main.yml b/ansible/roles/system/tasks/main.yml
index 7434f162..b229b4a4 100644
--- a/ansible/roles/system/tasks/main.yml
+++ b/ansible/roles/system/tasks/main.yml
@@ -231,7 +231,6 @@
 - name: Remove deprecated apt dependencies
   ansible.builtin.apt:
     name:
-      - dphys-swapfile
       - lightdm
       - lightdm-gtk-greeter
       - matchbox
@@ -377,11 +376,3 @@
     mode: "0644"
     owner: root
     group: root
-
-- name: Disable swap
-  ansible.builtin.command: /sbin/swapoff --all removes=/var/swap
-
-- name: Remove swapfile from disk
-  ansible.builtin.file:
-    path: /var/swap
-    state: absent
diff --git a/bin/install.sh b/bin/install.sh
index 2e860cc5..7b470871 100755
--- a/bin/install.sh
+++ b/bin/install.sh
@@ -184,9 +184,6 @@ function install_ansible() {
 function run_ansible_playbook() {
     display_section "Run the Anthias Ansible Playbook"

-    sudo -u ${USER} ${SUDO_ARGS[@]} ansible localhost \
-        -m git \
-        -a "repo=$REPOSITORY dest=${ANTHIAS_REPO_DIR} version=${BRANCH} force=yes"
     cd ${ANTHIAS_REPO_DIR}/ansible

     if [ "$ARCHITECTURE" == "x86_64" ]; then
@@ -200,10 +197,6 @@ function run_ansible_playbook() {
 function upgrade_docker_containers() {
     display_section "Initialize/Upgrade Docker Containers"

-    wget -q \
-        "$GITHUB_RAW_URL/master/bin/upgrade_containers.sh" \
-        -O "$UPGRADE_SCRIPT_PATH"
-
     sudo -u ${USER} \
         DOCKER_TAG="${DOCKER_TAG}" \
         "${UPGRADE_SCRIPT_PATH}"
@@ -212,6 +205,8 @@ function upgrade_docker_containers() {
 function cleanup() {
     display_section "Clean Up Unused Packages and Files"

+    sudo -E docker compose -f /home/${USER}/screenly/docker-compose.yml down
+
     sudo apt-get autoclean
     sudo apt-get clean
     sudo docker system prune -f
diff --git a/bin/upgrade_containers.sh b/bin/upgrade_containers.sh
index 94103502..38309fbb 100755
--- a/bin/upgrade_containers.sh
+++ b/bin/upgrade_containers.sh
@@ -6,8 +6,8 @@
 # Export various environment variables
 export MY_IP=$(ip -4 route get 8.8.8.8 | awk {'print $7'} | tr -d '\n')
 TOTAL_MEMORY_KB=$(grep MemTotal /proc/meminfo | awk {'print $2'})
-export VIEWER_MEMORY_LIMIT_KB=$(echo "$TOTAL_MEMORY_KB" \* 0.8 | bc)
-export SHM_SIZE_KB="$(echo "$TOTAL_MEMORY_KB" \* 0.3 | bc | cut -d'.' -f1)"
+export VIEWER_MEMORY_LIMIT_KB=$(echo "$TOTAL_MEMORY_KB" \* 0.6 | bc)
+export SHM_SIZE_KB="$(echo "$TOTAL_MEMORY_KB" \* 0.25 | bc | cut -d'.' -f1)"

 export GIT_BRANCH=$(git rev-parse --abbrev-ref HEAD)

@@ -26,7 +26,7 @@ elif grep -qF "Raspberry Pi 2" /proc/device-tree/model; then
     export DEVICE_TYPE="pi2"
 else
     # If all else fail, assume pi1
-    export DEVICE_TYPE="pi1"
+    export DEVICE_TYPE="pi3"
 fi

 if [[ -n $(docker ps | grep srly-ose) ]]; then
diff --git a/docker-compose.yml.tmpl b/docker-compose.yml.tmpl
index 6bfdd7e6..df0780ee 100644
--- a/docker-compose.yml.tmpl
+++ b/docker-compose.yml.tmpl
@@ -1,30 +1,21 @@
 # vim: ft=yaml.docker-compose

 services:
-  anthias-wifi-connect:
-    image: screenly/anthias-wifi-connect:${DOCKER_TAG}-${DEVICE_TYPE}
-    build:
-      context: .
-      dockerfile: docker/Dockerfile.wifi-connect
-    depends_on:
-      - anthias-viewer
-    environment:
-      PORTAL_LISTENING_PORT: 8000
-      CHECK_CONN_FREQ: 10
-      PORTAL_SSID: 'Anthias WiFi Connect'
-      DBUS_SYSTEM_BUS_ADDRESS: 'unix:path=/run/dbus/system_bus_socket'
-    network_mode: host
-    privileged: true
-    volumes:
-      - type: bind
-        source: /run/dbus/system_bus_socket
-        target: /run/dbus/system_bus_socket
-
   anthias-server:
     image: screenly/anthias-server:${DOCKER_TAG}-${DEVICE_TYPE}
     build:
       context: .
       dockerfile: docker/Dockerfile.server
+    cpus: "0.25"
+    cpuset: "1"
+    blkio_config:
+       weight: 300
+       device_read_bps:
+         - path: /dev/mmcblk0
+           rate: '5mb'
+       device_write_bps:
+         - path: /dev/mmcblk0
+           rate: '1024k'
     environment:
       - MY_IP=${MY_IP}
       - HOST_USER=${USER}
@@ -50,6 +41,16 @@ services:
     build:
       context: .
       dockerfile: docker/Dockerfile.viewer
+    cpus: "0.5"
+    cpuset: "1"
+    blkio_config:
+       weight: 300
+       device_read_bps:
+         - path: /dev/mmcblk0
+           rate: '5mb'
+       device_write_bps:
+         - path: /dev/mmcblk0
+           rate: '1024k'
     mem_limit: ${VIEWER_MEMORY_LIMIT_KB}k
     depends_on:
       - anthias-server
@@ -77,6 +78,16 @@ services:
     build:
       context: .
       dockerfile: docker/Dockerfile.websocket
+    cpus: "0.25"
+    cpuset: "1"
+    blkio_config:
+       weight: 300
+       device_read_bps:
+         - path: /dev/mmcblk0
+           rate: '5mb'
+       device_write_bps:
+         - path: /dev/mmcblk0
+           rate: '1024k'
     depends_on:
       - anthias-server
     environment:
@@ -95,6 +106,16 @@ services:
     build:
       context: .
       dockerfile: docker/Dockerfile.celery
+    cpus: "0.25"
+    cpuset: "1"
+    blkio_config:
+       weight: 300
+       device_read_bps:
+         - path: /dev/mmcblk0
+           rate: '5mb'
+       device_write_bps:
+         - path: /dev/mmcblk0
+           rate: '1024k'
     depends_on:
       - anthias-server
       - redis
@@ -119,6 +140,16 @@ services:
     build:
       context: .
       dockerfile: docker/Dockerfile.redis
+    cpus: "0.25"
+    cpuset: "1"
+    blkio_config:
+       weight: 300
+       device_read_bps:
+         - path: /dev/mmcblk0
+           rate: '5mb'
+       device_write_bps:
+         - path: /dev/mmcblk0
+           rate: '1024k'
     ports:
       - 127.0.0.1:6379:6379
     restart: always
@@ -130,6 +161,16 @@ services:
     build:
       context: .
       dockerfile: docker/Dockerfile.nginx
+    cpus: "0.25"
+    cpuset: "1"
+    blkio_config:
+       weight: 300
+       device_read_bps:
+         - path: /dev/mmcblk0
+           rate: '5mb'
+       device_write_bps:
+         - path: /dev/mmcblk0
+           rate: '1024k'
     ports:
       - 80:80
     environment:

docker patch

filename: ~/anthias_pi_zero_2_w-docker.patch

diff --git a/ansible/roles/system/tasks/main.yml b/ansible/roles/system/tasks/main.yml
index b229b4a4..fa5a1b52 100644
--- a/ansible/roles/system/tasks/main.yml
+++ b/ansible/roles/system/tasks/main.yml
@@ -245,60 +245,6 @@
       - xserver-xorg
     state: absent

-- name: Make sure distro package of Docker is absent
-  ansible.builtin.apt:
-    name:
-      - docker
-      - docker-engine
-      - docker.io
-      - containerd
-      - runc
-      - docker-compose
-    state: absent
-
-- name: Add Docker apt key (x86)
-  ansible.builtin.apt_key:
-    url: https://download.docker.com/linux/debian/gpg
-    state: present
-  when: ansible_architecture == "x86_64"
-
-- name: Add Docker apt key (Raspberry Pi)
-  ansible.builtin.apt_key:
-    url: https://download.docker.com/linux/raspbian/gpg
-    state: present
-  when: |
-    ansible_architecture == "aarch64" or
-    ansible_architecture == "armv7l" or
-    ansible_architecture == "armv6l"
-
-- name: Get Debian name
-  ansible.builtin.command: lsb_release -cs
-  register: debian_name
-  changed_when: false
-
-- name: Set architecture
-  ansible.builtin.set_fact:
-    architecture: "{{ 'amd64' if ansible_architecture == 'x86_64' else 'armhf' }}"
-
-- name: Add Docker repo
-  ansible.builtin.lineinfile:
-    path: /etc/apt/sources.list.d/docker.list
-    create: true
-    line: "deb [arch={{ architecture }}] https://download.docker.com/linux/debian {{ debian_name.stdout }} stable"
-    state: present
-    owner: root
-    group: root
-    mode: "0644"
-
-- name: Install Docker
-  ansible.builtin.apt:
-    name:
-      - docker-ce:{{ architecture }}
-      - docker-ce-cli:{{ architecture }}
-      - docker-compose-plugin:{{ architecture }}
-    update_cache: true
-    install_recommends: false
-
 - name: Add user to Docker group (all platforms)
   ansible.builtin.user:
     name: "{{ lookup('env', 'USER') }}"

additional noteworthy items / discoveries

  • with dphys-swapfile and swap active the device remains interactive over wifi
    • heavy disk io operations make working in other tmux windows problematic
      • apt operations
      • docker container pull operations
    • best to stick to one tmux window & no other work being performed while the install is active
    • pane for install and pane for htop seems to be ok
  • apt operations heavily use the swap file
  • docker install step eats the entire swap file
  • the disk cache/buffer in memory is large for duration of run -- the swap file keeps io write pressure low by swapping background services and using ram for disk io buffering from what i can tell in my testing
  • heavy disk iops cause load average spikes due to storage being slow on write
    • if load average gets to >=8, terminal output over wifi will almost always stop and/or hang
  • wifi stalls at times and can be very problematic for interactive terminals
    • use tmux or screen to be safe
    • these stalls can make it seem the install has crashed when it has not actually crashed
  • cpu can get very warm, doesnt seem to affect install process speed
  • watched cpu speed via htop during troubleshooting and fix development
  • watched io utilization with htop during troubleshooting and fix development
  • hangs reliably on libc6-dev remove task with swap disabled
    • this causes >= 8 load averages
  • hung reliably on docker install steps with swap disabled
    • this causes >= 8 load averages
  • hung reliably on docker pull with swap disabled
    • this causes >= 8 load averages
  • hung at other points with swap disabled, every time is >= 8 load average
  • the ansible apt steps, docker install and docker pull seem to be the biggest cause of problems with swap disabled

@nicomiguelino
Contributor

@mcrosson, thank you so much! I'll spend a fair bit of time looking into those details. I'll let you know if I created a pull request based from the patches that you've provided.

@nicomiguelino
Contributor

@mcrosson take note that as an option, you can also create a pull request if you'd prefer. (We appreciate open-source contributions from users.) I'll just drop comments and suggest changes if needed.

@mcrosson
Author

mcrosson commented Nov 4, 2024

> @mcrosson take note that as an option, you can also create a pull request if you'd prefer. (We appreciate open-source contributions from users.) I'll just drop comments and suggest changes if needed.

I'll definitely keep this in mind in the future. I think some of the above would make sense 'in general' but until I get hands on additional test hardware, I think it's wise to not post any of the above as PR's myself.

@mcrosson
Author

mcrosson commented Nov 4, 2024

> @mcrosson, thank you so much! I'll spend a fair bit of time looking into those details. I'll let you know if I created a pull request based from the patches that you've provided.

Sounds good. I'll have a Pi Zero 2 W on-hand for testing/info gathering. Just let me know if you need anything more from me.

I'll also keep an eye on this ticket so I don't miss any additional comments or feedback.

@nicomiguelino
Contributor

@mcrosson, will take note of that. For now, I'll test the patches based off the notes, and then create PR. Thanks!

@nicomiguelino nicomiguelino moved this from Investigating to In progress in Anthias Nov 4, 2024
@nicomiguelino nicomiguelino linked a pull request Nov 4, 2024 that will close this issue
@nicomiguelino
Contributor

@mcrosson, I created a PR from your patches. Let me know if I missed anything from the patches. Some of them were rejected, so I have to git apply some of the changes manually. I will still have to test them.

@mcrosson
Author

mcrosson commented Nov 4, 2024

the only thing i see missing is dphys-swapfile being removed from ansible/roles/system/tasks/main.yml -- dphys-swapfile needs to be present on the system and running during the anthias install process or the install will fail.

i believe its part of the uninstall step where some raspberry pi specific services are removed / cleaned up

@mcrosson
Author

mcrosson commented Nov 4, 2024

I just realized you could probably set the restart policy on all the containers to never and keep anthias-host-agent.service disabled during testing/development.

Those two adjustments should prevent the containers from auto-starting on boot so if the board crashes during testing you can simply remove power to turn off the board and then cold boot to resume where you left off.
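
In Compose the policy that disables automatic restarts is spelled `restart: "no"` (quoted, since a bare `no` is read as a boolean by YAML parsers). A minimal sketch of flipping every `restart:` line in a compose file, keeping a `.bak` copy, could look like the following. The function name is hypothetical, and whether to apply it to the generated docker-compose.yml or to the template is left open, since the generated file may be overwritten on upgrade.

```shell
#!/usr/bin/env bash
# Sketch only: disable_restart_policies is a hypothetical helper.

# Rewrite every "restart: ..." line in a compose file to restart: "no",
# leaving the original in FILE.bak.
disable_restart_policies() {
    local file="$1"
    sed -i.bak -E 's/^([[:space:]]*restart:[[:space:]]*).*$/\1"no"/' "$file"
}

# Example use during testing (not run here):
#   disable_restart_policies ~/screenly/docker-compose.yml
#   sudo systemctl disable anthias-host-agent
```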

Status: In progress