Troubleshooting
A component-by-component fault tree for things that go wrong. For the most common first-install issues, start with Getting Started → Common problems.
General approach
- Read the launcher's log. /var/lib/aether-ops-bootstrap/bootstrap.log contains everything the last run printed. It's tee'd in real time so you can tail -f it during a run.
- Read the state file. sudo aether-ops-bootstrap state. The history tells you what's been tried; the components map tells you which component last completed.
- Check systemd units. systemctl status <unit> and journalctl -u <unit>. Most install failures end with a service failing to start.
- Collect diagnostics. sudo aether-ops-bootstrap diagnose --output /tmp bundles everything above into one tarball for sharing.
Preflight failures
These happen before any component runs. The launcher aborts cleanly — nothing was touched on the host.
unsupported Ubuntu version
Cause. /etc/os-release reports a version outside 22.04, 24.04, or 26.04
(soon to be released).
Fix. Use a supported Ubuntu release. The bundle's .deb dependency
closure is resolved per suite at build time. On an unsupported suite the
bundled .debs would at best install at versions that don't match the
host release, and at worst dpkg -i would refuse them.
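To confirm what the host reports before re-running the launcher, a check along these lines can help. It is a minimal sketch, not the launcher's own preflight code: the function name ubuntu_supported and its optional file argument are illustrative, and the version list is the one given above.

```shell
# ubuntu_supported [OS_RELEASE_FILE]
# Succeeds when VERSION_ID is one of the releases listed in this section.
# Hypothetical helper; the launcher's real check may be stricter.
ubuntu_supported() {
    os_release="${1:-/etc/os-release}"
    # os-release is a shell-compatible KEY=value file. Source it in a
    # subshell so its variables do not leak into the caller.
    (
        . "$os_release" || exit 2
        case "${VERSION_ID:-}" in
            22.04|24.04|26.04) exit 0 ;;
            *) exit 1 ;;
        esac
    )
}
```

Run ubuntu_supported && echo supported on the target host; pass a file path as the first argument to check a copy of os-release taken from another machine.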
must run as root
Cause. The launcher was not invoked under sudo or as root.
Fix. sudo aether-ops-bootstrap ….
systemd not present
Cause. /run/systemd/system doesn't exist — this host isn't running
systemd as PID 1.
Fix. The bootstrap requires systemd. Containers or non-systemd inits are not supported.
manifest schema_version X not supported
Cause. The bundle was built against a newer manifest schema than this launcher understands.
Fix. Use a matching launcher version. The release process ships launcher + bundle together; mismatched pairs shouldn't normally exist.
prior install exists
Cause. A state file is present at /var/lib/aether-ops-bootstrap/state.json.
Fix. Pick the right command — upgrade, repair, or install --force.
See upgrades-and-repair.
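The root, systemd, and prior-install conditions above can be checked by hand before a run. The following is a rough sketch that assumes only the paths named in this section; preflight_report is a hypothetical helper, and the launcher's own checks may test more than this.

```shell
# preflight_report [SYSTEMD_DIR] [STATE_FILE]
# Prints one line per condition. The arguments default to the paths named
# in this section and exist only so the checks can be pointed elsewhere.
preflight_report() {
    systemd_dir="${1:-/run/systemd/system}"
    state_file="${2:-/var/lib/aether-ops-bootstrap/state.json}"

    # Effective UID 0 means the shell is running as root (directly or via sudo).
    if [ "$(id -u)" -eq 0 ]; then echo "root: yes"; else echo "root: no"; fi
    # This directory exists only when systemd is running as PID 1.
    if [ -d "$systemd_dir" ]; then echo "systemd: yes"; else echo "systemd: no"; fi
    # A state file means a previous install was recorded on this host.
    if [ -e "$state_file" ]; then echo "prior install: yes"; else echo "prior install: no"; fi
}
```

A "prior install: yes" line means a plain install will stop at preflight, and one of upgrade, repair, or install --force is the command to pick.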
Per-component failures
debs
Symptoms. dpkg: dependency problems … or dpkg: error processing archive ….
Usual causes.
- Ubuntu suite mismatch between bundle and host (e.g. bundle built for noble, host is jammy).
- A .deb was corrupted in transit (rare: the bundle hash check catches this, but if you extracted manually for troubleshooting, you may have bypassed it).
- An existing broken dpkg state on the host (dpkg --configure -a pending).
Recovery. Fix the host's dpkg state first:
sudo dpkg --configure -a
sudo apt-get -f install # only if network is available
Then repair.
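A suite mismatch is quick to rule out by comparing the host's codename with the suite the bundle was built for. The sketch below assumes you already know the bundle's suite, since how to read it from the bundle is not covered here; suite_matches is a hypothetical helper.

```shell
# suite_matches EXPECTED_SUITE [OS_RELEASE_FILE]
# EXPECTED_SUITE is the suite you believe the bundle targets (e.g. noble).
# It is supplied by the operator, not read from the bundle.
suite_matches() {
    expected="$1"
    os_release="${2:-/etc/os-release}"
    # Read VERSION_CODENAME in a subshell so nothing leaks into the caller.
    host=$( . "$os_release" && printf '%s' "${VERSION_CODENAME:-}" )
    if [ "$host" = "$expected" ]; then
        echo "ok: host suite is $host"
    else
        echo "mismatch: host suite is ${host:-unknown}, bundle expects $expected"
        return 1
    fi
}
```

For the third cause, sudo dpkg --audit lists packages that dpkg left half-installed or unconfigured, which shows what dpkg --configure -a still has to finish.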
ssh
Symptoms. sshd fails to restart after the drop-in is written, or SSH password authentication still does not work for the onramp user.
Usual causes.
- A pre-existing sshd drop-in conflicts with the one being written.
- The sshd binary isn't where the launcher expects.
Recovery.
sudo sshd -t # validates current config
ls /etc/ssh/sshd_config.d/ # see all drop-ins
sudo systemctl restart ssh # restart after fixing
Then repair.
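When password authentication still fails, it helps to see which drop-ins set it. sshd uses the first value it reads for each keyword, and drop-ins are included in lexical order, so an earlier-sorting file can override a later one. The helper below is a sketch with a hypothetical name; it only lists candidates and does not know which file the launcher writes.

```shell
# list_password_auth [DROPIN_DIR]
# Prints every uncommented PasswordAuthentication line as file:line:text,
# in the same lexical order the shell glob (and sshd's Include) uses.
list_password_auth() {
    dir="${1:-/etc/ssh/sshd_config.d}"
    # -H prefixes the file name, -n the line number, -i ignores keyword case.
    grep -Hni '^[[:space:]]*PasswordAuthentication[[:space:]=]' "$dir"/*.conf 2>/dev/null
}
```

The first line printed is normally the one that wins. To see the value sshd actually resolves, run sudo sshd -T | grep -i passwordauthentication.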
sudoers
Symptoms. visudo: parse error or the drop-in wasn't moved into place.
Usual causes.
- A broken drop-in in /etc/sudoers.d/ from a previous partial run.
- The bundle's template has a syntax error (unusual; this would fail CI).
Recovery. The launcher validates with visudo -cf before moving the
file in, so a broken drop-in never lands under /etc/sudoers.d/ via the
launcher. If one is present, remove it manually:
sudo visudo -cf /etc/sudoers.d/<offending-file>
sudo rm /etc/sudoers.d/<offending-file> # only if broken
Then repair.
service_account / onramp
Symptoms. useradd: user already exists or similar.
Usual causes.
- A prior partial install left the user in place. This is usually benign: useradd with the right flags is idempotent-ish, and the launcher treats "user already exists, matches spec" as success.
- An actual different user with the same name exists (unusual).
Recovery. If the existing user isn't the one the launcher wants (wrong
UID, wrong groups), you'll need to remove it manually (userdel) before
the launcher can create it correctly.
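Before reaching for userdel, check whether the existing account really differs from what you expect. This sketch compares only the UID; the expected UID is a value you supply, since this section does not state which one the launcher uses, and user_uid_matches is a hypothetical helper.

```shell
# user_uid_matches NAME EXPECTED_UID
# Returns 0 if NAME exists with EXPECTED_UID, 1 if it exists with another
# UID, and 2 if it does not exist at all.
user_uid_matches() {
    name="$1"
    expected_uid="$2"
    # id -u fails when the user is unknown; treat that as "absent".
    actual_uid=$(id -u "$name" 2>/dev/null) || return 2
    if [ "$actual_uid" = "$expected_uid" ]; then
        echo "$name: uid $actual_uid matches"
    else
        echo "$name: uid $actual_uid, expected $expected_uid"
        return 1
    fi
}
```

id <user> prints the groups as well, which covers the "wrong groups" case.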
rke2
Symptoms. "waiting for rke2-server" times out; rke2-server reports
not ready.
Usual causes.
- Disk pressure. The airgap image tarball expands to several GB under /var/lib/rancher/rke2/. Check with df -h /var/lib/rancher; you need 40 GB+ free.
- Firewall. ufw or iptables blocking 6443 to localhost is rare but happens on hardened images.
- Clock skew. RKE2's TLS certs are minted at install; significant clock skew breaks them. Use timedatectl to check.
- Previous RKE2 install. /var/lib/rancher/rke2/ already populated by a different version/config. Run RKE2's uninstall script, sudo /usr/local/bin/rke2-uninstall.sh, then repair.
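The disk-pressure check is easy to script so it gives a pass/fail answer instead of a table to read. The following is a sketch; has_free_gb is a hypothetical helper, and the 40 GB threshold is the figure quoted above.

```shell
# has_free_gb PATH MIN_GB
# Succeeds when the filesystem holding PATH has at least MIN_GB available.
has_free_gb() {
    path="$1"
    min_gb="$2"
    # -P forces one output line per filesystem, -k reports 1024-byte blocks.
    # Column 4 of the data line is the available space.
    avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] || return 2
    avail_gb=$((avail_kb / 1024 / 1024))
    echo "$path: ${avail_gb} GB available, ${min_gb} GB required"
    [ "$avail_gb" -ge "$min_gb" ]
}
```

Call it as has_free_gb /var/lib/rancher 40. If that directory does not exist yet, point it at /var/lib, which is normally on the same filesystem.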
Recovery.
sudo journalctl -u rke2-server --no-pager -n 200
sudo systemctl status rke2-server
sudo aether-ops-bootstrap repair --bundle bundle.tar.zst
helm
Symptoms. Rare to fail — this component is just writing one binary.
Usual causes. /usr/local/bin doesn't exist or isn't writable by root
(extremely unusual on Ubuntu).
aether_ops
Symptoms. aether-ops.service starts but health endpoint never
responds; or the service crashes on start.
Usual causes.
- Port already in use. Something else is listening on the port aether-ops wants. In 0.1.x, check sudo ss -tlnp | grep 8186.
- Missing dependency on RKE2. If the rke2 component didn't complete, aether-ops will start but fail its health check because it can't reach the Kubernetes API. Fix rke2 first.
Recovery.
sudo journalctl -u aether-ops --no-pager -n 200
sudo systemctl status aether-ops
# after resolving the issue:
sudo aether-ops-bootstrap repair --bundle bundle.tar.zst
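A plain grep for the port number can also match a PID or another column of the ss output. The filter below is a sketch that matches only the local address column; port_listeners is a hypothetical helper, and 8186 is the 0.1.x port mentioned above.

```shell
# port_listeners PORT
# Reads `ss -tlnp` output on stdin and prints only the rows whose local
# address (column 4) ends in :PORT. Exits 1 when nothing is listening.
port_listeners() {
    awk -v port="$1" '
        NR > 1 {
            # Take the text after the last colon so [::]:8186 also works.
            n = split($4, parts, ":")
            if (parts[n] == port) { print; found = 1 }
        }
        END { exit(found ? 0 : 1) }
    '
}
```

Use it as sudo ss -tlnp | port_listeners 8186. The last column of each printed row names the process that holds the port.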
"The launcher exited cleanly but something's wrong"
Sometimes the bootstrap reports success but downstream users report aether-ops misbehaving. Sequence:
- State file says everything's installed. Confirm: sudo aether-ops-bootstrap state | jq '.components'
- All services are active. systemctl is-active rke2-server aether-ops ssh
- check reports components as up to date. sudo aether-ops-bootstrap check --bundle bundle.tar.zst. If check reports planned actions, the state and desired bundle differ. repair re-applies the selected components.
- aether-ops' own logs don't show errors. sudo journalctl -u aether-ops --since '5 minutes ago'
If all four look healthy, the issue is inside aether-ops itself, not in the bootstrap — escalate there.
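One caveat when scripting the service check: systemctl is-active with several units exits 0 if at least one of them is active, so the exit status alone can hide a dead service. A sketch that queries each unit separately (check_units is a hypothetical helper):

```shell
# check_units UNIT...
# Prints "unit: state" for each unit and returns 1 if any is not active.
check_units() {
    rc=0
    for unit in "$@"; do
        # is-active prints the state and exits non-zero unless it is "active".
        state=$(systemctl is-active "$unit") || rc=1
        echo "$unit: ${state:-unknown}"
    done
    return "$rc"
}
```

check_units rke2-server aether-ops ssh then gives a single pass/fail result for the second item in the sequence.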
When to ship a diagnostic tarball
Always, for any failure you can't resolve in a minute or two. Running:
sudo aether-ops-bootstrap diagnose --output /tmp
produces /tmp/aether-bootstrap-diag-<timestamp>.tar.gz. Ship
that one file. It contains the state, the bootstrap log, recent service
journals, and the configs the launcher installed — everything a support
engineer needs to reproduce and diagnose without a second trip to the
airgap site.
The launcher also emits a diagnostic tarball automatically on any failed run. The path is printed in the log:
diagnostic bundle saved to /tmp/aether-bootstrap-diag-20260418T143211.tar.gz
please send this file for troubleshooting support
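If the terminal output is gone, the path can be recovered from the bootstrap log. This sketch assumes the log line has exactly the form shown above; latest_diag_path is a hypothetical helper.

```shell
# latest_diag_path [LOG_FILE]
# Prints the tarball path from the most recent
# "diagnostic bundle saved to ..." line in the bootstrap log.
latest_diag_path() {
    log="${1:-/var/lib/aether-ops-bootstrap/bootstrap.log}"
    # Keep only the path; tail picks the last match if there are several.
    sed -n 's/.*diagnostic bundle saved to \(.*\.tar\.gz\).*/\1/p' "$log" | tail -n 1
}
```

tar -tzf "$(latest_diag_path)" lists the tarball's contents, which is a quick way to confirm the file is intact before it leaves the site.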