Level: Intermediate
Platforms: Alibaba Cloud / Volcengine / Zeabur / Unraid / AWS / Google Cloud / Kubernetes / Docker / Self-hosted
Estimated time: 25 min

OpenClaw Deployment Troubleshooting: Token Resets, STARTING Containers, and Startup Probe Failures

A platform-aware guide for Alibaba Cloud, Volcengine, Zeabur, Unraid, AWS, and Google Cloud: why image resets force token re-entry, how to persist OpenClaw state correctly, and how to debug STARTING and “startup probe failed” step by step.

Implementation Steps

OpenClaw is stateful. If your state directory is not persisted, resets/redeploys wipe tokens, configs, sessions, and device approvals together.

Deployment problems often look like this:

  • After an image reset / OS reinstall / container rebuild, the Control UI asks you to enter the token again.
  • Your platform is stuck in STARTING with errors like:
    • Startup probe failed: dial tcp ... connect: connection refused
  • The Control UI disconnects with:
    • disconnected (1008): pairing required

This guide gives you a repeatable playbook: fix persistence first (so you stop “losing” tokens), then debug STARTING and probe failures from the bottom up.


1) Why tokens “disappear” after resets (the real cause)

When people say “token”, they usually mean the Gateway token (used for Control UI / API authentication).

The key idea:

  • OpenClaw is stateful. The state directory (commonly ~/.openclaw/) contains config, tokens, sessions, device approvals, channel credentials, and caches.
  • If your platform reset/redeploy deletes the disk or container write layer, you did not just lose a token; from OpenClaw's point of view, you provisioned a new machine.

This is why “reset image” frequently means “re-enter token”.


2) The golden rules (works on every platform)

You want two outcomes:

  1. Tokens are reproducible (defined by env vars / secret manager).
  2. The OpenClaw state directory is persistent (volume / attached disk / network filesystem).

2.1 Define the token via env vars (reproducible tokens)

In openclaw.json, prefer env substitution:

{
  gateway: {
    auth: { mode: 'token', token: '${OPENCLAW_GATEWAY_TOKEN}' },
    // Remote deployments usually need non-loopback bind
    bind: 'lan',
  },
}

Set it in the runtime environment (systemd/Docker/PaaS env settings):

export OPENCLAW_GATEWAY_TOKEN='your-long-random-token'
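The placeholder needs a real value: long, random, and stored somewhere that outlives the machine (a secret manager or the platform's env settings). A minimal sketch, assuming a POSIX shell and /dev/urandom (present on Linux and macOS); gen_token is an illustrative helper name, not an OpenClaw command:

```shell
#!/bin/sh
# Illustrative helper (not part of OpenClaw): read 32 random bytes from
# /dev/urandom and print them as 64 lowercase hex characters.
gen_token() {
  od -An -tx1 -N32 /dev/urandom | tr -d ' \n'
}

OPENCLAW_GATEWAY_TOKEN="$(gen_token)"
export OPENCLAW_GATEWAY_TOKEN
# Copy this value into your secret manager / platform env settings;
# a value that only lives in this shell will not survive a redeploy.
```

If OpenSSL is installed, `openssl rand -hex 32` produces an equivalent 64-character value.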

Why this helps:

  • If browser storage is wiped, you simply re-enter the same token: the gateway token itself stays stable across restarts and redeploys.
  • You can rotate tokens by changing the secret, without editing JSON in multiple places.

2.2 Persist the state directory (this is the real fix)

The default state directory is ~/.openclaw/ unless overridden (for example via OPENCLAW_STATE_DIR, as in the AWS and Google Cloud sections). Persist it using one of:

  • Docker: bind-mount a host directory or a named volume into the container.
  • VM: store state on an attached data disk (EBS / Persistent Disk) rather than the system/root disk.
  • Kubernetes: store state on a PersistentVolume (EBS/EFS/CSI, etc.).
  • PaaS: enable platform persistence and mount it where the OpenClaw home/state lives.

If you do only one thing in this guide, do this.
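To check whether the state directory actually sits on persistent storage, you can ask df which mount point holds it. This is a rough heuristic sketch, assuming POSIX `df -P` output; the helper names are illustrative, and the default path mirrors ~/.openclaw / OPENCLAW_STATE_DIR from this guide. Inside a container, a result of / usually means the container write layer; on a VM it usually means the system disk.

```shell
#!/bin/sh
# Print the mount point of the filesystem holding path $1 (POSIX df -P:
# one line per filesystem, column 6 is "Mounted on").
mount_point_of() {
  [ -e "$1" ] || return 1
  df -P "$1" | awk 'NR == 2 { print $6 }'
}

# Heuristic: warn when the state dir is on the root filesystem, which in a
# container is usually the write layer and on a VM is usually the system disk.
check_state_persistence() {
  dir="${1:-${OPENCLAW_STATE_DIR:-$HOME/.openclaw}}"
  mp="$(mount_point_of "$dir")"
  if [ -z "$mp" ]; then
    echo "cannot resolve a mount point for $dir"
    return 2
  fi
  if [ "$mp" = "/" ]; then
    echo "warning: $dir is on the root filesystem"
    return 1
  fi
  echo "ok: $dir is on $mp"
}

# Usage (on the host, or inside the container):
#   check_state_persistence
```

A bind mount or volume shows up as its own mount point (for example /home/node/.openclaw), so the same check works inside containers.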


3) Recover or rotate your token (without wiping state)

If the service was not reset and only your browser forgot the token:

3.1 Docker: check .env

The Docker setup flow commonly writes the token into .env.

grep '^OPENCLAW_GATEWAY_TOKEN=' .env
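If you only want the value (for example to paste into the Control UI), a small sed-based helper avoids sourcing the whole .env into your shell. A sketch assuming simple KEY=value lines, optionally wrapped in single or double quotes; dotenv_get is a hypothetical helper name:

```shell
#!/bin/sh
# Print the value of KEY from a dotenv-style file (default: ./.env).
# Handles KEY=value, KEY="value" and KEY='value'; if the key appears
# more than once, the last occurrence wins.
dotenv_get() {
  key="$1"
  file="${2:-.env}"
  sed -n "s/^$key=//p" "$file" | tail -n 1 |
    sed -e 's/^"\(.*\)"$/\1/' -e "s/^'\(.*\)'\$/\1/"
}

# Usage (from the directory that holds your compose file and .env):
#   dotenv_get OPENCLAW_GATEWAY_TOKEN
```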

3.2 Check config and the service runtime environment

  • If config uses ${OPENCLAW_GATEWAY_TOKEN}, the actual value lives in systemd/Docker/PaaS env settings.
  • If you hardcoded it, you will see it in openclaw.json.

Useful CLI command:

openclaw config get gateway.auth

3.3 “pairing required” (1008): approve the device

The Control UI requires one-time device approval for new browser profiles/devices.

openclaw devices list
openclaw devices approve <requestId>

If you truly lost state (fresh machine), you must approve your devices again; that is expected.


4) STARTING / startup probe failed: debug in layers

STARTING is a platform health-check label, not an OpenClaw diagnosis. Use this bottom-up checklist:

  1. Is the process crashing/restarting? (logs)
  2. Is the port listening? (platform probes check TCP/HTTP reachability)
  3. Is bind correct? (binding only to 127.0.0.1 makes platform probes fail)
  4. Does config load? (config parse, missing env vars, permissions, disk full)
  5. Is the platform health check too aggressive? (the gateway needs longer to start than the probe allows, so the platform kills it early)

4.1 Logs first

Docker:

docker compose logs -f openclaw-gateway

Typical restart-loop causes:

  • invalid config / bad include paths / missing env vars
  • state directory permissions (container user cannot write)
  • OOM (memory limit too low)
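Two of these causes (a missing env var and an unwritable state directory) can be caught before the gateway starts, which keeps a bad deploy from turning into a silent restart loop. A minimal preflight sketch, assuming it runs as the same user as the gateway (for example at the top of a container entrypoint); the function name and messages are illustrative, not OpenClaw output:

```shell
#!/bin/sh
# Hypothetical preflight check: fail fast with a clear message instead of
# letting the gateway crash-loop on a missing token or unwritable state dir.
preflight() {
  state_dir="${OPENCLAW_STATE_DIR:-$HOME/.openclaw}"

  if [ -z "${OPENCLAW_GATEWAY_TOKEN:-}" ]; then
    echo "preflight: OPENCLAW_GATEWAY_TOKEN is unset or empty" >&2
    return 1
  fi

  if ! mkdir -p "$state_dir" 2>/dev/null; then
    echo "preflight: cannot create $state_dir as UID $(id -u)" >&2
    return 1
  fi

  # Create and remove a scratch file to prove write access.
  probe="$state_dir/.preflight.$$"
  if ! ( : > "$probe" ) 2>/dev/null; then
    echo "preflight: $state_dir is not writable by UID $(id -u)" >&2
    return 1
  fi
  rm -f "$probe"

  echo "preflight: ok ($state_dir)"
}

# Example entrypoint usage, followed by your usual gateway start command:
#   preflight || exit 1
```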

4.2 Verify the port is listening

On the host:

ss -ltnp | grep 18789 || true

Inside the container:

docker compose exec openclaw-gateway sh -lc 'ss -ltnp || netstat -ltnp'

If nothing is listening, probes will fail. Fix the startup error from logs.

4.3 Bind matters: remote deployments usually require lan

If you are debugging this on native Windows and actually want a server-like service model, keep in mind that a Scheduled Task is not the same as a true always-on Windows Service.

A very common failure mode:

  • Gateway is running and listening, but only on 127.0.0.1.
  • Platform probes come from outside the container/VM and get ECONNREFUSED.
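You can tell "listening, but only on loopback" apart from "reachable" using ss output. A sketch assuming iproute2 `ss -ltn` with its default columns (Local Address:Port is the 4th field) and the default port 18789 used in this guide; classify_bind is a hypothetical helper name:

```shell
#!/bin/sh
# Read `ss -ltn` output on stdin and report how PORT is bound:
#   loopback-only  -> only 127.x / [::1]; probes from outside will fail
#   non-loopback   -> bound to a LAN or wildcard address (0.0.0.0, [::], *)
#   not-listening  -> nothing on that port; go back to the logs
classify_bind() {
  awk -v suffix=":$1" '
    NR > 1 {
      n = length($4) - length(suffix)
      if (n > 0 && substr($4, n + 1) == suffix) {
        addr = substr($4, 1, n)
        if (addr ~ /^127\./ || addr ~ /^\[::1\]/) lo = 1
        else wide = 1
      }
    }
    END {
      if (wide) print "non-loopback"
      else if (lo) print "loopback-only"
      else print "not-listening"
    }'
}

# Usage on the host or inside the container:
#   ss -ltn | classify_bind 18789
```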

For remote deployments, set:

{ gateway: { bind: 'lan' } }

Important: when binding beyond loopback, OpenClaw requires auth (token/password). If you see a message like “refusing to bind … without auth”, configure auth and restart.

4.4 Use OpenClaw health/probe commands

If you have shell access on the host/container, probe locally:

openclaw gateway health --url ws://127.0.0.1:18789 --token "$OPENCLAW_GATEWAY_TOKEN"

If local health fails, the issue is not the platform probe - it is a gateway startup/config/network issue.

4.5 Platform probes too strict: give startup more time

For Kubernetes-style platforms, add a startupProbe (TCP is usually the least surprising):

startupProbe:
  tcpSocket:
    port: 18789
  periodSeconds: 5
  failureThreshold: 60 # ~5 minutes

For PaaS platforms with a health check toggle, it is often useful to temporarily disable health checks while fixing a bad config (to break the restart loop), then re-enable.
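The startup budget of a probe is roughly periodSeconds × failureThreshold (5 s × 60 = 300 s in the example above). If you want the same kind of bounded wait in your own deploy script (for example, waiting for the gateway port before declaring a deploy healthy), a sketch could look like this; wait_until is a hypothetical helper, and the commented usage assumes ss is available:

```shell
#!/bin/sh
# Retry a command up to ATTEMPTS times, sleeping PERIOD seconds between
# tries, similar to a startup probe with failureThreshold=ATTEMPTS and
# periodSeconds=PERIOD. Returns 0 on the first success, 1 if the budget runs out.
wait_until() {
  attempts="$1"
  period="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # No sleep after the final failed attempt.
    [ "$i" -lt "$attempts" ] && sleep "$period"
    i=$((i + 1))
  done
  return 1
}

# Usage: wait up to ~5 minutes for something to listen on 18789.
#   wait_until 60 5 sh -c 'ss -ltn | grep -q ":18789 "'
```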


5) Platform guidance

5.1 Alibaba Cloud (Simple Application Server image)

Alibaba’s OpenClaw image FAQ highlights the key pitfall: a system reset deletes system disk data, and you must reconfigure tokens/keys afterward.

Checklist:

  • Treat ~/.openclaw/ as production data; snapshot/backup before resets.
  • Avoid exposing the Control UI publicly; prefer SSH tunnel/Tailscale.
  • If you do expose it, ensure firewall/security group rules allow only your IPs.

5.2 Volcengine

Volcengine behaves like most cloud VM/container platforms: if you rebuild without persistence, state is gone.

Recommended:

  • Put OpenClaw state on a data disk (example mount: /data/openclaw).
  • Back up using disk snapshots.
  • If using Docker, bind-mount the data disk path into the container.

5.3 Zeabur

Zeabur deployments usually fail for two reasons:

  • Persistence is not configured, so redeploy wipes state.
  • Health checks kill the container before it becomes ready.

Suggested workflow:

  • Enable persistent storage and mount it at /home/node/.openclaw (the state directory inside the container).
  • Fix bind/auth/port.
  • If needed, use rescue mode and temporarily disable health checks while you fix config.

5.4 Unraid

Unraid best practice is to map application state into /mnt/user/appdata/... so rebuilding the container (or recreating docker.img) does not wipe state.

Suggested mapping:

Host:      /mnt/user/appdata/openclaw
Container: /home/node/.openclaw

If state was stored in the container write layer, rebuilding/updating will cause token resets and session loss.
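Ownership is the other frequent Unraid pitfall: appdata shares are often owned by nobody:users (UID 99, GID 100), while the container may run as a different UID, which produces the "container user cannot write" restart loop from section 4.1. A sketch for checking this on the host; UID 1000 is an assumption (the node user in official Node.js images, which the /home/node path suggests), and check_owner is a hypothetical helper relying on GNU/BusyBox `stat -c`:

```shell
#!/bin/sh
# Compare the numeric owner of a host directory with the UID the container
# is expected to run as. 1000 is an ASSUMPTION; confirm the real UID with:
#   docker exec <container> id -u
check_owner() {
  dir="$1"
  want_uid="${2:-1000}"
  have_uid="$(stat -c %u "$dir" 2>/dev/null)" || {
    echo "cannot stat $dir"
    return 2
  }
  if [ "$have_uid" = "$want_uid" ]; then
    echo "ok: $dir owned by UID $have_uid"
    return 0
  fi
  echo "mismatch: $dir owned by UID $have_uid, container expects UID $want_uid"
  return 1
}

# Usage on the Unraid host (the chown fix also assumes UID/GID 1000):
#   check_owner /mnt/user/appdata/openclaw 1000 ||
#     chown -R 1000:1000 /mnt/user/appdata/openclaw
```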

5.5 AWS (EC2 / ECS/Fargate / EKS)

EC2

  • Use EBS for persistence: put OpenClaw state on a dedicated EBS data volume mounted at /data/openclaw.
  • Set OPENCLAW_STATE_DIR=/data/openclaw/.openclaw and back up with EBS snapshots.
  • On most AMIs, the root EBS volume is set to delete when the instance terminates (DeleteOnTermination), so do not keep state there.

ECS/Fargate (stateless by default)

Fargate is great for stateless services. For OpenClaw, you typically need a persistent filesystem for state.

  • If you deploy on Fargate, mount an EFS volume and store the OpenClaw state directory there.

EKS (Kubernetes)

  • Store ~/.openclaw on a PersistentVolume (EBS/EFS CSI).
  • Add a TCP startupProbe to avoid early restarts.

5.6 Google Cloud (Compute Engine / GKE / Cloud Run)

Compute Engine

  • Use a Persistent Disk mounted at /data/openclaw.
  • Set OPENCLAW_STATE_DIR=/data/openclaw/.openclaw.
  • Back up with disk snapshots.

Tip: use stable device naming (UUID or /dev/disk/by-id) when mounting disks.
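For example, an /etc/fstab entry keyed by filesystem UUID keeps the data disk mounted at /data/openclaw even if the kernel renames the device. A minimal sketch; /dev/sdb is a placeholder device, ext4 is an assumed filesystem type, and nofail (a standard fstab option) lets the VM still boot if the disk is detached. fstab_entry is a hypothetical helper:

```shell
#!/bin/sh
# Build an fstab line that mounts by UUID instead of /dev/sdX naming.
fstab_entry() {
  uuid="$1"
  mount_point="$2"
  fstype="${3:-ext4}"
  printf 'UUID=%s %s %s defaults,nofail 0 2\n' "$uuid" "$mount_point" "$fstype"
}

# Usage (as root; /dev/sdb is a placeholder for your data disk):
#   uuid="$(blkid -s UUID -o value /dev/sdb)"
#   mkdir -p /data/openclaw
#   fstab_entry "$uuid" /data/openclaw ext4 >> /etc/fstab
#   mount -a && findmnt /data/openclaw
```

The same pattern applies to Volcengine data disks and EBS volumes.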

GKE (Kubernetes)

  • Mount a PersistentVolume for ~/.openclaw.
  • Add a startupProbe (TCP) for port 18789.

Cloud Run (not ideal for OpenClaw)

Cloud Run is optimized for stateless services. Local disk is ephemeral and instances can be replaced at any time.

If you still want to experiment, Cloud Run supports Cloud Storage volume mounts, but this changes filesystem semantics (object storage presented as files) and may not be a drop-in replacement for all state patterns.

For production, prefer Compute Engine or GKE.


6) Backup + restore drill (do this once before you need it)

Minimal backup (VM / bare metal / Docker host):

tar -C "$HOME" -czf openclaw-state-backup.tgz .openclaw

Verify you can:

  • restore onto a new VM/container
  • reuse the same OPENCLAW_GATEWAY_TOKEN
  • keep channels/devices working (or at least understand what must be re-approved)
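The drill can be sketched as two shell functions, assuming a standard tar and that the gateway is stopped (or idle) while you archive, so session files are not captured half-written. Archiving relative to the parent directory (tar -C) lets the archive restore cleanly under a different home directory or mount point. The function names are illustrative, and the archive contains tokens and channel credentials, so protect it like a secret.

```shell
#!/bin/sh
# Archive a state directory with relative paths, so it restores to
# "<dest_parent>/<name>" wherever you extract it.
backup_state() {
  src="$1"
  archive="$2"
  tar -C "$(dirname "$src")" -czf "$archive" "$(basename "$src")"
}

# Extract into DEST_PARENT (e.g. the new machine's $HOME or /data/openclaw).
restore_state() {
  archive="$1"
  dest_parent="$2"
  mkdir -p "$dest_parent" && tar -C "$dest_parent" -xzf "$archive"
}

# Drill: back up, restore into a scratch dir, and diff the two trees.
#   backup_state "${OPENCLAW_STATE_DIR:-$HOME/.openclaw}" /tmp/openclaw-state.tgz
#   scratch="$(mktemp -d)"
#   restore_state /tmp/openclaw-state.tgz "$scratch"
#   diff -r "${OPENCLAW_STATE_DIR:-$HOME/.openclaw}" "$scratch/.openclaw"
```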

Verification & references

  • Reviewed by: CoClaw Editorial Team
  • Last reviewed: March 14, 2026
  • Verified on: Alibaba Cloud · Volcengine · Zeabur · Unraid · AWS · Google Cloud · Kubernetes · Docker · Self-hosted
