When OAuth renewal “works” but the host still keeps falling back into recovery or safe mode, the problem is often not that the new token is bad.
The more common failure is: the gateway updated the live auth file, but recovery still reads an older provisioned copy.
This guide explains the two-file model and the minimal recovery procedure that usually gets the system healthy again.
1) The symptom pattern
A common sequence looks like this:
- You run a successful OAuth renewal or openclaw onboard flow.
- The gateway starts using the refreshed credential in the live runtime path.
- Later, safe-mode recovery or a repair script runs.
- Recovery reads an older auth snapshot and decides auth is still broken.
- The machine falls back into the same recovery loop again.
From the operator's point of view, this feels like:
- “OAuth renewal succeeded, but the machine still thinks auth is broken.”
- “I fixed it once, but the next recovery cycle undid it.”
- “The gateway and the recovery script seem to disagree about which auth state is real.”
2) The two-file model
In affected setups, there are effectively two sources of truth for auth state:
- ~/.openclaw/agents/main/agent/auth-profiles.json
  - the live copy
  - the gateway reads/writes this during normal operation
- ~/.openclaw/agents/main/agent/auth-profiles.provisioned.json
  - the provisioned / golden copy
  - recovery may prefer this file because it is intended to survive bad runtime persistence or partial shutdowns
That design can be reasonable, but it creates an operational trap:
- successful re-auth may update the live file,
- while recovery still trusts the provisioned file,
- so the next recovery cycle reintroduces stale auth state.
If you do not keep the two files in sync after renewal, recovery can keep using old credentials even though the gateway has already moved on.
3) Minimal recovery procedure after OAuth renewal
3.1 Copy the fresh live auth file into the provisioned copy
After a successful renewal, sync the two files:
cp ~/.openclaw/agents/main/agent/auth-profiles.json \
~/.openclaw/agents/main/agent/auth-profiles.provisioned.json
Why this helps: it makes the next recovery cycle read the same fresh auth state the gateway is already using.
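A plain cp also copies a live file that is truncated or corrupted, which would poison the provisioned copy. The sketch below is one hedged way to guard the sync: it validates the live file first, keeps the previous provisioned copy, and replaces the target via a rename. The function name sync_provisioned and the .bak suffix are illustrative, not part of OpenClaw, and the sketch assumes python3 is on PATH purely as a JSON validator.

```shell
# Sketch: sync the live auth file into the provisioned copy, but only if the
# live file parses as JSON. Default paths are the ones used in this guide.
# Assumption: python3 is installed (used only to validate JSON).
sync_provisioned() {
  local live="${1:-$HOME/.openclaw/agents/main/agent/auth-profiles.json}"
  local prov="${2:-$HOME/.openclaw/agents/main/agent/auth-profiles.provisioned.json}"

  # Refuse to propagate a missing or unparseable live file.
  if ! python3 -m json.tool "$live" >/dev/null 2>&1; then
    echo "live auth file is missing or not valid JSON: $live" >&2
    return 1
  fi

  # Keep the previous provisioned copy (hypothetical .bak name) for rollback.
  if [ -f "$prov" ]; then
    cp -p "$prov" "$prov.bak"
  fi

  # Copy to a temp file next to the target, then rename, so recovery never
  # reads a half-written provisioned file.
  local tmp="$prov.tmp.$$"
  cp -p "$live" "$tmp" && mv -f "$tmp" "$prov"
}
```

Called with no arguments, sync_provisioned uses the default paths; passing two explicit paths makes it easy to try against scratch files first.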
3.2 Clear stale safe-mode markers if your environment uses them
If the host is already stuck in a recovery loop, clear the markers that keep forcing re-entry:
sudo rm -f /var/lib/init-status/safe-mode
echo 0 | sudo tee /var/lib/init-status/recovery-attempts
Why this helps: even after auth is fixed, some environments will keep re-running the recovery path until the safe-mode marker and attempt counter are cleared.
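Before clearing, it can be useful to record what state was actually present, since the attempt count is a hint about how long the loop has been running. The following is a minimal sketch of that idea. The function name clear_safe_mode_state is illustrative, the directory defaults to the path shown above, and it needs to run with enough privileges (for example from a root shell) to modify that directory.

```shell
# Sketch: clear safe-mode state only where it exists, and report what was there.
# The directory argument is optional; pass a scratch directory to try it safely.
clear_safe_mode_state() {
  local dir="${1:-/var/lib/init-status}"

  if [ -e "$dir/safe-mode" ]; then
    echo "removing safe-mode marker: $dir/safe-mode"
    rm -f "$dir/safe-mode"
  else
    echo "no safe-mode marker present"
  fi

  # Reset the counter only in environments that actually keep one.
  if [ -f "$dir/recovery-attempts" ]; then
    echo "recovery attempts before reset: $(cat "$dir/recovery-attempts")"
    echo 0 > "$dir/recovery-attempts"
  fi
}
```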
3.3 Re-run your normal recovery/restart path
Depending on your host, that might be:
- a supervisor restart,
- a recovery script,
- or a service restart that performs a health check and exits safe mode.
The exact command varies by deployment, but the key idea is: resync files first, clear stale state second, restart third.
4) How to verify the fix actually stuck
Do not stop at “the copy command succeeded.” Verify all three layers:
4.1 Verify the auth files now match
Check timestamps and, if needed, compare contents:
ls -l ~/.openclaw/agents/main/agent/auth-profiles*.json
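Timestamps alone can mislead (a file can be touched without its content changing), so a byte-level comparison is the stronger check. A small sketch using cmp, which reports only whether the files differ and does not print token contents the way diff would. The function name auth_files_match is illustrative.

```shell
# Sketch: report whether the live and provisioned auth files are byte-identical.
# cmp -s is silent, so no credential material ends up in the terminal or logs.
auth_files_match() {
  local dir="${1:-$HOME/.openclaw/agents/main/agent}"

  if cmp -s "$dir/auth-profiles.json" "$dir/auth-profiles.provisioned.json"; then
    echo "in sync"
  else
    echo "DIFFERENT (or one file is missing)"
    return 1
  fi
}
```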
4.2 Verify model auth from OpenClaw itself
openclaw models status --probe
You want to confirm the provider is usable, not just that files exist.
4.3 Verify recovery stops re-entering safe mode
After the next restart or repair cycle:
- the host should stay out of safe mode,
- the same broken auth diagnosis should not immediately return,
- and model calls should keep succeeding.
5) When this guide applies vs when it does not
This guide is a good fit when:
- renewal/auth setup appears successful,
- but later recovery still behaves like auth is stale,
- especially on hosts with custom recovery scripts or hardened safe-mode flows.
This guide is not the main fix when:
- the provider token itself is actually expired or invalid,
- you never successfully renewed auth in the first place,
- or your config points at the wrong provider/profile entirely.
In those cases, start with the shorter auth troubleshooting pages.
5.5) If auth-profiles.json is corrupted (concurrent write / “Extra data”)
In some high-concurrency setups (busy channels + cron jobs + periodic auth refresh), operators have reported auth-profiles.json becoming unparseable due to what looks like concurrent writes (two JSON objects concatenated, or a truncated second object).
Symptom pattern:
- OpenClaw suddenly reports “No API key found” for a provider you definitely configured.
- Errors mention the auth store path, for example:
Auth store: ~/.openclaw/agents/main/agent/auth-profiles.json
- The file fails to parse with errors like:
JSONDecodeError: Extra data
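To confirm this diagnosis without opening the file (and exposing tokens on screen), any strict JSON parser will do. The sketch below assumes python3 is installed and uses its bundled json.tool module only as a validator. The function name check_auth_store is illustrative.

```shell
# Sketch: check whether the auth store still parses as a single JSON document.
# Parsed output is discarded; only the parser's error message is shown.
check_auth_store() {
  local file="${1:-$HOME/.openclaw/agents/main/agent/auth-profiles.json}"
  local err

  # Capture stderr (the parse error) while sending the pretty-printed JSON
  # to /dev/null so credentials are never echoed.
  if err=$(python3 -m json.tool "$file" 2>&1 >/dev/null); then
    echo "OK: $file parses as JSON"
  else
    echo "CORRUPT: $file: $err"
    return 1
  fi
}
```

A file containing two concatenated objects should fail here with an "Extra data" message, matching the symptom above.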
Emergency recovery (safe operator path):
- Stop the gateway (so nothing writes while you repair):
openclaw gateway stop
- Back up the corrupted file:
cp ~/.openclaw/agents/main/agent/auth-profiles.json \
~/.openclaw/agents/main/agent/auth-profiles.json.corrupt.$(date +%Y%m%d-%H%M%S)
- Restore from a known-good source:
- If you have auth-profiles.provisioned.json in your environment, copy it back:
cp ~/.openclaw/agents/main/agent/auth-profiles.provisioned.json \
~/.openclaw/agents/main/agent/auth-profiles.json
- Otherwise, restore from your state backups (recommended: openclaw backup create --verify before upgrades/changes).
- Restart and probe:
openclaw gateway restart
openclaw models status --probe
Prevention tips (until upstream atomic writes land):
- Avoid running multiple gateways against the same state directory.
- Keep the state directory off cloud-sync folders (see: /troubleshooting/solutions/state-dir-cloud-sync-ebusy-crash).
- Keep at least one periodic backup of the auth file (or the whole state dir) so restoration is boring.
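For the last tip, a file-level backup can be as small as the sketch below. The backup directory, the retention count, and the function name backup_auth_file are all illustrative choices, not OpenClaw defaults. It is meant to be run from cron or a timer at whatever interval suits the deployment.

```shell
# Sketch: keep timestamped copies of the auth file and prune the oldest ones.
# Arguments: source file, backup directory (hypothetical default), copies to keep.
backup_auth_file() {
  local src="${1:-$HOME/.openclaw/agents/main/agent/auth-profiles.json}"
  local dest="${2:-$HOME/.openclaw-auth-backups}"
  local keep="${3:-7}"

  [ -f "$src" ] || { echo "nothing to back up: $src" >&2; return 1; }

  # The backups contain credentials, so keep the directory private.
  mkdir -p "$dest" && chmod 700 "$dest"
  cp -p "$src" "$dest/auth-profiles.$(date +%Y%m%d-%H%M%S).json"

  # Timestamped names sort chronologically: list newest first, skip the first
  # $keep entries, and delete the rest.
  ls -1 "$dest"/auth-profiles.*.json | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do rm -f "$old"; done
}
```

Two runs within the same second write to the same file name, which is harmless for a periodic job but worth knowing when testing by hand.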
6) Operational habit to adopt
If your deployment uses a provisioned/golden auth copy, treat OAuth renewal as a two-step operation:
- renew the live auth,
- resync the provisioned copy.
That small habit prevents a lot of “it was fixed until the next recovery cycle” incidents.
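One way to make the habit hard to forget is to wrap the renewal in a helper that performs the resync only when the renewal command succeeded. This is a sketch under assumptions: renew_then_sync and the AUTH_DIR override are hypothetical names introduced here, and the renewal command itself is whatever your deployment already uses.

```shell
# Sketch: run the deployment's renewal command, then resync the provisioned
# copy only if that command exited successfully.
# AUTH_DIR is a hypothetical override, mainly useful for dry runs.
renew_then_sync() {
  local dir="${AUTH_DIR:-$HOME/.openclaw/agents/main/agent}"

  # "$@" is the renewal command and its arguments, passed through unchanged.
  if ! "$@"; then
    echo "renewal command failed; provisioned copy left untouched" >&2
    return 1
  fi

  cp -p "$dir/auth-profiles.json" "$dir/auth-profiles.provisioned.json"
}
```

For example, renew_then_sync openclaw onboard would run the onboard flow and then copy the live file over the provisioned one, if that is the renewal path you use.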