Skip to content

Troubleshooting

Common issues and their solutions when running WebMACS in production.


Connection Issues

WebSocket Disconnects

Symptom: Frontend shows "Connection lost" repeatedly, real-time data stops.

Causes & Fixes:

Cause Fix
nginx proxy timeout Set proxy_read_timeout 86400s; in nginx config
Network instability WebMACS auto-reconnects with exponential backoff — check network
Token expired Re-login to refresh JWT (24h TTL)
Rate limiting WS paths (/ws/*) are exempt by default — verify middleware config

Diagnostic:

# Check WebSocket connectivity
wscat -c ws://localhost:8000/ws/controller/telemetry?token=<jwt>

# Check backend logs
docker compose logs backend --tail=100 | grep -i "websocket\|disconnect"

Controller Cannot Reach Backend

Symptom: Controller logs connection refused or timeout errors, no sensor data arrives.

Causes & Fixes:

Cause Fix
Wrong BACKEND_URL Verify BACKEND_URL in controller .env (default: http://backend:8000)
Backend not running docker compose ps — check backend health
Docker network issue Ensure both services are on the same Docker network
Auth failure Check controller logs for 401 — re-register controller user

Diagnostic:

# From controller container, test connectivity
docker compose exec controller curl -s http://backend:8000/health

# Check controller logs
docker compose logs controller --tail=100

Database Issues

Connection Pool Exhaustion

Symptom: TimeoutError or QueuePool limit of X overflow Y reached in backend logs.

Causes & Fixes:

Cause Fix
Too many concurrent requests Increase pool_size in database config
Slow queries holding connections Check for missing indexes, optimize queries
Connection leak Ensure all sessions are properly closed (use async with)

Diagnostic:

# Check active connections
just db-shell
SELECT count(*) FROM pg_stat_activity WHERE datname = 'webmacs';

# Check for long-running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC
LIMIT 10;

Migration Failures

Symptom: alembic upgrade head fails on startup.

# Check current revision
docker compose exec backend alembic current

# View pending migrations
docker compose exec backend alembic history --verbose

# Force stamp to a known good state (last resort)
docker compose exec backend alembic stamp head

See Database Migrations for detailed guide.


Plugin Issues

Plugin Fails to Load

Symptom: Plugin shows as installed but no channels appear, controller logs show errors.

Common Causes:

# Check controller logs for plugin errors
docker compose logs controller --tail=100 | grep -i "plugin\|error\|traceback"
Cause Fix
Missing entry point Plugin .whl must declare webmacs.plugins entry point
Dependency conflict Install with --no-deps and add dependencies separately
Wrong Python version Plugin must be compatible with Python 3.13
Architecture mismatch ARM plugin on x86 host (or vice versa)

Plugin Channels Not Syncing

Symptom: Plugin is loaded, channels exist, but no events are created in the backend.

# Trigger manual sync
docker compose restart controller

# Check channel_mappings table
just db-shell
SELECT * FROM channel_mappings;

OTA Update Issues

Upload Fails (413 Entity Too Large)

Symptom: Bundle upload hangs or returns 413.

Fix: Increase nginx upload limit:

# nginx.conf
client_max_body_size 500M;

Then restart nginx:

docker compose restart nginx

Update Stuck in "applying"

Symptom: Status stays at applying for more than 15 minutes.

# Check what's happening
docker compose logs --tail=200

# Force restart if safe
sudo systemctl restart webmacs

# The bundle moves to updates/failed/ on error
ls /opt/webmacs/updates/failed/

Health Check Fails After Update

Symptom: All containers running but /health returns errors.

# Check each service
docker compose ps
docker compose logs backend --tail=50
docker compose logs controller --tail=50

# Restore database from pre-update backup
cat /opt/webmacs/updates/backups/webmacs_backup_*.sql | \
  docker compose exec -T db psql -U webmacs webmacs

# Restart
sudo systemctl restart webmacs

Performance Issues

Slow Datapoint Queries

Symptom: Dashboard loads slowly, CSV export times out.

Causes:

Cause Fix
Missing index Verify ix_datapoints_event_ts exists: \di in psql
Too many datapoints Use time-bounded queries, consider archival
No experiment filter Always filter by experiment for bounded result sets

Diagnostic:

just db-shell

-- Check table sizes
SELECT relname, pg_size_pretty(pg_total_relation_size(oid))
FROM pg_class
WHERE relname LIKE '%datapoint%';

-- Check index usage
SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE relname = 'datapoints';

High Memory Usage

Symptom: Backend container uses excessive RAM.

Causes:

Cause Fix
Rate limiter state Normal for high-traffic deployments; entries pruned every 60s
Large query results Add pagination (?skip=0&limit=100) to API calls
Plugin memory leak Check plugin code, restart controller periodically

Docker Issues

502 Bad Gateway on Login

If you see 502 Bad Gateway when trying to log in, the backend is not ready yet.

# Check if backend is still starting
cd /opt/webmacs
sudo docker compose -f docker-compose.prod.yml --env-file .env ps

# View startup logs
sudo docker compose -f docker-compose.prod.yml --env-file .env logs backend --tail 30

On a Raspberry Pi, the backend can take 1–3 minutes to start on the first boot (database table creation, migrations, admin seeding). Wait and retry.

422 Unprocessable Content on Login

If you see 422 Unprocessable Content when logging in:

  • Make sure you’re using the correct admin email (default: admin@webmacs.local)
  • Check that the email field is not empty
  • Check your WebMACS version — older versions had strict email format validation that rejected .local domains. Update to the latest image:
cd /opt/webmacs
sudo docker compose -f docker-compose.prod.yml --env-file .env pull backend
sudo docker compose -f docker-compose.prod.yml --env-file .env up -d backend

Backend Marked as Unhealthy

On slower hardware (Raspberry Pi), the backend may be marked unhealthy during startup even though it’s still loading. The default start_period gives the app time to start before health checks begin counting failures.

If you see (unhealthy) but the logs show Uvicorn running on http://0.0.0.0:8000, just wait — the status will change to (healthy) after the next successful health check.

If it stays unhealthy, check the logs for errors:

sudo docker compose -f docker-compose.prod.yml --env-file .env logs backend 2>&1 | tail 50

Container Won't Start

# Check container state and exit code
docker compose ps -a

# View last logs before crash
docker compose logs backend --tail=50

# Common: database not ready yet
# The backend waits for PostgreSQL — check db container
docker compose logs db --tail=20

Disk Space Full

# Check Docker disk usage
docker system df

# Remove unused images and volumes
docker system prune -a --volumes

# Check database backup directory
du -sh /opt/webmacs/updates/backups/

Logging

Enable Debug Logging

Set DEBUG=true in your .env file:

# .env
DEBUG=true

Then restart:

docker compose restart backend

View Structured Logs

WebMACS uses structlog for structured logging. Filter by event type:

# All authentication events
docker compose logs backend | grep "auth\|login\|token"

# All WebSocket events
docker compose logs backend | grep "websocket\|ws_"

# All plugin events
docker compose logs controller | grep "plugin\|sync\|channel"

Getting Help

If your issue isn't covered here:

  1. Check the Logs in the WebMACS UI
  2. Search GitHub Issues
  3. Open a new issue with:
    • WebMACS version (curl http://localhost/api/v1/health)
    • Docker logs (last 100 lines)
    • Steps to reproduce

Next Steps