PagerPal v1 Deployment Notes#

PagerPal v1 is currently designed as a single-node appliance: one FastAPI application process and one in-process APScheduler instance share the same database.

Safe v1 runtime model#

Run v1 with exactly one app process:

uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 1

Production-like deployments should not use --reload. The repository's python run.py entry point enables reload and is intended for local development.

The Docker image uses the same one-worker Uvicorn command by default:

uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 1

If you prefer Gunicorn as the process manager, keep it to one worker for v1:

gunicorn app.main:app -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000 --workers 1

With the default settings, the single app process starts APScheduler during FastAPI startup. That scheduler runs the notification retry and automatic escalation jobs in-process:

notification_retry_worker_enabled=true
escalation_worker_enabled=true

This setup is acceptable for the current v1 appliance mode because there is only one scheduler instance making background-job decisions.
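
As a rough illustration of how those two flags could gate job registration, here is a minimal Python sketch. Only the setting names and their true defaults come from the notes above; the upper-case environment variable names and the helpers JOB_FLAGS, flag_enabled, and enabled_jobs are assumptions made for this example and are not taken from the PagerPal codebase.

```python
import os

# Assumption: each setting is read from an upper-case environment variable.
# The job names mirror the workers reported by /api/v1/system/jobs.
JOB_FLAGS = {
    "notification_retry": "NOTIFICATION_RETRY_WORKER_ENABLED",
    "escalation": "ESCALATION_WORKER_ENABLED",
}


def flag_enabled(value, default=True):
    """Interpret a textual flag; an unset flag falls back to the default."""
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def enabled_jobs(env=None):
    """Return the background jobs this process would register."""
    env = os.environ if env is None else env
    return [job for job, var in JOB_FLAGS.items() if flag_enabled(env.get(var))]


if __name__ == "__main__":
    print(enabled_jobs())
```

Under these assumptions, a process started with both flags set to false registers no jobs, which is the condition a second process would need to meet to avoid running a duplicate scheduler.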

Database guidance#

The default database URL is SQLite (sqlite:///./pagerpal.db). SQLite is acceptable for local development and small single-node appliance deployments when the database file is on durable storage and backed up.

Prefer PostgreSQL once PagerPal is hosted in AWS or operated as a durable service; the single-node MVP path can stay on SQLite. PostgreSQL is recommended for its better concurrency, operational tooling, backups, and row-lock behavior.

Configure the database with the DATABASE_URL environment variable. Example placeholder only:

DATABASE_URL=postgresql+psycopg://pagerpal:<password>@<host>:5432/pagerpal

Do not commit real credentials or secrets.

Run database migrations before starting a newly deployed app or after pulling schema changes:

python -m alembic upgrade head

Create the first admin account after migrations:

python scripts/create_admin.py --email '<admin-email>'

On a fresh interactive deployment, /login also exposes a first-run admin bootstrap form until one login-enabled admin exists. Existing responder rows without a password remain valid responder identities but cannot sign in.

When ENVIRONMENT=production, PagerPal fails startup if the live database is not at the repository's Alembic head. Local/development mode still calls create_all() on startup for demo convenience, but production deployments must use Alembic so SQLite and PostgreSQL schemas are reproducible.
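
The head check can be pictured with the following sketch, which reads the revision that Alembic records in its alembic_version table (column version_num) and compares it with an expected head. The functions current_revision and assert_at_head are hypothetical, the expected head is passed in by the caller rather than discovered from the repository, and PagerPal's actual startup check may be implemented differently.

```python
import sqlite3


def current_revision(conn):
    """Return the revision stored in Alembic's version table, or None."""
    try:
        row = conn.execute("SELECT version_num FROM alembic_version").fetchone()
    except sqlite3.OperationalError:
        # The table is missing: the database was never migrated or stamped.
        return None
    return row[0] if row else None


def assert_at_head(conn, expected_head):
    """Raise if the database revision differs from the expected head."""
    found = current_revision(conn)
    if found != expected_head:
        raise RuntimeError(
            f"database revision {found!r} does not match head {expected_head!r}; "
            "run 'python -m alembic upgrade head'"
        )


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    print(current_revision(demo))  # an unmigrated database has no revision
```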

The local demo seed script targets http://127.0.0.1:8000 by default. To seed a server on a different local port, set PAGERPAL_BASE_URL, for example:

PAGERPAL_BASE_URL=http://127.0.0.1:8001 python seed_data.py

Docker Compose quick start#

A fresh clone can run PagerPal locally with Docker Compose. The default Compose configuration uses SQLite stored in the named Docker volume pagerpal-data, disables outbound Infobip sending, and starts exactly one app process.

docker compose up --build

In another shell, verify the health endpoint:

curl -f http://127.0.0.1:8000/health
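
For scripted deployments, a polling helper can stand in for repeating the curl command by hand. The sketch below is one possible approach, assuming /health answers with HTTP 200 once the app is ready; fetch_status and wait_for_health are illustrative names and are not part of PagerPal.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def wait_for_health(url, attempts=30, delay=1.0, fetch=fetch_status, sleep=time.sleep):
    """Poll url until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    ok = wait_for_health("http://127.0.0.1:8000/health", attempts=5)
    print("healthy" if ok else "not healthy")
```

The fetch and sleep parameters exist only so the loop can be exercised without a running server.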

Production-like PostgreSQL smoke#

Run the production-like smoke script before deploying Compose changes:

./scripts/smoke-production-compose

This script starts the base Compose file plus docker-compose.production-smoke.yml with ENVIRONMENT=production, PostgreSQL, explicit non-wildcard origins, secure session-cookie settings, smoke-only placeholder credentials, and notification sending disabled. The app container waits for PostgreSQL, runs python -m alembic upgrade head, and then starts Uvicorn with one worker. The script verifies:

/health is public and returns OK.
Unauthenticated UI access is redirected to /login.
/dashboard and /api/v1/system/jobs load with a smoke admin account.
/api/v1/system/jobs reports the notification_retry and escalation workers.

The smoke env file is temporary, its secrets are placeholders or [REDACTED] values, and cleanup removes the Compose stack and the PostgreSQL smoke volume on exit.

For local testing you can run without a .env file because docker-compose.yml includes safe local defaults. Before deploying on an EC2 host, copy .env.example to .env and replace every placeholder value:

cp .env.example .env
$EDITOR .env
python -m alembic upgrade head
docker compose up -d --build

Never commit .env or any real credential values. Hosted deployments should set ENVIRONMENT=production; in that mode PagerPal refuses to start with the local default SECRET_KEY, insecure session cookies, wildcard CORS, or a database that has not been migrated to Alembic head.

Set PAGERPAL_BASE_URL to the HTTPS URL operators use to open the app. PagerPal embeds this value in WhatsApp and email notification links. Set ALLOWED_ORIGINS to an explicit comma-separated allowlist for production instead of relying on the local * default. Set SESSION_COOKIE_SECURE=true whenever PagerPal is served through HTTPS; the Terraform deployment exposes this as session_cookie_secure.
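
The production settings above lend themselves to a preflight check that can run before the container starts. The following sketch shows one way to express it. The function names and LOCAL_DEFAULT_SECRET_KEY are hypothetical (the real default secret lives in the repository and is not reproduced here), and the HTTPS check on PAGERPAL_BASE_URL is an advisory added by this example, since the notes do not say PagerPal enforces it at startup.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the local default secret; not the real value.
LOCAL_DEFAULT_SECRET_KEY = "change-me"


def parse_origins(raw):
    """Split a comma-separated ALLOWED_ORIGINS value into a clean list."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def production_config_errors(env):
    """Return a list of problems found in a mapping of environment values."""
    errors = []
    if env.get("SECRET_KEY", LOCAL_DEFAULT_SECRET_KEY) == LOCAL_DEFAULT_SECRET_KEY:
        errors.append("SECRET_KEY is still the local default")
    if env.get("SESSION_COOKIE_SECURE", "false").strip().lower() != "true":
        errors.append("SESSION_COOKIE_SECURE must be true behind HTTPS")
    origins = parse_origins(env.get("ALLOWED_ORIGINS", "*"))
    if not origins or "*" in origins:
        errors.append("ALLOWED_ORIGINS must be an explicit allowlist")
    # Advisory only in this sketch: notification links should use HTTPS.
    base = urlparse(env.get("PAGERPAL_BASE_URL", ""))
    if base.scheme != "https" or not base.netloc:
        errors.append("PAGERPAL_BASE_URL should be an HTTPS URL")
    return errors


if __name__ == "__main__":
    import os

    for problem in production_config_errors(dict(os.environ)):
        print(f"config problem: {problem}")
```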

For real paging, configure at least one outbound provider before enabling NOTIFICATION_SENDING_ENABLED=true:

Infobip SMS/WhatsApp: INFOBIP_BASE_URL, INFOBIP_API_KEY, INFOBIP_SMS_SENDER, and/or INFOBIP_WHATSAPP_SENDER
SMTP email: SMTP_HOST, SMTP_PORT, optional SMTP_USER/SMTP_PASSWORD, and SMTP_FROM

If exposing Infobip delivery receipts, set INFOBIP_RECEIPT_TOKEN and configure Infobip to send that value as X-Infobip-Receipt-Token to /api/v1/notifications/infobip/receipts.
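
A receipt endpoint that relies on a shared token would typically compare the header in constant time and fail closed when no token is configured. The sketch below illustrates that idea with the standard library; receipt_token_is_valid is a hypothetical helper, and how PagerPal actually verifies the header is not described in these notes.

```python
import hmac


def receipt_token_is_valid(headers, expected_token):
    """Check the X-Infobip-Receipt-Token header against the configured token."""
    if not expected_token:
        # Fail closed: without a configured token nothing is accepted.
        return False
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get("x-infobip-receipt-token", "")
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_token.encode("utf-8"))


if __name__ == "__main__":
    print(receipt_token_is_valid({"X-Infobip-Receipt-Token": "example"}, "example"))
```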

Alert ingestion endpoints (/api/v1/alerts, /api/v1/webhooks/grafana, and /api/v1/webhooks/cloudwatch) keep their per-source API key authentication and also enforce a per-source token bucket. Tune burst and refill behavior with:

ALERT_SOURCE_RATE_LIMIT_CAPACITY=120
ALERT_SOURCE_RATE_LIMIT_REFILL_PER_SECOND=2.0
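
To make the two settings concrete, here is a minimal token bucket sketch with one bucket per alert source. It is a generic illustration of the technique, not PagerPal's implementation; the class names TokenBucket and PerSourceLimiter and the injectable clock are assumptions introduced for the example.

```python
import time


class TokenBucket:
    """Bucket that holds up to `capacity` tokens and refills continuously."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_second = float(refill_per_second)
        self._clock = clock
        self._tokens = self.capacity  # start full so an initial burst is allowed
        self._updated = clock()

    def allow(self, cost=1.0):
        """Spend `cost` tokens if available; return whether the request passes."""
        now = self._clock()
        elapsed = max(0.0, now - self._updated)
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        self._updated = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class PerSourceLimiter:
    """Keeps an independent bucket for every alert source identifier."""

    def __init__(self, capacity=120, refill_per_second=2.0, clock=time.monotonic):
        self._capacity = capacity
        self._refill = refill_per_second
        self._clock = clock
        self._buckets = {}

    def allow(self, source_id):
        bucket = self._buckets.get(source_id)
        if bucket is None:
            bucket = TokenBucket(self._capacity, self._refill, self._clock)
            self._buckets[source_id] = bucket
        return bucket.allow()


if __name__ == "__main__":
    limiter = PerSourceLimiter()
    print(limiter.allow("grafana"))
```

With the values shown above, a source could send a burst of up to 120 alerts, then sustain about 2 alerts per second, and an empty bucket would take 60 seconds to refill completely.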

PagerPal writes structured JSON application logs to stdout. Each request receives an X-Request-ID response header, reusing the inbound header when present, and log records include the same request_id. Secret-like values and configured provider secrets are redacted before log output.
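
The logging behavior described above can be approximated with the standard library alone. In this sketch, resolve_request_id, redact, JsonFormatter, and the SECRET_PATTERN regular expression are all illustrative assumptions; PagerPal's real field names and redaction rules may differ.

```python
import json
import logging
import re
import uuid

# Assumption: "secret-like" means key=value pairs whose key ends in one of these words.
SECRET_PATTERN = re.compile(
    r"([\w-]*(?:api[_-]?key|password|token|secret))=([^\s&]+)", re.IGNORECASE
)


def resolve_request_id(headers):
    """Reuse an inbound X-Request-ID header, or generate a new identifier."""
    normalized = {name.lower(): value for name, value in headers.items()}
    inbound = normalized.get("x-request-id", "").strip()
    return inbound or uuid.uuid4().hex


def redact(text, known_secrets=()):
    """Mask configured secret values and secret-like key=value pairs."""
    for secret in known_secrets:
        if secret:
            text = text.replace(secret, "[REDACTED]")
    return SECRET_PATTERN.sub(r"\1=[REDACTED]", text)


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object that carries the request ID."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": redact(record.getMessage()),
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)


if __name__ == "__main__":
    handler = logging.StreamHandler()  # a real deployment would target stdout
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("pagerpal.example")
    log.addHandler(handler)
    log.warning("retry with api_key=abc123", extra={"request_id": resolve_request_id({})})
```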

EC2 single-node path#

The intended low-cost AWS path for v1 is one EC2 instance running Docker Engine and Docker Compose:

1. Provision one small EC2 instance and attach durable EBS storage.
2. Install Docker and the Docker Compose plugin.
3. Clone PagerPal on the instance.
4. Copy .env.example to .env and set deployment-specific values.
5. Start the app with docker compose up -d --build.
6. Confirm curl -f http://127.0.0.1:8000/health succeeds on the instance.
7. Put a TLS-terminating reverse proxy or AWS load balancer in front of port 8000 before exposing PagerPal publicly.

For the MVP SQLite path, keep DATABASE_URL=sqlite:////data/pagerpal.db; the Compose volume persists /data/pagerpal.db. Back up the underlying Docker volume/EBS data regularly.

For a more durable AWS deployment, use PostgreSQL instead of SQLite. Point DATABASE_URL at an RDS PostgreSQL instance or a PostgreSQL service running on the same host. Example placeholder only:

DATABASE_URL=postgresql+psycopg://pagerpal:<database-password>@<database-host>:5432/pagerpal

Keep the v1 scheduler limitation in mind: do not run more than one PagerPal app container with the worker flags enabled.

Static documentation hosting#

The generated documentation site is customer-facing only. Build it before publishing:

python3 scripts/build_docs.py

The build writes only the public docs routes to docs/site: home, architecture, operator guide, API reference, deployment notes, and glossary. Internal design notes, ADRs, plans, reviews, and specs are not rendered or indexed.

Publish the generated site to the documentation bucket with the helper script:

python3 scripts/publish_docs_s3.py --bucket '<docs-bucket-name>' --dry-run
python3 scripts/publish_docs_s3.py --bucket '<docs-bucket-name>' --delete

If the bucket is served through CloudFront, pass the distribution ID to invalidate changed HTML/assets after upload:

python3 scripts/publish_docs_s3.py \
  --bucket '<docs-bucket-name>' \
  --cloudfront-distribution-id '<distribution-id>' \
  --delete

The publish helper refuses to run unless the build manifest is marked customer, and it excludes generated review artifacts such as _screenshots/ and _lighthouse/ from uploads.
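
The two safeguards in the publish helper can be sketched as a small filter. This is an illustration only: the manifest key "audience" and the functions is_publishable and files_to_upload are hypothetical, because the notes do not describe the manifest format or the helper's internals.

```python
from pathlib import PurePosixPath

# Review artifact directories named in the notes above.
EXCLUDED_DIRS = {"_screenshots", "_lighthouse"}


def is_publishable(relative_path):
    """Reject any path that sits inside an excluded review directory."""
    parts = PurePosixPath(relative_path).parts
    return not any(part in EXCLUDED_DIRS for part in parts)


def files_to_upload(manifest, relative_paths):
    """Return the sorted upload list, refusing non-customer builds."""
    # Assumption: the manifest marks its audience under an "audience" key.
    if manifest.get("audience") != "customer":
        raise ValueError("refusing to publish: build manifest is not marked customer")
    return sorted(path for path in relative_paths if is_publishable(path))


if __name__ == "__main__":
    print(files_to_upload({"audience": "customer"}, ["index.html", "_lighthouse/report.json"]))
```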

Backup, restore, and upgrade runbooks#

SQLite#

SQLite is file-based, so take backups only while writes are stopped or while the database is quiescent.

docker compose stop app
docker compose run --rm --no-deps app sh -c 'cp /data/pagerpal.db /data/pagerpal.db.backup.$(date -u +%Y%m%dT%H%M%SZ)'
docker compose up -d app
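
If the backup is scripted in Python rather than with cp, the copy can also be verified before the app is restarted. The sketch below copies the file under the same timestamped naming scheme and runs SQLite's PRAGMA integrity_check on the copy. The function backup_sqlite is a hypothetical helper, and it assumes, as the runbook does, that the app is stopped while the copy is taken.

```python
import shutil
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def backup_sqlite(db_path, now=None):
    """Copy a quiescent SQLite file and verify the copy; return its path."""
    db_path = Path(db_path)
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    backup_path = db_path.with_name(f"{db_path.name}.backup.{stamp}")
    shutil.copy2(db_path, backup_path)

    # Open the copy, not the live file, and ask SQLite to check its structure.
    conn = sqlite3.connect(str(backup_path))
    try:
        result = conn.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        conn.close()
    if result != "ok":
        raise RuntimeError(f"backup {backup_path} failed integrity check: {result}")
    return backup_path


if __name__ == "__main__":
    source = Path("/data/pagerpal.db")
    if source.exists():
        print(backup_sqlite(source))
```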

Restore by stopping the app, replacing the database file, applying migrations, and then starting the app again:

docker compose stop app
docker compose run --rm --no-deps app sh -c 'cp /data/pagerpal.db.restore /data/pagerpal.db'
docker compose run --rm app python -m alembic upgrade head
docker compose up -d app

For a host-level restore, replace /data/pagerpal.db in the Docker volume or EBS mount, then run the same python -m alembic upgrade head command before serving traffic.

PostgreSQL#

Use managed snapshots when running on RDS. For a self-managed PostgreSQL container or host, take a custom-format dump:

pg_dump --format=custom --file=pagerpal.dump "$DATABASE_URL"

Restore into an empty database, then apply Alembic migrations before returning PagerPal to service:

pg_restore --clean --if-exists --dbname "$DATABASE_URL" pagerpal.dump
python -m alembic upgrade head

The production-like Compose smoke path exercises the same migrate-before-start order against PostgreSQL: the app waits for Postgres health, upgrades to Alembic head, and only then starts Uvicorn.

Deployment upgrade order#

Use this order for every deployment:

1. Back up SQLite or PostgreSQL.
2. Deploy the new image or code checkout.
3. Run python -m alembic upgrade head.
4. Start exactly one app process with scheduler flags enabled.
5. Confirm /health, /dashboard, and /api/v1/system/jobs.

Important scheduler limitation#

APScheduler currently runs inside the FastAPI process. If PagerPal is started with multiple app processes or multiple hosts, each process will start its own scheduler unless the worker flags are disabled for that process. Duplicate schedulers can cause duplicate notification retries, duplicate escalation attempts, and avoidable database contention.

Do not run v1 with any of the following until a singleton scheduler design is implemented:

uvicorn --workers N where N > 1 and scheduler flags remain enabled in more than one worker
multiple systemd services/VMs/containers all running the app with scheduler flags enabled
autoscaled AWS app instances all running the in-process scheduler

If scaling beyond one process#

Before running PagerPal as a multi-worker or multi-node service, add one of these coordination patterns:

A singleton scheduler process, with web workers configured not to start background jobs.
A database-backed lock around scheduler startup or each job run.
PostgreSQL advisory locks for scheduled jobs.
An external worker/queue system that owns retries and escalations outside the web app.
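
As one illustration of the database-backed lock option, the sketch below uses a lease row per job: a process runs a job only if it can insert the row, and a stale lease expires so a crashed holder cannot block the job forever. PagerPal v1 does not implement this. The table name job_locks and the functions ensure_lock_table, try_acquire, and release are hypothetical, and sqlite3 is used only to keep the example self-contained.

```python
import sqlite3
import time


def ensure_lock_table(conn):
    """Create the lease table; the primary key allows one holder per job."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS job_locks ("
            "job_name TEXT PRIMARY KEY, owner TEXT NOT NULL, expires_at REAL NOT NULL)"
        )


def try_acquire(conn, job_name, owner, lease_seconds, now=None):
    """Try to take the lease for job_name; return True on success."""
    now = time.time() if now is None else now
    with conn:
        # Drop an expired lease so a crashed holder does not block the job.
        conn.execute(
            "DELETE FROM job_locks WHERE job_name = ? AND expires_at <= ?",
            (job_name, now),
        )
        try:
            conn.execute(
                "INSERT INTO job_locks (job_name, owner, expires_at) VALUES (?, ?, ?)",
                (job_name, owner, now + lease_seconds),
            )
        except sqlite3.IntegrityError:
            # Another process still holds a live lease.
            return False
    return True


def release(conn, job_name, owner):
    """Give up the lease, but only if this owner holds it."""
    with conn:
        conn.execute(
            "DELETE FROM job_locks WHERE job_name = ? AND owner = ?",
            (job_name, owner),
        )


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    ensure_lock_table(demo)
    print(try_acquire(demo, "escalation", "worker-a", lease_seconds=30))
```

A scheduler job wrapped this way would skip its run whenever try_acquire returns False. The lease length is a trade-off: it should exceed the longest expected job run, yet stay short enough that a crashed holder is replaced promptly.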

Until one of those exists, keep the v1 deployment to one application process with one scheduler.

Known limitations#

Background jobs stop when the single app process stops.
There is no leader election or cross-process scheduler lock.
SQLite does not provide the same row-lock behavior as PostgreSQL.
High availability and horizontal scaling require scheduler coordination work before production use.
SSO/SAML is not implemented; production auth uses local user accounts with RBAC.