PagerPal Architecture

PagerPal is a self-hosted incident response appliance for small teams. The customer-facing architecture is intentionally simple: monitoring systems send authenticated alerts to PagerPal, PagerPal creates or updates incidents, responders are selected from schedules and escalation policies, and notification providers deliver SMS, WhatsApp, or email messages.

The v1 deployment model is one application process, one database, and one in-process worker scheduler. That shape keeps setup small and predictable, but it makes one rule essential: run exactly one worker-enabled PagerPal process.

Overview

PagerPal sits between monitoring tools and responders.

Alert sources send generic, Grafana, or CloudWatch/SNS webhook payloads.
PagerPal validates the alert source key, creates or deduplicates an incident, records timeline events, and queues notifications.
Schedules and escalation policies decide who should be paged first and who should be paged next.
Notification providers send SMS, WhatsApp, or email when real sending is enabled.
Delivery receipts can update notification evidence when a provider supports callback status.
Operators use the web UI or management API to acknowledge, resolve, reopen, manually escalate, or retry notifications.
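The "who first, who next" decision can be sketched as a rotation with overrides plus an ordered list of escalation levels. This is a minimal sketch under assumptions. The weekly hand-off, the override rule, and the names `Schedule`, `Override`, `EscalationLevel`, and `target_for_level` are hypothetical and are not PagerPal's actual models.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Override:
    """Hypothetical: a temporary replacement responder for a time window."""
    start: datetime
    end: datetime
    user: str


@dataclass
class Schedule:
    """Hypothetical: a rotation that hands off every `shift`, starting at `anchor`."""
    users: list[str]
    anchor: datetime
    shift: timedelta = timedelta(days=7)
    overrides: list[Override] = field(default_factory=list)

    def on_call(self, at: datetime) -> str:
        # An override that covers `at` wins over the regular rotation.
        for override in self.overrides:
            if override.start <= at < override.end:
                return override.user
        # Count whole shifts since the anchor, then wrap around the user list.
        index = ((at - self.anchor) // self.shift) % len(self.users)
        return self.users[index]


@dataclass
class EscalationLevel:
    """Hypothetical: a level pages a schedule's on-call user or a fixed user."""
    schedule: Schedule | None = None
    user: str | None = None


def target_for_level(levels: list[EscalationLevel], level: int, at: datetime) -> str | None:
    """Return who to page at escalation `level`, or None once the policy is exhausted."""
    if level >= len(levels):
        return None
    step = levels[level]
    return step.schedule.on_call(at) if step.schedule is not None else step.user


anchor = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
primary = Schedule(users=["alice", "bob"], anchor=anchor)
policy = [EscalationLevel(schedule=primary), EscalationLevel(user="team-lead")]
now = anchor + timedelta(days=8)  # inside the second weekly shift

print(target_for_level(policy, 0, now))  # bob
print(target_for_level(policy, 1, now))  # team-lead
print(target_for_level(policy, 2, now))  # None
```

Level 0 is the first page. Each manual or timed escalation moves to the next level. `None` signals that the policy has no further targets.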

Layer Map

Figure: Architecture flow. Alert source to PagerPal, database, outbox, provider, and responder.

PagerPal stores incident state, schedules, escalation policy configuration, alert source records, user accounts, and notification logs in the configured database. SQLite is suitable for local/demo use. PostgreSQL is recommended for hosted deployments.
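A startup check can reflect this database guidance. The sketch below assumes SQLAlchemy-style URLs, which is plausible given the use of Alembic but is not stated here. The `check_database_url` helper and its `hosted` flag are hypothetical.

```python
from urllib.parse import urlsplit


def check_database_url(url: str, hosted: bool) -> list[str]:
    """Return warnings for a database URL (hypothetical helper)."""
    scheme = urlsplit(url).scheme
    backend = scheme.split("+", 1)[0]  # "postgresql+psycopg" -> "postgresql"
    warnings = []
    if backend == "sqlite" and hosted:
        warnings.append("SQLite is intended for local/demo use; use PostgreSQL for hosted deployments.")
    elif backend not in ("sqlite", "postgresql"):
        warnings.append(f"Unrecognized database backend: {backend or '(none)'}")
    return warnings


for url in ("sqlite:///./pagerpal.db",
            "postgresql+psycopg://pagerpal:<database-password>@<database-host>/pagerpal"):
    print(url.split(":", 1)[0], check_database_url(url, hosted=True))
```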

Connection Flow

1. A monitoring system sends an alert webhook with an alert source API key.
2. PagerPal validates the key against an active alert source.
3. PagerPal creates a new incident, or updates the open incident that matches on alert source and external alert ID.
4. PagerPal resolves the current responder from team schedules and escalation policy configuration.
5. PagerPal queues notification outbox rows for the selected channel and recipient.
6. The retry worker dispatches queued notifications through the configured provider.
7. Provider send results and optional delivery receipts update notification evidence.
8. Operators act on the incident from the UI or management API.
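The first five steps can be sketched in memory to show how deduplication keys on alert source and external alert ID. Everything here is hypothetical: `AlertSource`, `Incident`, `ingest_alert`, and the dict-based stores. PagerPal persists these records in its database. The `responder` argument stands in for the schedule and escalation lookup. This sketch also assumes that a repeated alert does not queue a second notification.

```python
from __future__ import annotations

import hmac
import itertools
from dataclasses import dataclass, field


@dataclass
class AlertSource:
    id: str
    api_key: str
    active: bool = True


@dataclass
class Incident:
    id: int
    source_id: str
    external_id: str
    status: str = "triggered"
    timeline: list[str] = field(default_factory=list)


open_incidents: dict[tuple[str, str], Incident] = {}  # keyed by (source ID, external alert ID)
outbox: list[dict] = []  # queued notification rows for a worker to dispatch
_next_id = itertools.count(1)


def find_source(sources: list[AlertSource], key: str) -> AlertSource | None:
    # Step 2: the key must belong to an active alert source (constant-time comparison).
    for source in sources:
        if source.active and hmac.compare_digest(source.api_key.encode(), key.encode()):
            return source
    return None


def ingest_alert(sources: list[AlertSource], key: str, external_id: str,
                 responder: str, channel: str = "sms") -> Incident:
    source = find_source(sources, key)
    if source is None:
        raise PermissionError("unknown or inactive alert source key")
    dedupe_key = (source.id, external_id)
    incident = open_incidents.get(dedupe_key)
    if incident is not None:
        # Step 3, update path: same source and alert ID, so no new incident.
        incident.timeline.append("alert repeated")
        return incident
    # Step 3, create path.
    incident = Incident(id=next(_next_id), source_id=source.id, external_id=external_id)
    incident.timeline.append("triggered")
    open_incidents[dedupe_key] = incident
    # Steps 4 and 5: queue an outbox row instead of sending inline.
    outbox.append({"incident_id": incident.id, "channel": channel,
                   "recipient": responder, "status": "queued"})
    return incident


sources = [AlertSource(id="grafana-prod", api_key="<alert-source-key>")]
first = ingest_alert(sources, "<alert-source-key>", "cpu-high", responder="alice")
again = ingest_alert(sources, "<alert-source-key>", "cpu-high", responder="alice")
print(first is again, len(outbox))  # True 1
```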

Incident Lifecycle

Figure: Incident lifecycle. Triggered, acknowledged, resolved, reopen, escalation, and retry transitions.

Figure: Notification outbox. PagerPal writes an outbox row, workers dispatch it, and receipts update delivery evidence.
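A minimal outbox sketch using the standard-library `sqlite3` module. Several details are assumptions for illustration and are not PagerPal's schema:

- the table layout,
- the status names `queued`, `sent`, `failed`, `exhausted`, and `delivered`,
- the `MAX_ATTEMPTS` limit,
- the `send` callable standing in for a provider.

The property the usual outbox pattern relies on is visible in `trigger`: the incident row and its notification row commit in one transaction.

```python
import sqlite3

MAX_ATTEMPTS = 3  # assumption: automatic attempts before a row is marked exhausted

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE incidents (id INTEGER PRIMARY KEY, title TEXT, status TEXT);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY,
    incident_id INTEGER REFERENCES incidents(id),
    recipient TEXT,
    status TEXT NOT NULL DEFAULT 'queued',
    attempts INTEGER NOT NULL DEFAULT 0,
    provider_message_id TEXT
);
""")


def trigger(title: str, recipient: str) -> int:
    # The incident row and its outbox row commit together, or neither does.
    with db:
        cur = db.execute("INSERT INTO incidents (title, status) VALUES (?, 'triggered')", (title,))
        incident_id = cur.lastrowid
        db.execute("INSERT INTO outbox (incident_id, recipient) VALUES (?, ?)", (incident_id, recipient))
    return incident_id


def dispatch(send) -> None:
    # One worker pass: try every queued or failed row once.
    rows = db.execute(
        "SELECT id, recipient, attempts FROM outbox WHERE status IN ('queued', 'failed')"
    ).fetchall()
    for row_id, recipient, attempts in rows:
        attempts += 1
        try:
            message_id = send(recipient)
            status = "sent"
        except Exception:
            message_id = None
            status = "exhausted" if attempts >= MAX_ATTEMPTS else "failed"
        with db:
            db.execute(
                "UPDATE outbox SET status = ?, attempts = ?, provider_message_id = ? WHERE id = ?",
                (status, attempts, message_id, row_id),
            )


def retry(row_id: int) -> None:
    # Operator retry after the provider is fixed: requeue with a fresh attempt budget.
    with db:
        db.execute(
            "UPDATE outbox SET status = 'queued', attempts = 0 "
            "WHERE id = ? AND status IN ('failed', 'exhausted')",
            (row_id,),
        )


def apply_receipt(message_id: str, delivered: bool) -> None:
    # Delivery receipt callback: record the provider's final status as evidence.
    with db:
        db.execute(
            "UPDATE outbox SET status = ? WHERE provider_message_id = ?",
            ("delivered" if delivered else "undelivered", message_id),
        )


def broken_provider(recipient: str) -> str:
    raise RuntimeError("provider not configured")


trigger("Disk full on db-1", "<responder-phone>")
for _ in range(MAX_ATTEMPTS):
    dispatch(broken_provider)
print(db.execute("SELECT status, attempts FROM outbox").fetchone())  # ('exhausted', 3)

retry(1)
dispatch(lambda recipient: "msg-1")  # stand-in for a working provider
apply_receipt("msg-1", delivered=True)
print(db.execute("SELECT status, attempts FROM outbox").fetchone())  # ('delivered', 1)
```

If the process stops between `trigger` and `dispatch`, the queued row is still in the database for the next worker pass.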

Incidents start as triggered. A responder or admin can acknowledge ownership, resolve the incident, reopen a resolved incident, manually escalate to the next target, or retry failed/exhausted notification logs after provider configuration is fixed.
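These transitions can be written as a small table-driven state machine. The states come from the paragraph above. Some rules are our assumptions:

- reopen returns an incident to `triggered`,
- escalation is allowed only while triggered or acknowledged,
- escalation changes the paging target but not the status.

Notification retry acts on notification logs rather than incident status, so it is left out. The names `TRANSITIONS` and `apply_action` are hypothetical.

```python
# Hypothetical transition table: (current status, action) -> next status.
TRANSITIONS = {
    ("triggered", "acknowledge"): "acknowledged",
    ("triggered", "resolve"): "resolved",
    ("acknowledged", "resolve"): "resolved",
    ("resolved", "reopen"): "triggered",          # assumption: reopen restarts at triggered
    ("triggered", "escalate"): "triggered",       # escalation moves the target, not the status
    ("acknowledged", "escalate"): "acknowledged",
}


class InvalidTransition(ValueError):
    pass


def apply_action(status: str, action: str) -> str:
    """Return the next status, or raise if `action` is not allowed from `status`."""
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise InvalidTransition(f"cannot {action} an incident that is {status}") from None


status = "triggered"
for action in ("acknowledge", "resolve", "reopen"):
    status = apply_action(status, action)
print(status)  # triggered
```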

Recovery webhooks from supported alert sources can resolve a matching open incident when the alert source and external alert ID match. Repeated recovery events are safe and should not create duplicate resolved incidents.
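The reason repeated recoveries are harmless can be sketched in a few lines. Resolution is keyed on alert source and external alert ID, and it only acts on an incident that is still open. A second recovery therefore finds nothing to change and creates nothing. The in-memory list and `handle_recovery` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Incident:
    source_id: str
    external_id: str
    status: str = "triggered"


def handle_recovery(incidents: list[Incident], source_id: str, external_id: str) -> Incident | None:
    """Resolve the matching open incident; return it, or None if nothing was open."""
    for incident in incidents:
        if (
            incident.source_id == source_id
            and incident.external_id == external_id
            and incident.status != "resolved"
        ):
            incident.status = "resolved"
            return incident
    return None  # already resolved or never seen: a safe no-op


incidents = [Incident("cloudwatch-prod", "HighLatency")]
print(handle_recovery(incidents, "cloudwatch-prod", "HighLatency") is not None)  # True
print(handle_recovery(incidents, "cloudwatch-prod", "HighLatency"))  # None
print(len(incidents))  # 1
```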

Security Boundaries

Boundary	Customer-facing behavior
Web UI	Users sign in with a login-enabled account. The UI uses a signed session cookie.
Management API	Scripts and integrations authenticate as a PagerPal user account using HTTP Basic auth.
Alert ingestion	Monitoring tools authenticate with a per-source API key.
Delivery receipts	Infobip receipt callbacks can require `X-Infobip-Receipt-Token` when configured.
Health checks	`/health` is public so load balancers can verify the app is alive.
Secret display	Full alert source keys are shown only immediately after create/regenerate. List and detail views show masked keys.
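Two of these boundaries are easy to sketch with the standard library:

- masking an alert source key for list and detail views,
- checking the optional Infobip receipt token in constant time.

The mask format (last four characters visible) and the function names are assumptions. Only the `X-Infobip-Receipt-Token` header name comes from this page.

```python
from __future__ import annotations

import hmac


def mask_key(key: str, visible: int = 4) -> str:
    """Show only the last `visible` characters of a key (hypothetical display rule)."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


def receipt_token_ok(headers: dict[str, str], configured_token: str | None) -> bool:
    """Accept a receipt callback if no token is configured, or if the header matches."""
    if not configured_token:
        return True  # the token requirement is optional
    # HTTP header names are case-insensitive, so normalize before the lookup.
    supplied = {name.lower(): value for name, value in headers.items()}.get("x-infobip-receipt-token", "")
    return hmac.compare_digest(supplied.encode(), configured_token.encode())


print(mask_key("abcd1234"))
print(receipt_token_ok({"X-Infobip-Receipt-Token": "<receipt-token>"}, "<receipt-token>"))
```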

Use header-based webhook credentials where possible, for example `X-API-Key: <alert-source-key>`. Avoid query-string credentials because they can appear in logs, screenshots, browser history, and proxy records.
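A receiving-side helper that follows this advice might look like the sketch below. It reads `X-API-Key` case-insensitively and refuses a key passed in the query string. This page does not say whether PagerPal rejects query-string keys or only discourages them. The helper, the `api_key` query parameter name, and the example webhook path are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


class CredentialError(ValueError):
    pass


def alert_source_key(url: str, headers: dict[str, str]) -> str:
    """Return the alert source key from the X-API-Key header (hypothetical helper)."""
    if "api_key" in parse_qs(urlsplit(url).query):
        # Query strings end up in access logs, proxies, and browser history.
        raise CredentialError("send the key in the X-API-Key header, not the URL")
    normalized = {name.lower(): value for name, value in headers.items()}
    key = normalized.get("x-api-key", "").strip()
    if not key:
        raise CredentialError("missing X-API-Key header")
    return key


print(alert_source_key("https://pagerpal.example.com/webhooks/generic",
                       {"X-API-Key": "<alert-source-key>"}))
```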

Deployment Shape

The recommended v1 hosted shape is a single VM or container host running Docker Compose with one PagerPal app process.

Run Uvicorn with `--workers 1`.
Keep retry and escalation workers enabled in exactly one app process.
Put HTTPS in front of the app before exposing it to operators.
Use PostgreSQL for durable hosted deployments.
Back up the database before upgrades.
Run Alembic migrations before starting new application code.
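The back-up, migrate, then start order can be captured in a small script that prints the commands and can optionally run them. `pg_dump`, `alembic upgrade head`, and `uvicorn --workers 1` are standard invocations of those tools. The `pagerpal.main:app` import path, host and port, and backup file name are hypothetical placeholders for your deployment's values.

```python
import shlex
import subprocess


def upgrade_and_start_commands(db_host: str, db_user: str, db_name: str,
                               backup_file: str = "pagerpal-backup.dump") -> list[list[str]]:
    return [
        # 1. Back up the database. The password comes from PGPASSWORD or ~/.pgpass,
        #    so it never appears in the printed command line.
        ["pg_dump", "--host", db_host, "--username", db_user, "--dbname", db_name,
         "--format=custom", "--file", backup_file],
        # 2. Apply Alembic migrations (run from the directory containing alembic.ini).
        ["alembic", "upgrade", "head"],
        # 3. Start a single Uvicorn worker so only one scheduler runs.
        #    "pagerpal.main:app" is a hypothetical import path.
        ["uvicorn", "pagerpal.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "1"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # raises and stops at the first failing step


run(upgrade_and_start_commands("<database-host>", "pagerpal", "pagerpal"))
```

Because `check=True` raises on a non-zero exit, a failed backup or migration stops the sequence before the new application code starts.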

Do not run multiple worker-enabled containers, multiple Uvicorn workers, or autoscaled app instances until scheduler coordination is added. Duplicate schedulers can duplicate retries, escalation attempts, and provider sends.
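On a single machine, one defensive option is a startup guard. The worker-enabled process takes an exclusive file lock and skips starting the scheduler if the lock is already held. This is a hypothetical safeguard, not a PagerPal feature. It uses POSIX `fcntl.flock`, so it only coordinates processes on the same machine that open the same lock file. It is not the distributed lock a multi-node deployment would need.

```python
import fcntl
import os
import tempfile


def try_acquire_scheduler_lock(path: str):
    """Return an open file that holds an exclusive lock, or None if another holder exists."""
    handle = open(path, "a+")
    try:
        # LOCK_NB fails immediately instead of waiting for the current holder.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    handle.seek(0)
    handle.truncate()
    handle.write(f"{os.getpid()}\n")  # record the holder's PID for troubleshooting
    handle.flush()
    return handle  # keep this object alive: closing it (or process exit) releases the lock


lock_path = os.path.join(tempfile.mkdtemp(), "pagerpal-scheduler.lock")
first = try_acquire_scheduler_lock(lock_path)
second = try_acquire_scheduler_lock(lock_path)
if second is None:
    print("scheduler already running; start this process without workers")
```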

Customer Configuration

Customers configure:

public base URL used in notification links,
database URL,
allowed origins,
secure session-cookie mode,
admin and responder accounts,
teams and team memberships,
on-call schedules and overrides,
escalation policies and levels,
alert sources and webhook keys,
notification providers for SMS, WhatsApp, and/or email,
optional Infobip delivery receipt token.
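The deploy-time settings in this list (base URL, database URL, origins, cookie mode, receipt token) could be loaded from environment variables with basic validation. Accounts, teams, schedules, policies, alert sources, and providers are left out. This is a sketch under assumptions. The `PAGERPAL_*` variable names and the `Settings` fields are hypothetical, not PagerPal's documented names. The rule that secure cookies require an `https://` base URL is also our assumption.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Settings:
    public_base_url: str
    database_url: str
    allowed_origins: tuple[str, ...]
    secure_cookies: bool
    infobip_receipt_token: str | None


def load_settings(env: dict[str, str] | None = None) -> Settings:
    env = dict(os.environ) if env is None else env
    base_url = env["PAGERPAL_PUBLIC_BASE_URL"].rstrip("/")
    secure = env.get("PAGERPAL_SECURE_COOKIES", "true").lower() in ("1", "true", "yes")
    if secure and urlsplit(base_url).scheme != "https":
        # Assumption: browsers generally withhold Secure cookies on plain-HTTP pages.
        raise ValueError("secure session cookies require an https:// public base URL")
    origins = tuple(o.strip() for o in env.get("PAGERPAL_ALLOWED_ORIGINS", "").split(",") if o.strip())
    return Settings(
        public_base_url=base_url,
        database_url=env["PAGERPAL_DATABASE_URL"],
        allowed_origins=origins,
        secure_cookies=secure,
        infobip_receipt_token=env.get("PAGERPAL_INFOBIP_RECEIPT_TOKEN") or None,
    )


settings = load_settings({
    "PAGERPAL_PUBLIC_BASE_URL": "https://pagerpal.example.com/",
    "PAGERPAL_DATABASE_URL": "postgresql+psycopg://pagerpal:<database-password>@<database-host>/pagerpal",
    "PAGERPAL_ALLOWED_ORIGINS": "https://pagerpal.example.com",
})
print(settings.public_base_url, settings.allowed_origins)
```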

All documentation examples use placeholders such as <alert-source-key>, <database-host>, and <smtp-password>. Do not paste real customer secrets into tickets, screenshots, docs, or source control.

Operational Limits

v1 is single-node and single-scheduler.
Background jobs stop when the app process stops.
SQLite is useful for local/demo deployments; PostgreSQL is preferred for hosted use.
Horizontal scaling requires a singleton worker, distributed lock, or external queue design before enabling multiple app processes.
SSO/SAML is not part of v1; authentication is local user accounts with role-based access.