Alerting channel

Platform: ntfy.sh (hosted push notifications, no infrastructure of our own needed). Topic: play-army-alerts
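Publishing to an ntfy topic is a plain HTTP POST to https://ntfy.sh/<topic>. The body is the message text, and optional headers such as Title, Priority and Tags carry metadata. The TypeScript sketch below only builds such a request. The names buildNtfyRequest and NtfyRequest are hypothetical and are not part of the existing scripts.

```typescript
// Hypothetical helper: builds (but does not send) an ntfy publish request.
// ntfy accepts the priority names min, low, default, high and max/urgent.
type NtfyPriority = "min" | "low" | "default" | "high" | "urgent";

interface NtfyRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildNtfyRequest(
  topic: string,
  title: string,
  message: string,
  priority: NtfyPriority = "default",
  tags: string[] = [],
): NtfyRequest {
  const headers: Record<string, string> = { Title: title, Priority: priority };
  if (tags.length > 0) {
    // ntfy takes tags as one comma-separated header value.
    headers["Tags"] = tags.join(",");
  }
  return {
    url: `https://ntfy.sh/${encodeURIComponent(topic)}`,
    method: "POST",
    headers,
    body: message,
  };
}
```

In a Worker or under Node 18+, the result could be sent with fetch(req.url, { method: req.method, headers: req.headers, body: req.body }). Keeping the builder free of I/O makes it testable without hitting ntfy.sh.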

Alert sources

Source                      Type                     Priority
--------------------------  -----------------------  ----------------------------
restic-backup.sh            Backup OK/FAIL           default (OK) / urgent (FAIL)
restic-restore-drill.sh     Restore drill OK/FAIL    default (OK) / urgent (FAIL)
play-army-alert@.service    systemd OnFailure hook   urgent
fail2ban ntfy-local         Ban event (sshd, nginx)  high
lynis-weekly-audit.sh       Lynis score alert        high (if score < 70)
CF Worker play-army-status  Site DOWN detection      urgent
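The table above amounts to a small mapping from event to ntfy priority. A hypothetical TypeScript encoding follows. The source keys are made-up labels for the table rows, and returning null for a Lynis score of 70 or more assumes no alert is sent in that case.

```typescript
// Hypothetical encoding of the alert-source table; the source keys are illustrative labels.
type Priority = "default" | "high" | "urgent";

type AlertEvent =
  | { source: "restic-backup" | "restic-restore-drill"; ok: boolean }
  | { source: "systemd-onfailure" }
  | { source: "fail2ban" }
  | { source: "lynis"; score: number }
  | { source: "site-down" };

// Returns the ntfy priority for an event, or null when no alert is sent.
function priorityFor(event: AlertEvent): Priority | null {
  switch (event.source) {
    case "restic-backup":
    case "restic-restore-drill":
      // OK runs are informational, failures are urgent.
      return event.ok ? "default" : "urgent";
    case "systemd-onfailure":
    case "site-down":
      return "urgent";
    case "fail2ban":
      return "high";
    case "lynis":
      // Assumption: a score of 70 or more produces no alert at all.
      return event.score < 70 ? "high" : null;
  }
}
```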

Scripts

  • send-ntfy-alert.sh: shared notifier, used by all the alert scripts
  • fail2ban-ntfy-event.sh: formats the details of a ban event
  • systemd-notify-failure.sh: sends a journal excerpt together with unit info
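The exact output of the two formatting scripts is not documented here. Purely as an illustration, the formatting step could look like the TypeScript sketch below. The function names, field layout and the 15-line excerpt limit are all assumptions.

```typescript
// Hypothetical formatters; the real scripts are shell scripts with their own layout.
interface Alert {
  title: string;
  message: string;
}

interface BanEvent {
  jail: string; // e.g. "sshd" or an nginx jail
  ip: string;
  failures: number;
}

// Formats a fail2ban ban as a short title plus one detail per line.
function formatBanEvent(e: BanEvent): Alert {
  return {
    title: `fail2ban: ${e.ip} banned (${e.jail})`,
    message: `Jail: ${e.jail}\nIP: ${e.ip}\nFailures: ${e.failures}`,
  };
}

// Keeps only the last `maxLines` non-empty journal lines so the push stays readable.
// The default of 15 lines is an arbitrary illustrative choice.
function formatUnitFailure(unit: string, journal: string, maxLines = 15): Alert {
  const lines = journal.split("\n").filter((l) => l.trim() !== "");
  const excerpt = maxLines > 0 ? lines.slice(-maxLines) : [];
  return {
    title: `systemd: ${unit} failed`,
    message: `Unit: ${unit}\nLast ${excerpt.length} journal lines:\n${excerpt.join("\n")}`,
  };
}
```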

External uptime monitor

Stack: Cloudflare Worker + KV + Cron Trigger

The CF Worker play-army-status checks 3 endpoints over HTTPS every minute:

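Target            Check
----------------  ----------------------------------------------------
play.army         HTTP 200
panel.play.army   HTTP 200 (a CF Access redirect also counts as OK)
node.play.army    HTTP 401 = UP (Wings requires auth, so 401 means alive)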
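The per-target rules can be written as one predicate. The TypeScript sketch below assumes the Worker fetches with redirect: "manual", so the CF Access login redirect arrives as a 3xx status; treating every 3xx as OK is also an assumption.

```typescript
// Sketch of the per-target UP rules from the table above.
type Target = "play.army" | "panel.play.army" | "node.play.army";

function isUp(target: Target, status: number): boolean {
  switch (target) {
    case "play.army":
      return status === 200;
    case "panel.play.army":
      // 200, or a redirect to Cloudflare Access, both mean the panel answers.
      return status === 200 || (status >= 300 && status < 400);
    case "node.play.army":
      // Wings rejects unauthenticated requests with 401, which proves it is running.
      return status === 401;
  }
}
```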

If any target is DOWN → urgent ntfy alert.
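This section does not say how repeated DOWN results are handled. One common approach, shown here as a hypothetical TypeScript sketch, is to keep the last state per target in the KV namespace from the stack and alert only on the transition into DOWN. The key format state:<target> and the function names are illustrative.

```typescript
type State = "UP" | "DOWN";

// Minimal subset of the Workers KV binding API (get resolves to null for a missing key).
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// Alert only when a target enters DOWN, not on every minute it stays DOWN.
function shouldAlert(previous: State | null, current: State): boolean {
  return current === "DOWN" && previous !== "DOWN";
}

// Reads the previous state, stores the current one, and reports whether to alert.
async function recordAndCheck(kv: KVLike, target: string, current: State): Promise<boolean> {
  const key = `state:${target}`; // hypothetical key naming
  const previous = (await kv.get(key)) as State | null;
  await kv.put(key, current);
  return shouldAlert(previous, current);
}
```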

VPS Heartbeat

Every minute, the VPS sends a heartbeat with the status of 8 services:

nginx, mariadb, redis, wings, crowdsec, cloudflared, fail2ban, ssh
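The heartbeat's wire format is not described here. A hypothetical TypeScript sketch of a payload covering the 8 services follows. The field names sentAt and services are assumptions, and the per-service flags could come from systemctl is-active on the VPS.

```typescript
// Hypothetical heartbeat payload; only the service list comes from the source.
const SERVICES = ["nginx", "mariadb", "redis", "wings", "crowdsec", "cloudflared", "fail2ban", "ssh"] as const;
type Service = (typeof SERVICES)[number];

interface Heartbeat {
  sentAt: string; // ISO 8601 timestamp
  services: Record<Service, boolean>; // true = active
}

function buildHeartbeat(active: Partial<Record<Service, boolean>>, now: Date): Heartbeat {
  const services = {} as Record<Service, boolean>;
  for (const s of SERVICES) {
    // A service missing from the input is reported as down rather than omitted.
    services[s] = active[s] === true;
  }
  return { sentAt: now.toISOString(), services };
}
```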

If no heartbeat arrives for more than 3 minutes → VPS unreachable alert.
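The unreachable rule is a plain time comparison. A sketch, assuming the Worker records when the last heartbeat arrived (milliseconds since the epoch) and evaluates this on each cron run:

```typescript
// More than 3 minutes without a heartbeat means the VPS is considered unreachable.
const HEARTBEAT_TIMEOUT_MS = 3 * 60 * 1000;

function isVpsUnreachable(lastHeartbeatMs: number | null, nowMs: number): boolean {
  // No heartbeat ever recorded counts as unreachable.
  if (lastHeartbeatMs === null) return true;
  return nowMs - lastHeartbeatMs > HEARTBEAT_TIMEOUT_MS;
}
```

Because the check itself runs once a minute, the alert would fire somewhere between 3 and 4 minutes after the last heartbeat.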

Systemd: play-army-heartbeat.timer + play-army-heartbeat.service

Status page

URL: https://status.play.army

The public status page shows:

  • Web service latency, status codes and uptime history (last 1 h)
  • Heartbeat grid for the VPS services (8 services)
  • Overall status: operational / degraded / major outage
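The page shows three overall states, but the rule for choosing between them is not documented here. One plausible rule, written as a hypothetical TypeScript sketch: everything up is operational, an unreachable VPS or all web targets down is a major outage, and anything in between is degraded.

```typescript
type Overall = "operational" | "degraded" | "major outage";

// Hypothetical aggregation rule; the thresholds are assumptions.
function overallStatus(webUp: boolean[], vpsReachable: boolean, servicesUp: boolean[]): Overall {
  const allWebDown = webUp.length > 0 && webUp.every((u) => !u);
  if (!vpsReachable || allWebDown) return "major outage";
  if (webUp.every((u) => u) && servicesUp.every((u) => u)) return "operational";
  return "degraded";
}
```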

API: https://status.play.army/api/status (JSON)

Internet → status.play.army
              │
              ├── GET /           → HTML status page
              ├── GET /api/status → JSON API
              └── POST /heartbeat → VPS heartbeat receiver (Bearer auth)
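The three routes in the diagram map onto a small dispatcher. The TypeScript sketch below implements the routing and Bearer check as a pure function. The route names, the 405 case and the comparison helper are illustrative assumptions, not the actual Worker code.

```typescript
type Route = "html" | "api" | "heartbeat" | "unauthorized" | "not_found" | "method_not_allowed";

// Compares two strings without stopping at the first mismatch (length still leaks).
function timingSafeEqualStr(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  return diff === 0;
}

function route(method: string, path: string, authHeader: string | null, token: string): Route {
  if (path === "/" || path === "/api/status") {
    if (method !== "GET") return "method_not_allowed";
    return path === "/" ? "html" : "api";
  }
  if (path === "/heartbeat") {
    if (method !== "POST") return "method_not_allowed";
    const prefix = "Bearer ";
    // Reject when no token is configured, the header is missing, or the scheme is wrong.
    if (token === "" || authHeader === null || !authHeader.startsWith(prefix)) return "unauthorized";
    return timingSafeEqualStr(authHeader.slice(prefix.length), token) ? "heartbeat" : "unauthorized";
  }
  return "not_found";
}
```

In the Worker's fetch handler, each result would become a Response (for example 401 for unauthorized, 404 for not_found), with the expected token kept in a Worker secret rather than in code.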