24/7 ops for an AI health platform

Round-the-clock managed AWS operations during hypergrowth — SOC2-ready in 4 months with a 15-minute incident SLA.

15 min

Off-hours incident SLA

70%

Reduction in mean time to resolve

4 mo

SOC2-ready

The problem

Upheal's AI mental health platform was scaling fast, but its product engineering team had no spare DevOps capacity. Off-hours incidents waited until morning, a real risk for clinicians using the platform during patient sessions. Disaster recovery had never been tested, and SOC2 attestation was becoming a hard prerequisite for enterprise deals involving PHI.

What we shipped

We layered CloudWatch monitoring with application-level signals (clinical-note API latency, AI queue depth, DB pool utilisation) and routed it into Slack-native alerting backed by a 15-minute on-call SLA. SOC2 readiness work hardened IAM with enforced MFA and quarterly reviews, centralised CloudTrail logs in object-locked S3, and forced all changes through version-controlled pipelines. Cross-region RDS replication, S3 versioning and quarterly DR exercises gave Upheal verifiable recovery procedures.
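As a rough illustration of alarming on application-level signals, the sketch below builds CloudWatch alarm definitions for the three signals named above. It is not Upheal's actual configuration: the namespace, metric names, thresholds, evaluation windows and SNS topic ARN are all hypothetical. Only the parameter keys follow CloudWatch's PutMetricAlarm API.

```python
from dataclasses import dataclass

# Hypothetical SNS topic whose subscribers forward alarms to a Slack channel.
ALERT_TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:ops-alerts"


@dataclass(frozen=True)
class Signal:
    """One application-level signal to alarm on. All values are illustrative."""
    name: str
    metric: str        # hypothetical custom metric name
    statistic: str     # plain statistic ("Average", "Maximum") or percentile ("p95")
    threshold: float


SIGNALS = [
    Signal("clinical-note-api-latency", "ClinicalNoteApiLatencyMs", "p95", 1500.0),
    Signal("ai-queue-depth", "AiQueueDepth", "Maximum", 200.0),
    Signal("db-pool-utilisation", "DbPoolUtilisationPercent", "Average", 85.0),
]


def alarm_params(signal: Signal, namespace: str = "App/Platform") -> dict:
    """Build keyword arguments shaped for CloudWatch's PutMetricAlarm."""
    params = {
        "AlarmName": f"{signal.name}-high",
        "Namespace": namespace,            # hypothetical custom namespace
        "MetricName": signal.metric,
        "Period": 60,                      # one-minute datapoints
        "EvaluationPeriods": 5,            # look at the last five datapoints
        "DatapointsToAlarm": 3,            # alarm when three of them breach
        "Threshold": signal.threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [ALERT_TOPIC_ARN],
    }
    # Percentiles go in ExtendedStatistic; plain statistics go in Statistic.
    if signal.statistic.startswith("p"):
        params["ExtendedStatistic"] = signal.statistic
    else:
        params["Statistic"] = signal.statistic
    return params


ALARMS = [alarm_params(s) for s in SIGNALS]

if __name__ == "__main__":
    for alarm in ALARMS:
        print(alarm["AlarmName"], alarm["Threshold"])
    # With AWS credentials configured, each dict could be passed to boto3:
    #   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

Keeping the definitions as plain data means they can be reviewed and versioned like any other change before being applied, which fits the version-controlled pipeline approach described above.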

The outcome

Mean time to acknowledge dropped from hours to under 10 minutes, and MTTR for common incidents fell 70% through runbook automation. Two real incidents during the engagement were resolved cleanly using the tested DR procedures, with no data loss. SOC2 readiness landed in four months, and the engineering team came off infrastructure on-call entirely.

Under the hood

Amazon CloudWatch, AWS CloudTrail, AWS IAM, Amazon RDS, Amazon S3, AWS KMS, AWS Config

“Having Remāngu manage our AWS operations meant our engineers could focus entirely on building the product. Their response times and proactive approach gave us confidence that our infrastructure was in expert hands.”
— Andre Lampe, Co-founder, Upheal
