WordPress on AWS Lightsail

This project documents the migration of a live WordPress business website to AWS Lightsail, including DNS, SSL, static IP configuration, backup design, a production incident, and the redesign of the recovery strategy using automated snapshots.

Context

The goal was to migrate a live guitar teaching business website from traditional managed hosting to AWS, lowering ongoing cost and giving more direct control over the infrastructure. The site is in production and supports an active teaching business.

Live site: cliffsmithguitarlessons.co.uk (production WordPress site hosted on AWS Lightsail)

Key constraints:

  • The site had to remain publicly available during migration, with minimal disruption.
  • HTTPS, email-related DNS records, and redirects all had to keep working correctly after cutover.
  • The solution needed to stay simple enough to manage as a solo operator while still being recoverable.

Architecture (Initial Design)

The website originally ran on an Amazon Lightsail instance in the London region using a Bitnami WordPress stack.

Traffic reached the site through DNS records pointing at a static IP attached to the instance, so the public endpoint stayed stable. SSL was configured for HTTPS, and the original recovery model combined Lightsail snapshots with S3-based backups.

This architecture was appropriate because it kept the stack simple while still covering the core production requirements: a fixed public endpoint, encrypted traffic, DNS control, and a practical backup strategy.

For a single WordPress business site, Lightsail provided enough flexibility without the overhead of designing a more complex multi-service AWS environment.

Key Decisions

Use Amazon Lightsail instead of a fully custom EC2-based architecture

Why: Lightsail provided a fast, simple way to deploy a production-ready WordPress environment with predictable monthly cost and minimal setup overhead.

Trade-off: Reduced flexibility compared to a custom architecture using EC2, ALB and RDS, and less control over scaling and infrastructure design.

Attach a static IP to ensure a stable public endpoint

Why: A fixed IP simplifies DNS configuration and keeps the site reachable at the same address even if the underlying instance is stopped and started, or replaced. Without one, the instance's default public IP changes on each stop and start.

Trade-off: Introduces a small amount of additional infrastructure management and dependency on correct IP association.

Implement an initial layered backup strategy using snapshots and S3

Why: The initial design combined Lightsail snapshots with S3 backups to provide redundancy and support recovery from both instance-level and application-level failures.

Trade-off: Adds operational overhead and requires ongoing validation to ensure backups are usable.

Maintain direct control of DNS during migration

Why: Managing DNS directly allowed precise control over cutover timing and ensured that web traffic, SSL validation, and email records were handled correctly.

Trade-off: Increased complexity during migration and a higher risk of misconfiguration if not handled carefully.

Challenges

Preserving DNS and email behaviour during migration

Migrating the site required more than moving the web server. DNS changes had to preserve existing MX, SPF, DKIM and DMARC records to avoid disrupting email delivery. This required careful validation before and after cutover to ensure both the website and email services continued to function correctly.
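The before-and-after validation described above can be sketched as a comparison of two record snapshots. This is a minimal illustration, not the procedure actually used: the `RecordSet` shape, the function name and the `example.com` values are all hypothetical, and the snapshots are assumed to have been collected separately (for example from `dig` output or the DNS provider's export).

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical shape: (record name, record type) -> set of values.
RecordSet = Dict[Tuple[str, str], FrozenSet[str]]

# MX carries mail routing; SPF and DMARC are published as TXT records.
# DKIM is usually TXT too; add "CNAME" here if the provider uses CNAME-based DKIM.
MAIL_TYPES = {"MX", "TXT"}


def mail_record_changes(before: RecordSet, after: RecordSet) -> List[str]:
    """Return a readable list of mail-related records that differ."""
    problems = []
    for key in sorted(set(before) | set(after)):
        name, rtype = key
        if rtype not in MAIL_TYPES:
            continue  # A/AAAA records are expected to change at cutover
        old, new = before.get(key), after.get(key)
        if old != new:
            problems.append(
                f"{rtype} {name}: {sorted(old or [])} -> {sorted(new or [])}"
            )
    return problems


# Illustrative data only: the A record changes (expected), DMARC goes missing (not expected).
before = {
    ("example.com", "A"): frozenset({"192.0.2.1"}),
    ("example.com", "MX"): frozenset({"1 smtp.google.com."}),
    ("_dmarc.example.com", "TXT"): frozenset({"v=DMARC1; p=none"}),
}
after = {
    ("example.com", "A"): frozenset({"192.0.2.2"}),
    ("example.com", "MX"): frozenset({"1 smtp.google.com."}),
}

for line in mail_record_changes(before, after):
    print(line)
```

An empty result means the mail-related records survived the cutover unchanged. Any line printed is a record to restore before relying on email delivery.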

Managing a self-hosted WordPress stack on AWS

Moving away from managed hosting introduced responsibility for server updates, SSL configuration, redirects and recovery planning. While this provided greater control and significantly reduced cost, it also required a deeper understanding of the underlying Bitnami and Apache configuration.

Incident and Redesign

During routine maintenance, a sequence of issues exposed weaknesses in the original backup and recovery approach.

What happened:

  • A system update failed because of incorrect permissions on /tmp
  • An SSL renewal reintroduced conflicting Apache redirect rules
  • The resulting redirect loop took the site offline

The site was recovered by auditing and correcting configuration across multiple Apache and WordPress files, restoring a stable HTTPS setup with a canonical non-www domain.
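A redirect loop of this kind can be detected mechanically by following each hop and stopping when a URL repeats. The sketch below is an illustration under assumptions, not the tooling used during the incident. `follow_redirects` is a hypothetical helper, and the rule tables stand in for the real Apache configuration, whose exact rules are not reproduced here. The `next_hop` callable is injected so the same logic could be pointed at live HTTP responses instead of a dictionary.

```python
from typing import Callable, List, Optional


def follow_redirects(start: str,
                     next_hop: Callable[[str], Optional[str]],
                     max_hops: int = 10) -> List[str]:
    """Follow redirects from `start` and return the chain of URLs visited.

    `next_hop(url)` returns the redirect target, or None for a final response.
    Raises RuntimeError if a URL repeats or the chain exceeds `max_hops`.
    """
    chain = [start]
    while len(chain) <= max_hops:
        target = next_hop(chain[-1])
        if target is None:
            return chain
        if target in chain:
            raise RuntimeError("redirect loop: " + " -> ".join(chain + [target]))
        chain.append(target)
    raise RuntimeError(f"more than {max_hops} redirects from {start}")


# Hypothetical conflicting rules: one adds "www", another strips it.
conflicting = {
    "http://example.com/": "https://example.com/",
    "https://example.com/": "https://www.example.com/",
    "https://www.example.com/": "https://example.com/",
}

# Hypothetical corrected rules: everything converges on the non-www HTTPS URL.
fixed = {
    "http://example.com/": "https://example.com/",
    "http://www.example.com/": "https://example.com/",
    "https://www.example.com/": "https://example.com/",
}

try:
    follow_redirects("http://example.com/", conflicting.get)
except RuntimeError as err:
    print(err)

print(follow_redirects("http://www.example.com/", fixed.get))
```

Running a check like this against all four scheme and host combinations after every certificate renewal would catch a reintroduced conflict before visitors do.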

Key failure:

A restore test using the existing S3 backup process failed. The process relied on partial backups (database dump + wp-content) and required manual reconstruction of the environment.

This revealed that the original S3 process was closer to a migration aid than a reliable recovery mechanism.

Current Production Architecture

The current production architecture combines AWS Lightsail hosting with automated snapshots, monitoring, alerting and rollback recovery workflows.

The production WordPress site runs on the official AWS Lightsail WordPress blueprint in eu-west-2, using a static IP and DNS-based routing for stable public access.

Automated daily snapshots provide short-term recovery, while EventBridge-triggered Lambda snapshots provide longer-term retention. CloudWatch alarms, Route 53 health checks and SNS notifications monitor availability and operational health.
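As an illustration of how a Route 53 health check can be wired to an SNS notification, the sketch below builds the parameters for a CloudWatch alarm on the health check's status metric. Route 53 publishes `HealthCheckStatus` in the `AWS/Route53` namespace (1 for healthy, 0 for unhealthy), and those metrics live in us-east-1. The alarm name, evaluation window and missing-data behaviour are assumptions chosen for the example, not the values used on the production site.

```python
def health_check_alarm_params(health_check_id: str, sns_topic_arn: str) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm.

    The alarm fires when the Route 53 health check reports unhealthy
    for two consecutive one-minute periods (assumed thresholds).
    """
    return {
        "AlarmName": f"site-down-{health_check_id}",  # hypothetical naming scheme
        "Namespace": "AWS/Route53",
        "MetricName": "HealthCheckStatus",
        "Dimensions": [{"Name": "HealthCheckId", "Value": health_check_id}],
        "Statistic": "Minimum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",  # assumption: no data is treated as an outage
        "AlarmActions": [sns_topic_arn],
    }


# Usage sketch (requires boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
#   cloudwatch.put_metric_alarm(**health_check_alarm_params(check_id, topic_arn))
```

Keeping the parameters in a plain function makes the alarm definition easy to review and to recreate if the health check is ever replaced.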

A separate recovery instance workflow allows snapshots to be restored and tested before cutover, reducing operational risk during migration and disaster recovery scenarios.

Recovery & Backup Strategy

The original S3-based backup system was replaced with a snapshot-based recovery strategy using Amazon Lightsail.

New approach:

  • Daily automatic snapshots (7-day rolling window, since Lightsail automatic snapshots are limited to 7 days)
  • Weekly automated snapshots via Lambda (12-month retention, to work around the 7-day limit on automatic snapshots)

An EventBridge schedule triggers a Lambda function to create weekly snapshots, with CloudWatch monitoring and SNS email alerts for failure detection.

This design improves both short-term recovery capability and long-term operational resilience without requiring manual intervention.
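A weekly snapshot function along these lines could look like the sketch below. It is not the function deployed for this site. The instance name, snapshot prefix and the 365-day approximation of 12 months are assumptions, and pruning old snapshots inside the same function is one possible design rather than a documented one. The boto3 Lightsail calls used are `create_instance_snapshot`, `get_instance_snapshots` and `delete_instance_snapshot`.

```python
from datetime import datetime, timedelta, timezone

INSTANCE_NAME = "wordpress-prod"   # hypothetical instance name
PREFIX = "weekly-"                 # hypothetical prefix marking Lambda-created snapshots
RETENTION = timedelta(days=365)    # assumption: 12 months approximated as 365 days


def run_weekly_snapshot(client, now):
    """Create one snapshot, then delete this function's snapshots older than RETENTION."""
    name = f"{PREFIX}{INSTANCE_NAME}-{now:%Y-%m-%d}"
    client.create_instance_snapshot(
        instanceSnapshotName=name, instanceName=INSTANCE_NAME
    )

    deleted = []
    token = None
    while True:
        kwargs = {"pageToken": token} if token else {}
        page = client.get_instance_snapshots(**kwargs)
        for snap in page.get("instanceSnapshots", []):
            ours = (snap["name"].startswith(PREFIX)
                    and snap.get("fromInstanceName") == INSTANCE_NAME)
            if ours and now - snap["createdAt"] > RETENTION:
                client.delete_instance_snapshot(instanceSnapshotName=snap["name"])
                deleted.append(snap["name"])
        token = page.get("nextPageToken")
        if not token:
            break
    return name, deleted


def lambda_handler(event, context):
    """Entry point for the EventBridge schedule."""
    import boto3  # imported here so the logic above can be tested without AWS

    client = boto3.client("lightsail", region_name="eu-west-2")
    name, deleted = run_weekly_snapshot(client, datetime.now(timezone.utc))
    return {"created": name, "deleted": deleted}
```

Any exception raised by the AWS calls is deliberately left unhandled, so the invocation is recorded as a Lambda error that a CloudWatch alarm can pick up and forward to SNS. Filtering by prefix keeps the pruning step away from manual snapshots and from Lightsail's own automatic snapshots.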

Platform Migration

After AWS announced the retirement of Bitnami-packaged Lightsail blueprints, the site was migrated from the older Bitnami WordPress stack to the official AWS Lightsail WordPress blueprint.

Migration approach:

  • Built a new official Lightsail WordPress instance in eu-west-2
  • Migrated the full WordPress site using Duplicator after plugin size limits blocked the initial restore method
  • Reattached the existing production static IP so that no DNS records had to change, avoiding propagation delays and preserving mail-related DNS behaviour
  • Reconfigured HTTPS manually using Certbot after Lightsail SSL validation issues
  • Updated the backup Lambda target and verified snapshot automation on the new instance

The migration preserved the live domain, Google Workspace mail flow, contact forms, analytics, SSL, backups and rollback capability.

The final platform now runs on the official AWS Lightsail WordPress blueprint with a simpler recovery model, working automated snapshots and reduced operational complexity.

Monitoring & Operational Stability

After migration to the official Lightsail WordPress blueprint, the production site experienced intermittent instability caused by memory pressure on a 1 GB instance with no swap configured.

Investigation showed the issue was most likely caused by temporary RAM exhaustion during periods of increased WordPress, Apache or background maintenance activity.

Operational improvements implemented:

  • Configured a persistent 1 GB swap file for improved stability under memory pressure
  • Implemented Route 53 HTTPS health checks for external uptime monitoring
  • Added CloudWatch alarms and SNS email notifications for outage detection
  • Added Lightsail alarms for CPU utilisation, burst capacity and status check failures
  • Validated end-to-end alerting using a real Apache outage test

These changes improved operational visibility and significantly strengthened the resilience of the production environment. Shortly after deployment, the monitoring system detected a genuine production outage caused by memory exhaustion on the 1 GB instance, leading to further investigation, capacity analysis and an eventual upgrade to a 2 GB plan.
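The memory-pressure diagnosis can be illustrated with a small check against `/proc/meminfo`, which on Linux reports `MemTotal`, `MemAvailable` and `SwapTotal` in kB. This is a sketch under assumptions rather than the investigation actually performed: the function names and the 10% threshold are hypothetical, and the sample figures are invented to resemble a 1 GB instance with no swap.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def memory_warnings(info: dict, min_available_pct: float = 10.0) -> list:
    """Return warnings for missing swap or low available memory."""
    warnings = []
    if info.get("SwapTotal", 0) == 0:
        warnings.append("no swap configured")
    total = info.get("MemTotal", 0)
    if total and 100.0 * info.get("MemAvailable", 0) / total < min_available_pct:
        warnings.append("low available memory")
    return warnings


# Invented sample resembling a 1 GB instance under pressure with no swap.
sample = """MemTotal:         980000 kB
MemFree:           40000 kB
MemAvailable:      60000 kB
SwapTotal:             0 kB
SwapFree:              0 kB
"""

print(memory_warnings(parse_meminfo(sample)))

# On the server itself the same check would read the live file:
#   with open("/proc/meminfo") as f:
#       print(memory_warnings(parse_meminfo(f.read())))
```

A check like this complements the external health checks: those report that the site is down, while this points at why.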

Cost

Typical steady-state infrastructure cost: ~$6–8/month

  • Lightsail 2 GB instance → ~$7/month (+ VAT)
  • Route 53 → ~$0.51/month

Equivalent AWS cost: approximately £6–7/month including VAT, depending on exchange rates and snapshot usage.

Previous hosting cost: £94.80/month (incl. VAT)

Cost reduction: ~92–94%
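The reduction figure follows directly from the two monthly costs above. A short calculation, using only the £6–7 and £94.80 figures already stated, reproduces it:

```python
def reduction_pct(previous: float, current: float) -> float:
    """Percentage saved when moving from `previous` to `current` monthly cost."""
    return 100.0 * (previous - current) / previous


PREVIOUS_GBP = 94.80  # previous hosting cost per month, incl. VAT

for current in (6.0, 7.0):  # approximate AWS cost range per month, incl. VAT
    print(f"£{current:.2f}/month -> {reduction_pct(PREVIOUS_GBP, current):.1f}% reduction")

# £6.00/month -> 93.7% reduction
# £7.00/month -> 92.6% reduction
```

Both ends of the range sit within the quoted ~92–94% band.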

Costs temporarily increased during migration, restore testing and snapshot redesign work due to additional Lightsail instances and snapshots running in parallel.

The redesign improved recoverability without materially changing the overall cost of the platform.

Outcome

The project successfully moved a live business website onto AWS, improved its recovery model after a real production incident, and later migrated the platform away from the deprecated Bitnami blueprint to the official AWS Lightsail WordPress stack.

  • The site is live in production on the official AWS Lightsail WordPress blueprint.
  • HTTPS, redirects, contact forms, analytics and Google Workspace mail flow are working correctly.
  • Typical monthly hosting cost was reduced by ~92–94% compared with the previous managed hosting setup.
  • Recovery and monitoring now include automated snapshots, rollback workflows, Route 53 health checks, CloudWatch alarms and SNS alerting.
  • Monitoring later detected a real production outage, leading to the identification of memory pressure issues and an upgrade from a 1 GB to 2 GB Lightsail instance.
  • The platform now balances simplicity, cost control, recoverability and operational reliability.

Next Steps

The platform is now stable and operating with automated backups, monitoring and alerting. Future improvements will focus on security, resilience and operational maturity.

  • Migrate domain registration from Xilo to Amazon Route 53 for simplified management and tighter integration with AWS services
  • Place the production site behind CloudFront to improve performance, reduce direct exposure of the Lightsail instance and provide a foundation for additional security controls
  • Evaluate AWS WAF integration to provide protection against common web attacks and automated scanning activity
  • Continue monitoring resource utilisation on the upgraded 2 GB instance to validate long-term capacity requirements