In part 1 of this article, we explored the first three steps. This second instalment describes the remaining steps.
4 Automate deployments and updates
Deployments should be repeatable and fully automated. No one should need direct access to production servers, and root access in particular should be avoided.
Use a configuration management tool for your infrastructure. Make sure OS configuration and firewall rules are managed centrally and have an audit trail.
Consider implementing service redundancy so security updates can be applied with no downtime.
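As a rough illustration of why redundancy makes zero-downtime updates possible, here is a minimal sketch of a rolling update. The function name, the `apply_update` callback, and the `min_in_service` parameter are all hypothetical; a real setup would also drain traffic and wait for health checks.

```python
from typing import Callable, List


def rolling_update(
    nodes: List[str],
    apply_update: Callable[[str], None],
    min_in_service: int = 1,
) -> List[List[str]]:
    """Update nodes one at a time; return the in-service set at each step."""
    # With one node offline, len(nodes) - 1 nodes keep serving traffic.
    if len(nodes) - 1 < min_in_service:
        raise ValueError("not enough redundancy to update without downtime")
    snapshots = []
    for node in nodes:
        # Only the node being patched is taken out of rotation.
        in_service = [n for n in nodes if n != node]
        snapshots.append(in_service)
        apply_update(node)
    return snapshots
```

Setting `min_in_service=2` would express the stricter requirement that the service stays redundant even while one member is being updated.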
Make sure your backups can be restored. Prefer cluster-wide, automated, consistent backups (including MySQL, writable mounts, Solr…).
5 Implement observability
Monitor CPU, memory, and disk usage, as well as global latency (as seen by the user). Configure alerts to fire at warning levels, before a metric becomes critical.
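A minimal sketch of two-level alerting might look like the following. The threshold values and metric names are illustrative assumptions, not recommendations; the point is only that a warning level sits below the critical one.

```python
from enum import Enum


class Level(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


# Hypothetical thresholds in percent used: (warning, critical).
THRESHOLDS = {
    "cpu": (80.0, 95.0),
    "memory": (75.0, 90.0),
    "disk": (70.0, 90.0),
}


def classify(metric: str, value: float) -> Level:
    """Map a metric reading to an alert level."""
    warning, critical = THRESHOLDS[metric]
    if value >= critical:
        return Level.CRITICAL
    if value >= warning:
        return Level.WARNING
    return Level.OK
```

With these assumed thresholds, a disk at 72% would raise a warning well before it reaches the critical 90% mark.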
Integrate alerting with ChatOps (through Zapier, HipChat, or Slack) and implement time-based rules for escalation (as PagerDuty does).
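Time-based escalation can be sketched as a list of steps, each with a delay after which a new target is notified. The policy below, including the delays and target names, is a made-up example and does not reflect how any particular product models escalation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EscalationStep:
    after_minutes: int  # minutes the alert has gone unacknowledged
    target: str         # who gets notified at this step


# Hypothetical policy: chat first, then the on-call engineer, then a lead.
POLICY = [
    EscalationStep(0, "chat-channel"),
    EscalationStep(15, "on-call-engineer"),
    EscalationStep(45, "team-lead"),
]


def targets_to_notify(policy: List[EscalationStep], minutes_unacknowledged: int) -> List[str]:
    """Return every target whose delay has already elapsed."""
    return [s.target for s in policy if minutes_unacknowledged >= s.after_minutes]
```

An alert left unacknowledged for 20 minutes would, under this policy, have reached both the chat channel and the on-call engineer.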
Centralize logging, and make sure log access is protected.
Instrument production for application performance monitoring, using tools like Blackfire.io or New Relic.
Keep a global audit log for changes to the application, the infrastructure, and user access rules.
Monitor the cost of your cloud resources, and implement an optimisation strategy for instance reservations.
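One simple way to reason about reservations is a break-even calculation. The sketch below assumes a reserved instance is billed for every hour of the term whether it runs or not, while an on-demand instance is billed only for the hours it runs; the prices are invented, and real pricing models vary by provider.

```python
def break_even_utilisation(on_demand_hourly: float, reserved_hourly: float) -> float:
    """Fraction of the term an instance must run for a reservation to pay off."""
    return reserved_hourly / on_demand_hourly


def cheaper_option(on_demand_hourly: float, reserved_hourly: float,
                   expected_utilisation: float) -> str:
    """Compare the average hourly cost of both options over the whole term."""
    on_demand_cost = on_demand_hourly * expected_utilisation
    return "reserved" if reserved_hourly < on_demand_cost else "on-demand"
```

With a hypothetical on-demand price of 0.10 per hour and a reserved price of 0.06 per hour, the reservation pays off once the instance runs more than roughly 60% of the time.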
6 Implement high availability and scaling
Implement high availability and load balancing by deploying every service (including MySQL with Galera) to a cluster. Prefer active-active replication when you can.
Implement automated failover. When a cluster member fails or misbehaves, have an automated procedure to kill it and create a new one.
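The failover procedure can be sketched as a health-check round that replaces a member after several consecutive failures. The `is_healthy` and `replace` callbacks are hypothetical placeholders for a real health probe and for the provider calls that destroy a machine and create a new one.

```python
from typing import Callable, Dict, List


def check_and_replace(
    members: List[str],
    is_healthy: Callable[[str], bool],
    replace: Callable[[str], str],
    failures: Dict[str, int],
    max_failures: int = 3,
) -> List[str]:
    """Run one health-check round and return the updated member list."""
    result = []
    for member in members:
        if is_healthy(member):
            failures[member] = 0  # a healthy check resets the counter
            result.append(member)
            continue
        failures[member] = failures.get(member, 0) + 1
        if failures[member] >= max_failures:
            failures.pop(member)
            # replace() kills the failed member and returns the new one's name.
            result.append(replace(member))
        else:
            result.append(member)
    return result
```

Requiring several consecutive failures before replacing a member is a design choice that avoids recreating machines on a single transient error.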
Consider triple redundancy the minimum for each service. It lets you take one member offline and update it while the remaining two still provide redundancy, which allows for zero-downtime scaling and security updates.
Deploy to a cloud provider that offers instant creation of new machines, and automate the creation of machines and their cluster configuration. Consider scaling vertically first (making each cluster member bigger) and horizontally only second (adding more members to each service cluster).
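The vertical-first, horizontal-second order can be expressed as a small decision function. The size names here are hypothetical; substitute whatever instance sizes your provider offers.

```python
from typing import List, Tuple

# Hypothetical instance sizes, smallest to largest.
SIZES: List[str] = ["small", "medium", "large"]


def next_scaling_step(size: str, members: int) -> Tuple[str, int]:
    """Grow each member first; add members only at the largest size."""
    index = SIZES.index(size)
    if index < len(SIZES) - 1:
        return SIZES[index + 1], members  # vertical: bigger machines
    return size, members + 1              # horizontal: one more member
```

A three-member cluster of "small" machines would first move to "medium", and only a cluster already at "large" would gain a fourth member.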
Serve all of your traffic through a CDN. Consider a multi-tiered approach: a cheap CDN for static assets, and a full-featured CDN with global instant purging and tag-based purging for the Drupal content.
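To illustrate what tag-based purging means, here is a toy in-memory cache that indexes pages by tag. It is a conceptual sketch only, not how any CDN is implemented; the class name and the example tags are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class TaggedCache:
    """Toy cache where each page carries tags and can be purged by tag."""

    def __init__(self) -> None:
        self._pages: Dict[str, str] = {}
        self._urls_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def store(self, url: str, body: str, tags: Iterable[str]) -> None:
        self._pages[url] = body
        for tag in tags:
            self._urls_by_tag[tag].add(url)

    def get(self, url: str) -> Optional[str]:
        return self._pages.get(url)

    def purge_tag(self, tag: str) -> int:
        """Drop every cached page carrying this tag; return how many were removed."""
        urls = self._urls_by_tag.pop(tag, set())
        return sum(1 for url in urls if self._pages.pop(url, None) is not None)
```

When a piece of content changes, purging its tag invalidates every page that displays it, without flushing the rest of the cache.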