🆕 This guide includes AWS' most recent updates (June 2024)
AWS launched the AWS Well-Architected Framework to help cloud architects design and operate workloads securely and efficiently, and to help teams make better-informed decisions when building applications.
You've probably heard about Well-Architected. Maybe you'd like to get certified as an AWS Well-Architected Partner?
Here's what you need to know about the framework:
What is the AWS Well-Architected Framework?
The AWS Well-Architected Framework was introduced in 2015. It describes the key concepts, design principles, and best practices to consider when operating in the cloud. On paper, it's applicable to AWS, but the vast majority of the content applies to any brand of cloud architecture.
Following the framework helps you keep your cloud secure, resilient, performant, cost-effective, and sustainable. After answering a set of foundational questions, you can see how well your architecture aligns with best practice, and how to improve it.
In AWS' own words:
"if you neglect the six pillars...it can become challenging to build a system that delivers on your expectations and requirements"
The framework also includes domain-specific 'Lenses'. These go into more detail than the general guidance does, covering domains such as machine learning, data analytics, IoT, media streaming, financial services, and gaming.
What are the AWS Well-Architected Framework Pillars?
The AWS Well-Architected Framework has grown from 4 to 6 pillars since its inception in late 2015. The current pillars are:
- Operational Excellence
- Security
- Reliability
- Performance Efficiency
- Cost Optimization
- Sustainability
The addition of Operational Excellence turned 4 pillars into 5 in November 2016. AWS announced the 6th pillar, Sustainability, in December 2021, a convenient fit for AWS' target of 100% renewable power by 2025.
Lots of us have grown weary of construction references in technology (think Agile v Waterfall debates), but they remain difficult to avoid. Think of the pillars as core operating principles, or areas of high-level guidance. To use another construction reference, the pillars are at the foundation of how cloud architects should govern their AWS setup.
The AWS Well-Architected Framework Pillars
AWS Well-Architected Pillar Structure
Each of the six pillars has:
- An official Definition
- Multiple Design Principles
- Multiple Best Practices, grouped into areas/topics
- A prescriptive guide (referred to as a White Paper) with links to other useful resources, e.g. case studies, training, detailed guides
Importantly, each Best Practice category has at least one self-assessment Question.
These questions are uniquely labeled, e.g. OPS 1, OPS 2, and are designed to help organizations assess their adherence to the pillars (and discover improvements they can make).
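To make the structure concrete, here is a minimal sketch of how a pillar, its best practice areas, and its labeled questions could be modeled in code. The class and field names (Pillar, BestPracticeArea, Question, question_labels) are our own illustrative choices, not anything AWS publishes, and the example only includes a couple of entries from the Operational Excellence pillar.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Question:
    label: str  # unique label, e.g. "OPS 1"
    text: str


@dataclass
class BestPracticeArea:
    name: str
    questions: List[Question] = field(default_factory=list)


@dataclass
class Pillar:
    name: str
    design_principles: List[str] = field(default_factory=list)
    areas: List[BestPracticeArea] = field(default_factory=list)

    def question_labels(self) -> List[str]:
        """Return every question label in the pillar, in area order."""
        return [q.label for area in self.areas for q in area.questions]


# Partial, illustrative data taken from the Operational Excellence pillar.
ops = Pillar(
    name="Operational Excellence",
    design_principles=["Anticipate failure", "Use managed services"],
    areas=[
        BestPracticeArea("Organization", [
            Question("OPS 1", "How do you determine what your priorities are?"),
        ]),
        BestPracticeArea("Evolve", [
            Question("OPS 11", "How do you evolve operations?"),
        ]),
    ],
)

print(ops.question_labels())  # ['OPS 1', 'OPS 11']
```

A structure like this can be handy if you want to track your own answers per question in a spreadsheet export or an internal checklist.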
The AWS Well-Architected Framework Checklist
If you're after a one-page summary of all of the pillars, design principles, areas of best practice, and questions, you're in the right place.
A map of the AWS Well-Architected Framework (source: AWS)
1. Operational Excellence Pillar
This pillar focuses on the day-to-day running and monitoring of systems and the continuous improvement of processes and procedures. Important topics include defining standards, automation of changes, and responding to events.
Design Principles
- Organize teams around business outcomes
- Implement observability for actionable insights
- Safely automate where possible
- Make frequent, small, reversible changes
- Refine operations procedures frequently
- Anticipate failure
- Learn from all operational events and metrics
- Use managed services
Best Practice Areas/Topics & Self-Assessment Questions
Organization
- OPS 1: How do you determine what your priorities are?
- OPS 2: How do you structure your organization to support your business outcomes?
- OPS 3: How does your organizational culture support your business outcomes?
Prepare
- OPS 4: How do you implement observability in your workload?
- OPS 5: How do you reduce defects, ease remediation, and improve flow into production?
- OPS 6: How do you mitigate deployment risks?
- OPS 7: How do you know that you are ready to support a workload?
Operate
- OPS 8: How do you utilize workload observability in your organization?
- OPS 9: How do you understand the health of your operations?
- OPS 10: How do you manage workload and operations events?
Evolve
- OPS 11: How do you evolve operations?
2. Security Pillar
This pillar focuses on protecting your information and systems. Important topics include user permissions, security event detection, and data integrity & confidentiality.
Design Principles
- Implement a strong identity foundation
- Maintain traceability
- Apply security at all layers
- Automate security best practices
- Protect data in transit and at rest
- Keep people away from data
- Prepare for security events
Best Practice Areas/Topics & Self-Assessment Questions
Security Foundations
- SEC 1: How do you securely operate your workload?
Identity and Access Management
- SEC 2: How do you manage authentication for people and machines?
- SEC 3: How do you manage permissions for people and machines?
Detection
- SEC 4: How do you detect and investigate security events?
Infrastructure Protection
- SEC 5: How do you protect your network resources?
- SEC 6: How do you protect your compute resources?
Data Protection
- SEC 7: How do you classify your data?
- SEC 8: How do you protect your data at rest?
- SEC 9: How do you protect your data in transit?
Incident Response
- SEC 10: How do you anticipate, respond to, and recover from incidents?
Application Security
- SEC 11: How do you incorporate and validate the security properties of applications throughout the design, development, and deployment lifecycle?
3. Reliability Pillar
This pillar focuses on ensuring workloads perform their intended functions and can recover quickly when things go wrong. Important topics include recovery planning, adapting to ever-changing requirements, and distributed system design.
Design Principles
- Automatically recover from failure
- Test recovery procedures
- Scale horizontally to increase aggregate workload availability
- Stop guessing capacity
- Manage change through automation
Best Practice Areas/Topics & Self-Assessment Questions
Foundations
- REL 1: How do you manage service quotas and constraints?
- REL 2: How do you plan your network topology?
Workload Architecture
- REL 3: How do you design your workload service architecture?
- REL 4: How do you design interactions in a distributed system to prevent failures?
- REL 5: How do you design interactions in a distributed system to mitigate or withstand failures?
Change Management
- REL 6: How do you monitor workload resources?
- REL 7: How do you design your workload to adapt to changes in demand?
- REL 8: How do you implement change?
Failure Management
- REL 9: How do you back up data?
- REL 10: How do you use fault isolation to protect your workload?
- REL 11: How do you design your workload to withstand component failures?
- REL 12: How do you test reliability?
- REL 13: How do you plan for disaster recovery (DR)?
4. Performance Efficiency Pillar
This pillar focuses on the structured and streamlined allocation of IT resources. Important topics include monitoring, maintaining efficiency as requirements evolve, and optimizing resource size and type to match workloads.
Design Principles
- Democratize advanced technologies
- Go global in minutes
- Use serverless architectures
- Experiment more often
- Consider mechanical sympathy
Best Practice Areas/Topics & Self-Assessment Questions
Architecture Selection
- PERF 1: How do you select appropriate cloud resources and architecture patterns for your workload?
Compute and Hardware
- PERF 2: How do you select and use compute resources in your workload?
Data Management
- PERF 3: How do you store, manage, and access data in your workload?
Network and Content Delivery
- PERF 4: How do you select and configure networking resources in your workload?
Process and Culture
- PERF 5: What process do you use to support more performance efficiency for your workload?
5. Cost Optimization Pillar
This pillar focuses on avoiding unnecessary costs. Key topics include understanding spending over time and controlling fund allocation, selecting resources of the right type and quantity, and scaling to meet business needs without overspending.
Executed well, the Cost Optimization pillar can help your organization reach zen-like AWS cost efficiency. Its principles guide you through best practices and push you to analyze your current challenges and needs.
Design Principles
- Implement Cloud Financial Management
- Adopt a consumption model
- Measure overall efficiency
- Stop spending money on undifferentiated heavy lifting
- Analyze and attribute expenditure
Best Practice Areas/Topics & Self-Assessment Questions
Practice Cloud Financial Management
- COST 1: How do you implement cloud financial management?
Expenditure and Usage Awareness
- COST 2: How do you govern usage?
- COST 3: How do you monitor usage and cost?
- COST 4: How do you decommission resources?
Cost-effective Resources
- COST 5: How do you evaluate cost when you select services?
- COST 6: How do you meet cost targets when you select resource type, size and number?
- COST 7: How do you use pricing models to reduce cost?
- COST 8: How do you plan for data transfer charges?
Manage Demand and Supply Resources
- COST 9: How do you manage demand, and supply resources?
Optimize Over Time
- COST 10: How do you evaluate new services?
- COST 11: How do you evaluate the cost of effort?
🤓 What's the FinOps Framework? Find out in our guide to FinOps.
6. Sustainability Pillar
This pillar focuses on minimizing your cloud's environmental impact. Important topics include understanding the impact, optimizing utilization, and establishing a shared responsibility model for sustainability.
Design Principles
- Understand your impact
- Establish sustainability goals
- Maximize utilization
- Anticipate and adopt new, more efficient hardware and software offerings
- Use managed services
- Reduce the downstream impact of your cloud workloads
Best Practice Areas/Topics & Self-Assessment Questions
Region Selection
- SUS 1: How do you select Regions for your workload?
Alignment to Demand
- SUS 2: How do you align cloud resources to your demand?
Software and Architecture
- SUS 3: How do you take advantage of software and architecture patterns to support your sustainability goals?
Data Management
- SUS 4: How do you take advantage of data management policies and patterns to support your sustainability goals?
Hardware and Services
- SUS 5: How do you select and use cloud hardware and services in your architecture to support your sustainability goals?
Process and Culture
- SUS 6: How do your organizational processes support your sustainability goals?
How Can I Apply the AWS Well-Architected Framework?
There's a lot of information to take in, without even going into the details of the best practices. You're probably wondering what to do next!
Here's what we'd recommend:
- Consider a training course to cover the framework in more detail. AWS offers official classroom sessions with a live instructor, and private classes are also available.
- Form a project team that includes key technology and business stakeholders. Don't forget teams like Compliance, Product Management, and Marketing - all of which could be directly impacted by, or heavily reliant on, AWS setup changes. Pull together the team, scope and objectives using a lightweight project management framework or template.
- Define the workload, i.e. the scope of your review. This could be as small as a static website, or a large, complex microservices architecture.
- Using the framework questions and best practices, pull together your technical team and conduct your own architectural assessment. This won't be a 5-minute job, so plan it well and be patient.
- Use the built-in AWS Well-Architected Tool to carry out an initial review of your architecture and identify improvements.
- Enforcing hundreds of best practices is time-consuming and error-prone. Look for a tool that has built-in Well-Architected monitoring and automated remediation. Very quickly, you'll see that these tools save you time and money, and reduce risk.
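If you record your review in the AWS Well-Architected Tool, you can also read the results programmatically. The sketch below is one possible approach, not an official recipe: it assumes a client that behaves like boto3's "wellarchitected" client, whose list_answers call returns AnswerSummaries entries with a Risk field plus an optional NextToken for paging. The function name summarize_risks and the workload ID placeholder are hypothetical, so check the parameter and field names against the current boto3 documentation before relying on them.

```python
from collections import Counter


def summarize_risks(client, workload_id, lens_alias="wellarchitected"):
    """Count a workload's review answers by risk level.

    `client` is assumed to behave like boto3.client("wellarchitected").
    """
    counts = Counter()
    kwargs = {"WorkloadId": workload_id, "LensAlias": lens_alias}
    while True:
        page = client.list_answers(**kwargs)
        for answer in page.get("AnswerSummaries", []):
            # Treat a missing risk value as unanswered (our assumption).
            counts[answer.get("Risk", "UNANSWERED")] += 1
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token  # request the next page of answers
    return dict(counts)


# Usage (requires boto3 and AWS credentials; the workload ID is a placeholder):
#   import boto3
#   wa = boto3.client("wellarchitected")
#   print(summarize_risks(wa, "<your-workload-id>"))
```

Tracking these counts between reviews gives a rough view of whether high-risk items are actually being worked down.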
Hyperglance & AWS
Hyperglance gives you complete cloud management, so you can have confidence in your security posture and cost management, whilst providing you with enlightening, real-time architecture diagrams.
Monitor your cloud security & compliance, manage costs & reduce your bill, explore interactive diagrams & inventory, and utilize powerful built-in automation. Save time & money and get complete peace of mind.
Experience it all, for free, with a 14-day trial.
About The Author: David Gill
As Hyperglance's Chief Technology Officer, David looks after product development & maintenance, providing strategic direction for all things tech. Having been at the core of the Hyperglance team for over 10 years, cloud optimization is at the heart of everything David does.