Well-Architected Framework Design Principles

Background

Read this background on the AnyCompany Corporation before answering the questions below. You might also want to refer to the Appendix in the AWS Well-Architected Framework (https://d1.awsstatic.com/whitepapers/architecture/AWS_Well-Architected_Framework.pdf).

AnyCompany Corporation Background:

When developing your answers, consider the following operational excellence questions for the AnyCompany Corporation from the AWS Well-Architected Framework. These do not need to be explicitly answered, but thinking through them will help you develop your answer.

(OPS 2) How do you design your workload so that you can understand its state?

(OPS 4) How do you mitigate deployment risk?

(OPS 5) How do you know that you are ready to support a workload?

Question 1:

Operational Excellence

The Operational Excellence pillar focuses on the ability to run and monitor systems to deliver business value, and to continually improve supporting processes and procedures. 

There are six design principles for operational excellence in the cloud:

Perform operations as code – Define your entire workload (that is, applications and infrastructure) as code and update it with code. Implement operations procedures as code and configure them to automatically trigger in response to events. By performing operations as code, you limit human error and enable consistent responses to events.

Annotate documentation – Automate the creation of annotated documentation after every build. Annotated documentation can be used by people and systems. Annotations can be used as input to your operations code.

Make frequent, small, reversible changes – Design workloads to enable components to be updated regularly. Make changes in small increments that can be reversed if they fail (without affecting customers when possible).

Refine operations procedures frequently – Look for opportunities to improve operations procedures. Evolve your procedures appropriately as your workloads evolve. Set up regular game days to review all procedures, validate their effectiveness, and ensure that teams are familiar with them.

Anticipate failure – Identify potential sources of failure so that they can be removed or mitigated. Test failure scenarios and validate your understanding of their impact. Test your response procedures to ensure that they are effective and that teams are familiar with their execution. Set up regular game days to test workloads and team responses to simulated events.

Learn from all operational failures – Drive improvement through lessons learned from all operational events and failures. Share what is learned across teams and through the entire organization.
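The "make frequent, small, reversible changes" principle can be sketched as a batched rollout with automatic rollback. The sketch below is a minimal illustration that updates a fleet a few instances at a time. If any batch fails its health check, every change is reversed. The fleet dictionary, the `rolling_deploy` function, and the `is_healthy` callback are hypothetical stand-ins for real deployment tooling, not an AWS API.

```python
from typing import Callable, Dict


def rolling_deploy(
    fleet: Dict[str, str],
    new_version: str,
    batch_size: int,
    is_healthy: Callable[[str, str], bool],
) -> bool:
    """Update `fleet` (instance ID -> version) in place, one small batch at a time.

    Returns True if every batch passed its health check. On the first failing
    batch, every instance is restored to its previous version and False is returned.
    """
    previous = dict(fleet)  # snapshot taken before the change, so it can be reversed
    instance_ids = list(fleet)
    for start in range(0, len(instance_ids), batch_size):
        batch = instance_ids[start:start + batch_size]
        for instance_id in batch:
            fleet[instance_id] = new_version
        if not all(is_healthy(i, fleet[i]) for i in batch):
            fleet.update(previous)  # roll back every instance updated so far
            return False
    return True


fleet = {"i-1": "v1", "i-2": "v1", "i-3": "v1", "i-4": "v1"}
# Hypothetical health check: the new version fails on instance i-3.
ok = rolling_deploy(fleet, "v2", batch_size=2, is_healthy=lambda i, v: i != "i-3")
print(ok, fleet)
```

Keeping batches small limits how many customers a bad change can reach before the health check catches it.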

For the Operational Excellence Pillar, answer the following questions with respect to the AnyCompany Corporation:

What is the current state (what is AnyCompany doing now)?

What is the future state (what do you think AnyCompany should be doing)?

What is the top improvement the company could make?

Answer:

Question 2:

Security

The Security pillar focuses on the ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.

There are seven design principles that can improve security:

Implement a strong identity foundation – Implement the principle of least privilege and enforce separation of duties with appropriate authorization for each interaction with your AWS resources. Centralize privilege management and reduce or even eliminate reliance on long-term credentials.

Enable traceability – Monitor, alert, and audit actions and changes to your environment in real time. Integrate logs and metrics with systems to automatically respond and take action.

Apply security at all layers – Apply defense in depth and apply security controls to all layers of your architecture (for example, edge network, virtual private cloud, subnet, and load balancer; and every instance, operating system, and application).

Automate security best practices – Automate security mechanisms to improve your ability to securely scale more rapidly and cost effectively. Create secure architectures and implement controls that are defined and managed as code in version-controlled templates.

Protect data in transit and at rest – Classify your data into sensitivity levels and use mechanisms such as encryption, tokenization, and access control where appropriate.

Keep people away from data – To reduce the risk of loss or modification of sensitive data due to human error, create mechanisms and tools to reduce or eliminate the need for direct access or manual processing of data.

Prepare for security events – Have an incident management process that aligns with organizational requirements. Run incident response simulations and use tools with automation to increase your speed of detection, investigation, and recovery.

For the Security Pillar, answer the following questions with respect to the AnyCompany Corporation:

What is the current state (what is AnyCompany doing now)?

What is the future state (what do you think AnyCompany should be doing)?

What is the top improvement the company could make?

Answer:

Question 3:

Reliability

The Reliability pillar focuses on the ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.

There are five design principles that can increase reliability:

Test recovery procedures – Test how your systems fail and validate your recovery procedures. Use automation to simulate different failures or to recreate scenarios that led to failures before. This practice can expose failure pathways that you can test and rectify before a real failure scenario.

Automatically recover from failure – Monitor systems for key performance indicators (KPIs) and configure your systems to trigger an automated recovery when a threshold is breached. This practice enables automatic notification and failure tracking, and it allows automated recovery processes to work around or repair the failure.

Scale horizontally to increase aggregate system availability – Replace one large resource with multiple, smaller resources and distribute requests across these smaller resources to reduce the impact of a single point of failure on the overall system. 

Stop guessing capacity – Monitor demand and system usage, and automate the addition or removal of resources to maintain the optimal level for satisfying demand.

Manage change in automation – Use automation to make changes to infrastructure. The changes that you manage include changes to the automation itself, which can then be tracked and reviewed.
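Two of these principles can be made concrete with simple arithmetic. The sketch below covers "scale horizontally" and "stop guessing capacity." The availability model assumes that instances fail independently and that one healthy instance can serve all requests. The capacity estimate is a simplified proportional model of target tracking. It is not the exact algorithm that Amazon EC2 Auto Scaling uses. Both function names are hypothetical.

```python
import math


def aggregate_availability(instance_availability: float, instance_count: int) -> float:
    """Availability of n redundant instances.

    Assumes failures are independent and one healthy instance can serve all requests.
    """
    if not 0.0 <= instance_availability <= 1.0 or instance_count < 1:
        raise ValueError("availability must be in [0, 1] and count must be >= 1")
    # The system is down only if every instance is down at the same time.
    return 1.0 - (1.0 - instance_availability) ** instance_count


def desired_capacity(current_capacity: int, current_metric: float, target_metric: float,
                     min_size: int, max_size: int) -> int:
    """Proportional estimate of the instance count that moves a metric to its target.

    Assumes the per-instance metric (e.g. average CPU %) falls in proportion to instance count.
    """
    if current_capacity < 1 or target_metric <= 0:
        raise ValueError("capacity must be >= 1 and target must be positive")
    needed = math.ceil(current_capacity * current_metric / target_metric)
    return max(min_size, min(max_size, needed))  # respect the group's size limits


# One large instance vs. several smaller ones, each 99% available.
for n in (1, 2, 3):
    print(n, "instances:", round(aggregate_availability(0.99, n), 6))

# Four instances averaging 90% CPU against a 60% target.
print(desired_capacity(4, current_metric=90.0, target_metric=60.0, min_size=2, max_size=10))
```

The independence assumption is optimistic. Instances that share an Availability Zone can fail together, which is one reason to spread them across multiple Availability Zones.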

For the Reliability Pillar, answer the following questions with respect to the AnyCompany Corporation:

What is the current state (what is AnyCompany doing now)?

What is the future state (what do you think AnyCompany should be doing)?

What is the top improvement the company could make?

Answer:

Question 4:

Performance Efficiency

The Performance Efficiency pillar focuses on the ability to use IT and computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes or technologies evolve.

There are five design principles that can improve performance efficiency:

Democratize advanced technologies – Consume technologies as a service. For example, technologies such as NoSQL databases, media transcoding, and machine learning require expertise that is not evenly dispersed across the technical community. In the cloud, these technologies become services that teams can consume. Consuming technologies enables teams to focus on product development instead of resource provisioning and management.

Go global in minutes – Deploy systems in multiple AWS Regions to provide lower latency and a better customer experience at minimal cost.

Use serverless architectures – Serverless architectures remove the operational burden of running and maintaining servers to carry out traditional compute activities. Serverless architectures can also lower transactional costs because managed services operate at cloud scale.

Experiment more often – Perform comparative testing of different types of instances, storage, or configurations.

Have mechanical sympathy – Use the technology approach that aligns best to what you are trying to achieve. For example, consider your data access patterns when you select approaches for databases or storage.

For the Performance Efficiency Pillar, answer the following questions with respect to the AnyCompany Corporation:

What is the current state (what is AnyCompany doing now)?

What is the future state (what do you think AnyCompany should be doing)?

What is the top improvement the company could make?

Answer:

Question 5:

Cost Optimization

The Cost Optimization pillar focuses on the ability to run systems to deliver business value at the lowest price point.

There are five design principles that can optimize costs:

Adopt a consumption model – Pay only for the computing resources that you require. Increase or decrease usage depending on business requirements, not by using elaborate forecasting. 

Measure overall efficiency – Measure the business output of the workload and the costs that are associated with delivering it. Use this measure to know the gains that you make from increasing output and reducing costs.

Stop spending money on data center operations – AWS does the heavy lifting of racking, stacking, and powering servers, which means that you can focus on your customers and business projects instead of the IT infrastructure.

Analyze and attribute expenditure – The cloud makes it easier to accurately identify system usage and costs, and attribute IT costs to individual workload owners. Having this capability helps you measure return on investment (ROI) and gives workload owners an opportunity to optimize their resources and reduce costs.

Use managed and application-level services to reduce cost of ownership – Managed and application-level services reduce the operational burden of maintaining servers for tasks such as sending email or managing databases. Because managed services operate at cloud scale, cloud service providers can offer a lower cost per transaction or service.
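The "analyze and attribute expenditure" principle can be illustrated with a small grouping exercise. The sketch below totals cost line items by an `owner` tag and reports untagged spend separately. The line items, the tag key `owner`, and the amounts are hypothetical. In practice, the data would come from your AWS billing reports with cost allocation tags activated.

```python
from collections import defaultdict
from typing import Dict, List


def costs_by_owner(line_items: List[dict], tag_key: str = "owner") -> Dict[str, float]:
    """Sum cost per value of `tag_key`; items without the tag go to 'untagged'."""
    totals: Dict[str, float] = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)


# Hypothetical monthly line items (USD).
items = [
    {"resource": "web-server-1", "cost": 120.0, "tags": {"owner": "storefront"}},
    {"resource": "web-server-2", "cost": 120.0, "tags": {"owner": "storefront"}},
    {"resource": "reports-db", "cost": 300.0, "tags": {"owner": "analytics"}},
    {"resource": "old-volume", "cost": 25.0, "tags": {}},
]
print(costs_by_owner(items))
```

A growing "untagged" total is itself a signal that cost attribution is incomplete.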

For the Cost Optimization Pillar, answer the following questions with respect to the AnyCompany Corporation:

What is the current state (what is AnyCompany doing now)?

What is the future state (what do you think AnyCompany should be doing)?

What is the top improvement the company could make?

Answer:

Trusted Advisor Recommendations

Background

AWS Trusted Advisor is an online tool that provides real-time guidance to help you provision your resources following AWS best practices. 

AWS Trusted Advisor looks at your entire AWS environment and gives you recommendations in five categories:

Cost Optimization – AWS Trusted Advisor looks at your resource use and makes recommendations to help you optimize cost by eliminating unused and idle resources, or by making commitments to reserved capacity.

Performance – Improve the performance of your service by checking your service limits, ensuring you take advantage of provisioned throughput, and monitoring for overutilized instances.

Security – Improve the security of your application by closing gaps, enabling various AWS security features, and examining your permissions.

Fault Tolerance – Increase the availability and redundancy of your AWS application by taking advantage of automatic scaling, health checks, Multi-AZ deployments, and backup capabilities.

Service Limits – AWS Trusted Advisor checks for service usage that is more than 80 percent of the service limit. Values are based on a snapshot, so your current usage might differ. Limit and usage data can take up to 24 hours to reflect any changes.
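The Service Limits rule (usage above 80 percent of the limit) can be expressed in a few lines. The sketch below applies the rule to a hypothetical usage snapshot. The limit names and numbers are invented for illustration, and the function is not a Trusted Advisor API.

```python
from typing import Dict, List, Tuple


def limits_near_threshold(usage: Dict[str, Tuple[float, float]],
                          threshold: float = 0.80) -> List[str]:
    """Return the names whose usage is more than `threshold` of the limit.

    `usage` maps a limit name to (current_usage, limit).
    """
    flagged = []
    for name, (used, limit) in usage.items():
        if limit > 0 and used / limit > threshold:
            flagged.append(name)
    return flagged


# Hypothetical snapshot of usage against limits.
snapshot = {
    "running-instances": (17, 20),  # 85%: flagged
    "elastic-ips": (4, 5),          # exactly 80%: not "more than" 80%
    "vpcs": (1, 5),                 # 20%
}
print(limits_near_threshold(snapshot))
```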

For a detailed description of the information that AWS Trusted Advisor provides, see AWS Trusted Advisor Best Practice Checks – https://aws.amazon.com/premiumsupport/technology/trusted-advisor/best-practice-checklist/

Scenario 1:

MFA on Root Account

Description: Checks the root account and warns if multi-factor authentication (MFA) is not enabled. For increased security, we recommend that you protect your account by using MFA, which requires a user to enter a unique authentication code from their MFA hardware or virtual device when interacting with the AWS console and associated websites.

Alert Criteria: MFA is not enabled on the root account.

Recommended Action: Log in to your root account and activate an MFA device.
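You can also check this condition yourself by using the IAM credential report. The report is a CSV file in which the root user appears as `<root_account>` and the `mfa_active` column is `true` or `false`. The sketch below parses such a report with Python's standard `csv` module. The sample text is an abbreviated, made-up report that keeps only three of its columns. Fetching the real report (for example, with `aws iam generate-credential-report` and `aws iam get-credential-report`) is not shown.

```python
import csv
import io


def root_mfa_enabled(credential_report_csv: str) -> bool:
    """Return True if the root user's row in the credential report has mfa_active=true."""
    reader = csv.DictReader(io.StringIO(credential_report_csv))
    for row in reader:
        if row["user"] == "<root_account>":
            return row["mfa_active"].strip().lower() == "true"
    raise ValueError("root account row not found in credential report")


# Abbreviated, made-up report (the real report has many more columns).
sample = """user,arn,mfa_active
<root_account>,arn:aws:iam::111122223333:root,false
alice,arn:aws:iam::111122223333:user/alice,true
"""
if not root_mfa_enabled(sample):
    print("Alert: MFA is not enabled on the root account")
```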

For this recommendation, clearly answer these questions:

What is the status?

What is the problem?

What specific environment details are you given?

What is the best practice?

What is the recommended action?

Answer:

Scenario 2:

IAM Password Policy

Description: Checks the password policy for your account and warns when a password policy is not enabled, or if password content requirements have not been enabled. Password content requirements increase the overall security of your AWS environment by enforcing the creation of strong user passwords. When you create or change a password policy, the change is enforced immediately for new users but does not require existing users to change their passwords.

Alert Criteria: A password policy is enabled, but at least one content requirement is not enabled.

Recommended Action: If some content requirements are not enabled, consider enabling them. If no password policy is enabled, create and configure one. See Setting an Account Password Policy for IAM Users.
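The content requirements correspond to fields of the IAM account password policy: `RequireUppercaseCharacters`, `RequireLowercaseCharacters`, `RequireNumbers`, and `RequireSymbols`. The sketch below reproduces the alert logic on a dictionary shaped like that policy. The sample policy values are hypothetical. Retrieving the real policy (for example, with `aws iam get-account-password-policy`) is not shown.

```python
from typing import List, Optional

CONTENT_REQUIREMENTS = (
    "RequireUppercaseCharacters",
    "RequireLowercaseCharacters",
    "RequireNumbers",
    "RequireSymbols",
)


def password_policy_findings(policy: Optional[dict]) -> List[str]:
    """Return human-readable findings; an empty list means no alert."""
    if policy is None:
        return ["No password policy is enabled: create and configure one."]
    missing = [key for key in CONTENT_REQUIREMENTS if not policy.get(key, False)]
    return [f"Content requirement not enabled: {key}" for key in missing]


# Hypothetical policy: long minimum length, but no number or symbol requirement.
policy = {
    "MinimumPasswordLength": 14,
    "RequireUppercaseCharacters": True,
    "RequireLowercaseCharacters": True,
    "RequireNumbers": False,
    "RequireSymbols": False,
}
for finding in password_policy_findings(policy):
    print(finding)
```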

For this recommendation, clearly answer these questions:

What is the status?

What is the problem?

What specific environment details are you given?

What is the best practice?

What is the recommended action?

Answer:

Scenario 3:

Security Groups – Unrestricted Access

Description: Checks security groups for rules that allow unrestricted access to a resource. Unrestricted access increases opportunities for malicious activity (hacking, denial-of-service attacks, loss of data).

Alert Criteria: A security group rule has a source IP address with a /0 suffix for ports other than 25, 80, or 443.

Recommended Action: Restrict access to only those IP addresses that require it. To restrict access to a specific IP address, set the suffix to /32 (for example, 192.0.2.10/32). Be sure to delete overly permissive rules after creating rules that are more restrictive.
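The alert criterion can be sketched as a small rule checker. The sketch flags a rule when its source range has a /0 prefix and its port range includes any port other than 25, 80, or 443. `IngressRule` and the sample rules are hypothetical. A real security group rule also has a protocol, and this sketch ignores it.

```python
import ipaddress
from typing import List, NamedTuple

# Ports that this check does not flag, even when they are open to the world.
ALLOWED_PUBLIC_PORTS = {25, 80, 443}


class IngressRule(NamedTuple):
    cidr: str       # source range, e.g. "0.0.0.0/0" or "192.0.2.10/32"
    from_port: int
    to_port: int


def is_unrestricted(rule: IngressRule) -> bool:
    """True if the source is /0 and the port range includes a port other than 25, 80, or 443."""
    if ipaddress.ip_network(rule.cidr, strict=False).prefixlen != 0:
        return False
    return any(port not in ALLOWED_PUBLIC_PORTS
               for port in range(rule.from_port, rule.to_port + 1))


rules = [
    IngressRule("0.0.0.0/0", 443, 443),     # HTTPS to the world: not flagged
    IngressRule("0.0.0.0/0", 22, 22),       # SSH to the world: flagged
    IngressRule("::/0", 3389, 3389),        # RDP to all IPv6 addresses: flagged
    IngressRule("192.0.2.10/32", 22, 22),   # SSH from one address: not flagged
]
flagged: List[IngressRule] = [r for r in rules if is_unrestricted(r)]
for r in flagged:
    print("Unrestricted:", r)
```

The last rule shows the recommended fix: a /32 source that admits a single address.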

For this recommendation, clearly answer these questions:

What is the status?

What is the problem?

What specific environment details are you given?

What is the best practice?

What is the recommended action?

Answer:

Scenario 4:

Amazon EBS Snapshots

Description: Checks the age of the snapshots for your Amazon Elastic Block Store (Amazon EBS) volumes (available or in-use). Even though Amazon EBS volumes are replicated, failures can occur. Snapshots are persisted to Amazon Simple Storage Service (Amazon S3) for durable storage and point-in-time recovery.

Alert Criteria: 

Yellow: The most recent volume snapshot is between 7 and 30 days old.

Red: The most recent volume snapshot is more than 30 days old.

Red: The volume does not have a snapshot.

Recommended Action: Create weekly or monthly snapshots of your volumes.
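The alert criteria translate directly into a classification by snapshot age. The sketch below makes one assumption: the boundaries at exactly 7 and exactly 30 days count as yellow. The function name and the dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def snapshot_status(latest_snapshot: Optional[datetime], now: datetime) -> str:
    """Classify a volume by the age of its most recent snapshot."""
    if latest_snapshot is None:
        return "red"                      # the volume has no snapshot
    age_days = (now - latest_snapshot) / timedelta(days=1)
    if age_days > 30:
        return "red"                      # more than 30 days old
    if age_days >= 7:
        return "yellow"                   # between 7 and 30 days old
    return "green"


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
for days in (None, 3, 10, 45):
    taken = None if days is None else now - timedelta(days=days)
    print(days, snapshot_status(taken, now))
```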

For this recommendation, clearly answer these questions:

What is the status?

What is the problem?

What specific environment details are you given?

What is the best practice?

What is the recommended action?

Answer:

Scenario 5:

Amazon S3 Bucket Logging

Description: Checks the logging configuration of Amazon Simple Storage Service (Amazon S3) buckets. When server access logging is enabled, detailed access logs are delivered hourly to a bucket that you choose. An access log record contains details about each request, such as the request type, the resources specified in the request, and the time and date the request was processed. By default, bucket logging is not enabled; you should enable logging if you want to perform security audits or learn more about users and usage patterns.

Alert Criteria: 

Yellow: The bucket does not have server access logging enabled.
Yellow: The target bucket permissions do not include the owner account, so Trusted Advisor cannot check the bucket's logging status.

Recommended Action: Enable bucket logging for most buckets.

If the target bucket permissions do not include the owner account and you want Trusted Advisor to check the logging status, add the owner account as a grantee.
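The two yellow conditions can be sketched as a per-bucket check. The inputs are hypothetical booleans standing in for the bucket's logging configuration and the target bucket's permissions. The evaluation order (logging first, then permissions) is an assumption.

```python
from typing import Dict, Tuple


def bucket_logging_alert(logging_enabled: bool, target_includes_owner: bool) -> str:
    """Return the alert for one bucket under the criteria above, or 'green' if none applies."""
    if not logging_enabled:
        return "yellow: server access logging is not enabled"
    if not target_includes_owner:
        # Logging is configured, but Trusted Advisor cannot check the target bucket.
        return "yellow: target bucket permissions exclude the owner account"
    return "green"


# Hypothetical buckets: name -> (logging_enabled, target_includes_owner)
buckets: Dict[str, Tuple[bool, bool]] = {
    "example-web-assets": (False, True),
    "example-invoices": (True, False),
    "example-audit": (True, True),
}
for name, (enabled, owner_ok) in buckets.items():
    print(name, "->", bucket_logging_alert(enabled, owner_ok))
```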

For this recommendation, clearly answer these questions:

What is the status?

What is the problem?

What specific environment details are you given?

What is the best practice?

What is the recommended action?

Answer:

CloudWatch Alarms

Background

Amazon CloudWatch is a monitoring and observability service that is built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers. CloudWatch monitors your AWS resources (and the applications that you run on AWS) in real time. You can use CloudWatch to collect and track metrics, which are variables that you can measure for your resources and applications.

You can create an alarm to monitor any Amazon CloudWatch metric in your account and use the alarm to automatically send a notification to an Amazon Simple Notification Service (Amazon SNS) topic or perform an Amazon EC2 Auto Scaling or Amazon EC2 action. For example, you can create alarms on the CPU utilization of an EC2 instance, Elastic Load Balancing request latency, Amazon DynamoDB table throughput, Amazon Simple Queue Service (Amazon SQS) queue length, or even the charges on your AWS bill. You can also create an alarm on custom metrics that are specific to your custom applications or infrastructure.
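An alarm compares a metric statistic for each period (for example, a 5-minute average) against a threshold. The sketch below shows a simplified form of that evaluation, in which each of the last N periods must breach the threshold. Real alarms also support "M out of N" datapoints, several ways of treating missing data, and other comparison operators, and the sketch omits all of these. The state names mirror CloudWatch's OK, ALARM, and INSUFFICIENT_DATA. The function and the data are hypothetical.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int) -> str:
    """Simplified 'greater than threshold for N consecutive periods' evaluation.

    `datapoints` are per-period statistics (e.g. 5-minute averages), oldest first.
    """
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be >= 1")
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"


cpu_averages = [42.0, 61.5, 68.0, 73.2]  # hypothetical 5-minute CPU averages (%)
print(alarm_state(cpu_averages, threshold=60.0, evaluation_periods=3))
```

In CloudWatch itself, the transition to ALARM is what triggers the configured action, such as publishing to an SNS topic.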

You can also use Amazon CloudWatch Events to define rules that match incoming events (or changes in your AWS environment) and route them to targets for processing. Targets can include Amazon EC2 instances, AWS Lambda functions, Kinesis streams, Amazon ECS tasks, Step Functions state machines, Amazon SNS topics, Amazon SQS queues, and built-in targets. CloudWatch Events becomes aware of operational changes as they occur. CloudWatch Events responds to these operational changes and takes corrective action as necessary, by sending messages to respond to the environment, activating functions, making changes, and capturing state information.
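CloudWatch Events rules match events against an event pattern. In a pattern, each field lists the values it accepts, and nested objects are matched field by field. The sketch below implements only that exact-value subset of pattern matching, with no prefix, numeric, or other content filters. The `matches` function is hypothetical, and the sample event is abbreviated.

```python
from typing import Any, Dict


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Exact-value subset of event-pattern matching.

    Every key in the pattern must be present in the event. A list in the pattern
    means "the event value must equal one of these"; a dict means "match the
    nested object recursively". Event keys absent from the pattern are ignored.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


rule_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-1234567890abcdef0", "state": "stopped"},
}
if matches(rule_pattern, event):
    print("Route event to target (for example, an SNS topic or a Lambda function)")
```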

For more information on creating CloudWatch alarms, see the topics under Using Alarms in the AWS Documentation. https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html

Question 1:

A valid CloudWatch alarm for an Amazon EC2 instance is:

“If average CPU utilization is > 60% for 5 minutes…”

True

False

Answer:

Question 2:

A valid CloudWatch alarm for an Amazon RDS database is:

“If the number of simultaneous connections is > 10 for 1 minute…”

True

False

Answer:

Question 3:

A valid CloudWatch alarm for an Amazon S3 storage service is:

“If the maximum bucket size in bytes is around 3 for 1 day…”

True

False

Answer:

Question 4:

A valid CloudWatch alarm for Elastic Load Balancing is:

“If the number of healthy hosts is < 5 for 10 minutes…”

True

False

Answer:

Question 5:

A valid CloudWatch alarm for Amazon Elastic Block Store is:

“If the volume of read operations is > 1,000 for 10 seconds…”

True

False

Answer:

Elastic Load Balancing

Background

Modern high-traffic websites must serve hundreds of thousands—if not millions—of concurrent requests from users or clients, and then return the correct text, images, video, or application data in a fast and reliable manner. Additional servers are generally required to meet these high volumes.
 
Elastic Load Balancing (ELB) is an AWS service that distributes incoming application or network traffic across multiple targets, such as Amazon Elastic Compute Cloud (Amazon EC2) instances, containers, Internet Protocol (IP) addresses, and Lambda functions, in a single Availability Zone or across multiple Availability Zones. Elastic Load Balancing scales your load balancer as traffic to your application changes over time, and it can scale automatically to handle most workloads.
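Round robin is one common way to spread requests across targets. The sketch below shows the idea and skips targets that have been marked unhealthy. The class and the target names are hypothetical, and the sketch is not how any particular load balancer type is implemented.

```python
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Hands out targets in rotation, skipping any that are marked unhealthy."""

    def __init__(self, targets: Iterable[str]) -> None:
        self.targets: List[str] = list(targets)
        self.healthy: Set[str] = set(self.targets)
        self._next = 0

    def mark_unhealthy(self, target: str) -> None:
        self.healthy.discard(target)

    def mark_healthy(self, target: str) -> None:
        if target in self.targets:
            self.healthy.add(target)

    def pick(self) -> str:
        # Try each target at most once, starting where the last pick left off.
        for _ in range(len(self.targets)):
            target = self.targets[self._next]
            self._next = (self._next + 1) % len(self.targets)
            if target in self.healthy:
                return target
        raise RuntimeError("no healthy targets")


lb = RoundRobinBalancer(["az-a/i-1", "az-b/i-2", "az-c/i-3"])
print([lb.pick() for _ in range(4)])
lb.mark_unhealthy("az-b/i-2")  # for example, after failed health checks
print([lb.pick() for _ in range(3)])
```

Because the targets span several Availability Zones, losing one target (or one zone) shifts its share of requests to the remaining healthy targets.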

Elastic Load Balancing is available in three types:

An Application Load Balancer operates at the application level (Open Systems Interconnection, or OSI, model layer 7). It routes traffic to targets—Amazon Elastic Compute Cloud (Amazon EC2) instances, containers, Internet Protocol (IP) addresses, and Lambda functions—based on the content of the request. It is ideal for advanced load balancing of Hypertext Transfer Protocol (HTTP) and Secure HTTP (HTTPS) traffic. An Application Load Balancer provides advanced request routing that is targeted at delivery of modern application architectures, including microservices and container-based applications. An Application Load Balancer simplifies and improves the security of your application by ensuring that the latest Secure Sockets Layer/Transport Layer Security (SSL/TLS) ciphers and protocols are used at all times.

A Network Load Balancer operates at the network transport level (OSI model layer 4), routing connections to targets—EC2 instances, microservices, and containers—based on IP protocol data. It works well for load balancing both Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) traffic. A Network Load Balancer is capable of handling millions of requests per second while maintaining ultra-low latencies. A Network Load Balancer is optimized to handle sudden and volatile network traffic patterns.  

A Classic Load Balancer provides basic load balancing across multiple EC2 instances, and it operates at both the application level and network transport level. A Classic Load Balancer supports the load balancing of applications that use HTTP, HTTPS, TCP, and SSL. The Classic Load Balancer is an older implementation. When possible, AWS recommends that you use a dedicated Application Load Balancer or Network Load Balancer. To learn more about the differences between the three types of load balancers, see Product comparisons on the Elastic Load Balancing Features page – https://aws.amazon.com/elasticloadbalancing/features/?nc=sn&loc=2

Question 1:

You must support traffic to a containerized application.

Classic Load Balancer

Application Load Balancer

Network Load Balancer

Answer:

Question 2:

You have extremely spiky and unpredictable TCP traffic.

Classic Load Balancer

Application Load Balancer

Network Load Balancer

Answer:

Question 3:

You need simple load balancing with multiple protocols.

Classic Load Balancer

Application Load Balancer

Network Load Balancer

Answer:

Question 4:

You need to support a static or Elastic IP address, or an IP target outside a VPC.

Classic Load Balancer

Application Load Balancer

Network Load Balancer

Answer:

Question 5:

You need a load balancer that can handle millions of requests per second while maintaining low latencies.

Classic Load Balancer

Application Load Balancer

Network Load Balancer

Answer:

Question 6:

You must support HTTPS requests.

Classic Load Balancer

Application Load Balancer

Network Load Balancer

Answer:
