ELB Monitoring and Troubleshooting

Error status codes

4xx codes are client-side errors, whereas 5xx codes are server-side errors.
HTTP 400 - Bad request
HTTP 401 - Unauthorised
HTTP 403 - Forbidden: the request was blocked by AWS WAF.
HTTP 500 - Internal server error: network connectivity.
HTTP 501 - Not implemented: the request had a Transfer-Encoding header with an unsupported value.
HTTP 502 - Bad gateway: TCP connection was closed.
HTTP 503 - Service unavailable:
- Target groups have no registered targets
HTTP 504 - Gateway timeout: the target did not respond before the load balancer's idle timeout elapsed.
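
As a quick illustration of the 4xx/5xx split above, here is a minimal Python sketch that labels a status code as client-side or server-side and attaches a short hint. The `ELB_ERROR_HINTS` table and the `describe` function are hypothetical names made up for this example; the hints simply restate the notes above and are not an official AWS mapping.

```python
# Hypothetical lookup table: hints paraphrase the notes above, not an AWS API.
ELB_ERROR_HINTS = {
    400: "Bad request: malformed request from the client",
    401: "Unauthorised",
    403: "Forbidden: check WAF rules",
    502: "Bad gateway: TCP connection was closed",
    503: "Service unavailable: check for registered, healthy targets",
    504: "Gateway timeout: compare keep-alive and idle timeout settings",
}


def describe(code: int) -> str:
    """Label a status code as client-side or server-side and add a hint."""
    if 400 <= code <= 499:
        side = "client-side"
    elif 500 <= code <= 599:
        side = "server-side"
    else:
        side = "not an error"
    hint = ELB_ERROR_HINTS.get(code, "no hint recorded")
    return f"{code} ({side}): {hint}"


for status in (400, 503, 504):
    print(describe(status))
```

A table like this can be handy when skimming logs, but the real cause still has to be confirmed in the load balancer's metrics and access logs.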

Monitoring

All load balancer metrics are pushed directly to CloudWatch.
Metrics sent:
- BackendConnectionErrors
- HealthyHostCount/UnHealthyHostCount
- HTTPCode_Backend_2XX: Successful requests
- HTTPCode_Backend_3XX: Redirected requests
- HTTPCode_ELB_4XX: Client error codes
- HTTPCode_ELB_5XX: Server error codes generated by the load balancer
- RequestCountPerTarget: Good metric to monitor and scale on
- SurgeQueueLength: The total number of requests (HTTP listener) or connections (TCP listener) that are pending routing to a healthy instance. Helps decide when to scale out the Auto Scaling group (ASG).
  - Max value is 1024
  - We don't want to have a big queue of requests
- SpilloverCount: The total number of requests that were rejected because the surge queue is full.
  - This should never be above 0. If it is, requests are being dropped and the backend needs to scale out.
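
The reasoning around SurgeQueueLength and SpilloverCount can be sketched as a small decision function. This is only an illustration: `scaling_advice` and the `queue_warn_ratio` threshold are hypothetical and would need tuning for a real workload, and in practice the values would come from CloudWatch rather than being passed in by hand.

```python
SURGE_QUEUE_MAX = 1024  # maximum surge queue size noted above


def scaling_advice(surge_queue_length: int, spillover_count: int,
                   queue_warn_ratio: float = 0.5) -> str:
    """Turn the two queue metrics into a rough scaling recommendation.

    queue_warn_ratio is an assumed, illustrative threshold: the fraction
    of the maximum queue size at which we start to worry.
    """
    if spillover_count > 0:
        # Any spillover means requests were rejected because the queue was full.
        return "scale out now"
    if surge_queue_length >= SURGE_QUEUE_MAX * queue_warn_ratio:
        # The queue is filling up but nothing has been rejected yet.
        return "scale out soon"
    return "ok"


print(scaling_advice(surge_queue_length=10, spillover_count=0))
print(scaling_advice(surge_queue_length=800, spillover_count=0))
print(scaling_advice(surge_queue_length=1024, spillover_count=37))
```

The same two conditions map naturally onto two CloudWatch alarms, one on each metric, driving an Auto Scaling policy.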

Troubleshooting

HTTP 400: BAD_REQUEST: The client sent a malformed request that does not meet HTTP specifications.
HTTP 503: Service Unavailable: Ensure that you have healthy instances in every Availability Zone that your load balancer is configured to respond in.
- Look for HealthyHostCount in CloudWatch
HTTP 504: Gateway Timeout: Check that keep-alive is enabled on your EC2 instances and make sure that the keep-alive timeout is greater than the idle timeout setting of the load balancer.
Request tracing: the load balancer adds a custom X-Amzn-Trace-Id header to each HTTP request
- Example: X-Amzn-Trace-Id: Root=1-67891233-abcdef012345678912345678
- This is useful in logs / distributed tracing platform to track a single request
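
To show how the trace header might be used in application logs, here is a minimal Python sketch that splits an X-Amzn-Trace-Id value into its fields. The function names are hypothetical. The sketch assumes fields are separated by `;` and that the Root value follows the X-Ray trace ID layout (version, epoch seconds as 8 hex digits, then a unique identifier); treat that layout as an assumption to verify against the AWS documentation.

```python
def parse_trace_header(value: str) -> dict:
    """Split 'Key=Value;Key=Value' pairs from an X-Amzn-Trace-Id header."""
    fields = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, val = part.partition("=")
        fields[key] = val
    return fields


def root_epoch_seconds(root: str) -> int:
    """Decode the hex timestamp segment of a Root value (assumed layout)."""
    _version, epoch_hex, _unique_id = root.split("-")
    return int(epoch_hex, 16)


header = "Root=1-67891233-abcdef012345678912345678"
fields = parse_trace_header(header)
print(fields["Root"])
print(root_epoch_seconds(fields["Root"]))
```

Logging the Root value on every service that handles the request makes it possible to search for a single request across all of them.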

A large IT company manages several projects on AWS Cloud and has decided to use AWS X-Ray to trace application workflows. The company uses a plethora of AWS services like API Gateway, Amazon EC2 instances, Amazon S3 storage service, Elastic Load Balancers and AWS Lambda functions. Which of the following should the company keep in mind while using AWS X-Ray for the AWS services they use?

Application Load Balancers DO NOT send data to X-Ray

Elastic Load Balancing application load balancers add a trace ID to incoming HTTP requests in a header named X-Amzn-Trace-Id. Load balancers do not send data to X-Ray and do not appear as a node on your service map.

What Is the Keep-Alive Setting in EC2 (Not Important for Exam)

In the context of Amazon Elastic Compute Cloud (EC2), keep-alive settings refer to a configuration option that controls the length of time that idle connections to an EC2 instance are kept open.
When a client establishes a connection with an EC2 instance, the server-side operating system typically keeps the connection open for a period of time to allow for additional requests to be sent over the same connection without having to re-establish the connection each time.
Keep-alive settings determine the length of time that idle connections are kept open, after which they are closed to free up system resources.
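By default, Amazon Linux and most other Linux distributions set the TCP keepalive time to 2 hours (net.ipv4.tcp_keepalive_time = 7200 seconds). Once a connection has been idle for that long, the kernel starts sending keepalive probes and closes the connection only if the probes go unanswered.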
- This value can be adjusted by modifying the system's TCP keepalive settings.
Adjusting keep-alive settings can be useful in scenarios where EC2 instances are receiving a high volume of requests, as it can help to reduce the number of idle connections that are kept open and conserve system resources.
- However, it's important to be mindful of the potential impact on application performance, as setting keep-alive intervals too low can result in increased connection overhead and slower response times.
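
Besides the system-wide settings, TCP keepalive can also be tuned per socket. The following Python sketch shows one way to do that with the standard `socket` module. The `enable_keepalive` helper and its default values are hypothetical, chosen only for illustration, and the `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` options are platform-dependent, so the code skips any that the platform does not expose.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 60,
                     interval_s: int = 10, probes: int = 5) -> None:
    """Turn on TCP keepalive for one socket and tune it where supported."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These option names exist on Linux; other platforms may lack some of them.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before closing
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    enable_keepalive(s)
    enabled = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
    print("keepalive enabled:", bool(enabled))
```

Note that this is TCP-level keepalive. The keep-alive timeout mentioned in the HTTP 504 guidance is usually an HTTP setting of the web server running on the instance, which is configured separately.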

Logging

Elastic Load Balancing provides access logs that capture detailed information about requests sent to your load balancer.
Each log contains information such as the time the request was received, the client's IP address, latencies, request paths, and server responses. You can use these access logs to analyse traffic patterns and troubleshoot issues.
Access logging is an optional feature of Elastic Load Balancing that is disabled by default.
After you enable access logging for your load balancer, Elastic Load Balancing captures the logs and stores them in the Amazon S3 bucket that you specify as compressed files. You can disable access logging at any time.
Access logging itself is free; we only pay for the S3 storage.
Access logs are also helpful for compliance reasons.
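
To make the log contents more concrete, here is a minimal Python sketch that parses one access log line into named fields. It assumes the space-separated Classic Load Balancer layout with quoted request and user-agent fields; Application Load Balancer logs have more fields, so the `FIELDS` list would need adjusting. The `parse_access_log_line` helper and the sample line are illustrative only.

```python
import shlex

# Assumed field order (Classic Load Balancer layout); verify against AWS docs.
FIELDS = [
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time",
    "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes",
    "request", "user_agent", "ssl_cipher", "ssl_protocol",
]


def parse_access_log_line(line: str) -> dict:
    """Split one log line into named fields, honouring quoted values."""
    entry = dict(zip(FIELDS, shlex.split(line)))
    # Separate the client IP from its port.
    entry["client_ip"] = entry["client"].rpartition(":")[0]
    # The request field looks like 'METHOD URL PROTOCOL'.
    method, url, _protocol = entry["request"].split(" ")
    entry["method"], entry["url"] = method, url
    # Latencies are reported in seconds.
    for name in FIELDS[4:7]:
        entry[name] = float(entry[name])
    return entry


SAMPLE = (
    '2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 '
    '10.0.0.1:80 0.000073 0.001048 0.000057 200 200 0 29 '
    '"GET http://www.example.com:80/ HTTP/1.1" "curl/7.38.0" - -'
)

entry = parse_access_log_line(SAMPLE)
print(entry["client_ip"], entry["elb_status_code"], entry["url"])
```

Once lines are parsed like this, it becomes straightforward to group by status code or sort by backend processing time when troubleshooting.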

Last updated: 2023-03-11