☁︎SAA-C03

CloudWatch — Concept

What it is

Amazon CloudWatch = AWS's native observability service: metrics, logs, alarms, dashboards, events (now mostly EventBridge), and synthetic monitoring.

Why it exists

Every production system needs centralized monitoring. CloudWatch is the default collection point for AWS resources and apps, and the trigger source for many automated responses.

Components

Component | What it does
Metrics | Time-series data per namespace/dimension. Standard metrics for most services every 1–5 min; detailed monitoring every 1 min (EC2 paid).
Custom metrics | Push from apps via PutMetricData or the CloudWatch agent. High-resolution = 1-second granularity.
Logs | Log groups → log streams → events. Subscribe to Lambda/Firehose/Kinesis. Retention configurable per group.
Logs Insights | Query language for ad-hoc log analysis.
Alarms | Trigger on a metric threshold, ANOMALY_DETECTION, or composite (multiple alarms). Action: SNS, Auto Scaling, EC2 actions, SSM OpsItem.
Dashboards | Custom panels of metrics/logs.
Synthetics | Canary scripts hit URLs to detect outages.
RUM (Real User Monitoring) | Capture browser/JS perf data.
ServiceLens / Application Insights | Cross-service troubleshooting.
Container Insights | Metrics/logs for ECS / EKS.
Contributor Insights | Find "noisy neighbors" in logs/metrics.
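
As a rough illustration of the custom-metrics row, the sketch below builds the request that an app would pass to PutMetricData through boto3. The namespace "MyApp/Orders", the metric name "QueueDepth", and the helper name build_put_metric_data are made up for this example; the actual API call is left as a comment because it needs boto3 and AWS credentials.

```python
from datetime import datetime, timezone


def build_put_metric_data(namespace, name, value, unit="Count",
                          dimensions=None, high_resolution=False):
    """Return keyword arguments for boto3's cloudwatch.put_metric_data()."""
    datum = {
        "MetricName": name,
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
        # 1 = high-resolution (1-second granularity), 60 = standard resolution
        "StorageResolution": 1 if high_resolution else 60,
    }
    if dimensions:
        datum["Dimensions"] = [
            {"Name": key, "Value": val} for key, val in sorted(dimensions.items())
        ]
    return {"Namespace": namespace, "MetricData": [datum]}


# Hypothetical metric: queue depth for a "checkout" service.
kwargs = build_put_metric_data(
    "MyApp/Orders", "QueueDepth", 42,
    dimensions={"Service": "checkout"}, high_resolution=True,
)

# With boto3 installed and credentials configured (not run here):
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(**kwargs)
```

Keeping the payload builder separate from the network call makes it easy to unit-test without touching AWS.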

CloudWatch Agent

  • Push OS-level metrics (memory, disk usage — not in default EC2 metrics) and logs to CloudWatch.
  • Configure via SSM Parameter Store or local JSON.
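
A minimal sketch of what the agent's JSON configuration might look like for memory, disk-used %, and one log file, generated from Python so it can be written locally or stored in Parameter Store. The log path "/var/log/myapp/app.log" and the log group "/myapp/prod" are placeholders, and real configs usually carry more options than shown here.

```python
import json


def build_agent_config(log_file, log_group, namespace="CWAgent"):
    """Build a small CloudWatch agent config: memory + disk metrics and one log file."""
    return {
        "agent": {"metrics_collection_interval": 60},
        "metrics": {
            "namespace": namespace,
            # Tag every metric with the instance it came from.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
                "disk": {"measurement": ["used_percent"], "resources": ["/"]},
            },
        },
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": log_file,
                            "log_group_name": log_group,
                            "log_stream_name": "{instance_id}",
                        }
                    ]
                }
            }
        },
    }


# Placeholder path and log group, chosen only for illustration.
config = build_agent_config("/var/log/myapp/app.log", "/myapp/prod")
config_json = json.dumps(config, indent=2)
print(config_json)
```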

EC2 metrics by default vs need-agent

Default (per-minute or 5-min)Needs Agent
CPUUtilizationMemory
NetworkIn/OutDisk usage (used %)
DiskRead/WriteOps/BytesCustom app metrics
StatusCheck (system / instance)Application logs

Alarms

  • 3 states: OK, ALARM, INSUFFICIENT_DATA.
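  • Period: 10 s or 30 s for high-resolution alarms, otherwise any multiple of 60 s. (High-resolution metrics themselves are stored at 1 s granularity.)
  • "M out of N": the alarm fires when at least "datapoints to alarm" (M) of the last "evaluation periods" (N) datapoints breach the threshold.
  • Targets: SNS, Auto Scaling action, EC2 stop/terminate/reboot/recover, SSM OpsItem.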
  • Composite alarms combine sub-alarms with AND/OR.
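
The alarm rules above can be sketched as a small local simulation. This is a simplified model, not CloudWatch's actual evaluator: it assumes a "greater than threshold" comparison and ignores the configurable missing-data treatment. The function names are invented for the example.

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """Return OK, ALARM, or INSUFFICIENT_DATA for a list of datapoints.

    datapoints: oldest first, most recent last; None marks a period with no data.
    Simplification: missing datapoints are skipped rather than treated
    according to a configurable policy.
    """
    window = datapoints[-evaluation_periods:]
    present = [d for d in window if d is not None]
    if not present:
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for d in present if d > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


def composite_and(*states):
    """Composite alarm that fires only when every sub-alarm is in ALARM."""
    return "ALARM" if all(s == "ALARM" for s in states) else "OK"


def composite_or(*states):
    """Composite alarm that fires when any sub-alarm is in ALARM."""
    return "ALARM" if any(s == "ALARM" for s in states) else "OK"


# 3 of the last 5 CPU datapoints exceed 80, so a "3 out of 5" alarm fires.
cpu_state = evaluate_alarm([50, 85, 90, 70, 95], threshold=80,
                           evaluation_periods=5, datapoints_to_alarm=3)
# Only 1 of the last 3 latency datapoints exceeds 500, so "2 out of 3" stays OK.
latency_state = evaluate_alarm([300, 650, 400], threshold=500,
                               evaluation_periods=3, datapoints_to_alarm=2)

print(cpu_state, latency_state)
print(composite_and(cpu_state, latency_state), composite_or(cpu_state, latency_state))
```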

Logs

  • Push from: SDK, CW Agent, Lambda extension, FireLens (ECS), Fluent Bit, third-party.
  • Subscription filters → Kinesis / Firehose / Lambda for real-time processing.
  • Metric filters convert log patterns into custom metrics (e.g. count ERROR lines).
  • Encryption (KMS), retention 1 day–10 years (or never expire).
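
When a subscription filter delivers to Lambda, the log batch arrives base64-encoded and gzip-compressed under event["awslogs"]["data"]. The sketch below decodes that payload and counts ERROR lines. The handler's return shape and the make_sample_event helper are invented here for local testing; the log group and messages are sample data.

```python
import base64
import gzip
import json


def decode_subscription_event(event):
    """Decode the base64 + gzip payload CloudWatch Logs sends to Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def handler(event, context=None):
    """Lambda-style entry point: count log events whose message contains ERROR."""
    payload = decode_subscription_event(event)
    errors = [e["message"] for e in payload["logEvents"] if "ERROR" in e["message"]]
    return {"logGroup": payload["logGroup"], "errorCount": len(errors)}


def make_sample_event(log_group, messages):
    """Hypothetical helper: build an event shaped like a delivery, for local tests."""
    payload = {
        "messageType": "DATA_MESSAGE",
        "logGroup": log_group,
        "logStream": "sample-stream",
        "logEvents": [
            {"id": str(i), "timestamp": 1700000000000 + i, "message": m}
            for i, m in enumerate(messages)
        ],
    }
    data = base64.b64encode(gzip.compress(json.dumps(payload).encode("utf-8")))
    return {"awslogs": {"data": data.decode("ascii")}}


sample = make_sample_event("/myapp/prod", ["INFO started", "ERROR db timeout"])
result = handler(sample)
print(result)
```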

Logs Insights example

fields @timestamp, @message
| filter @message like /ERROR/
| stats count() by bin(5m)
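
To make the query's semantics concrete, here is a local Python emulation of what it computes: keep messages matching the regex ERROR, floor each timestamp to a 5-minute bin, and count per bin. This is only an illustration of the logic, not how Logs Insights is implemented, and the sample events are made up.

```python
import re
from collections import Counter

BIN_SECONDS = 5 * 60  # bin(5m)


def count_errors_by_bin(events):
    """events: iterable of (epoch_seconds, message). Returns {bin_start: count}."""
    counts = Counter()
    for timestamp, message in events:
        # filter @message like /ERROR/
        if re.search(r"ERROR", message):
            # stats count() by bin(5m): floor the timestamp to its 5-minute bin
            bin_start = timestamp - (timestamp % BIN_SECONDS)
            counts[bin_start] += 1
    return dict(sorted(counts.items()))


# Sample events with timestamps in seconds relative to an arbitrary start.
events = [
    (0, "ERROR connection refused"),
    (60, "INFO request served"),
    (299, "ERROR timeout"),
    (300, "ERROR timeout"),
    (610, "WARN slow response"),
]
bins = count_errors_by_bin(events)
print(bins)
```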

EventBridge (formerly CloudWatch Events)

  • See EventBridge/ notes. CloudWatch Events still appears in older exam phrasing — treat as EventBridge.

When to use vs alternatives

NeedUse
OS-level + custom metrics from EC2CloudWatch Agent
Query logs ad-hocCloudWatch Logs Insights
Centralize logs from many accountsSubscription filter → Kinesis Firehose → S3 (or cross-account log destination)
Auto-scale on custom metricCloudWatch Alarm → ASG policy
Synthetic uptime checksCloudWatch Synthetics
Detailed app tracingAWS X-Ray (separate)
AWS API auditCloudTrail (separate)

Common exam scenarios

  1. "Monitor EC2 memory and disk usage" → install CloudWatch Agent (not in default metrics).
  2. "Trigger Lambda on a log pattern (ERROR)" → Metric filter + alarm + SNS → Lambda, or Subscription filter directly to Lambda.
  3. "Auto-scale on application queue depth" → custom metric or ApproximateNumberOfMessagesVisible → alarm → ASG step policy.
  4. "Restart unhealthy EC2 automatically" → StatusCheckFailed alarm → EC2 action: recover for StatusCheckFailed_System, reboot for StatusCheckFailed_Instance.
  5. "Quickly detect site outage from outside" → Synthetics canaries.
  6. "Cross-account central logging" → log destination in security account; cross-account subscription filters.
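
As one way to picture scenario 4, the sketch below builds the arguments for an alarm on StatusCheckFailed_System whose action is the built-in EC2 recover action, with no Lambda in between. The instance ID, region, alarm name, and the 2-minute evaluation window are illustrative choices; the boto3 call itself is left as a comment.

```python
def build_recover_alarm(instance_id, region):
    """Return keyword arguments for boto3's cloudwatch.put_metric_alarm()."""
    return {
        "AlarmName": f"recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        # Fire after 2 consecutive failed 1-minute checks (an assumed policy).
        "EvaluationPeriods": 2,
        "DatapointsToAlarm": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 action: the alarm acts on the instance directly.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


# Placeholder instance ID and region.
alarm_kwargs = build_recover_alarm("i-0123456789abcdef0", "us-east-1")

# With boto3 installed and credentials configured (not run here):
#   import boto3
#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**alarm_kwargs)
```

Swapping the metric to StatusCheckFailed_Instance and the action suffix to "reboot" gives the reboot variant.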

Exam tip

  • EC2 default metrics don't include memory / disk-used % — agent required.
  • Alarms can act directly on EC2 / ASG without Lambda glue.
  • CloudWatch Logs ≠ CloudTrail — Logs is app/infra logs; CloudTrail is API audit.
